#3141 Fix new commit notification mails with non-ASCII (#1814)
Merged 6 years ago by pingou. Opened 6 years ago by adamwill.
adamwill/pagure notify-unicode  into  master

file modified
+17 -15
@@ -265,6 +265,7 @@ 

      ''' Send an email with the specified information.

  

      :arg text: the content of the email to send

+     :type text: unicode

      :arg subject: the subject of the email

      :arg to_mail: a string representing a list of recipient separated by a

          comma
@@ -790,37 +791,38 @@ 

      ''' Notify the people following a project's commits that new commits have

      been added.

      '''

+     # string note: abspath, project and branch can only contain ASCII

+     # by policy (pagure and/or gitolite)

      commits_info = []

      for commit in commits:

          commits_info.append({

              'commit': commit,

-             'author': pagure.lib.git.get_author(commit, abspath),

-             'subject': pagure.lib.git.get_commit_subject(commit, abspath)

+             # we want these to be unicodes (read_output gives us str)

+             'author': pagure.lib.git.get_author(commit, abspath).decode('utf-8'),

+             'subject': pagure.lib.git.get_commit_subject(commit, abspath).decode('utf-8')

          })

  

-     commits_string = '\n'.join('{0}    {1}    {2}'.format(

+     # make sure this is unicode

+     commits_string = u'\n'.join(u'{0}    {1}    {2}'.format(

          commit_info['commit'], commit_info['author'], commit_info['subject'])

          for commit_info in commits_info)

      commit_url = _build_url(

          pagure_config['APP_URL'], _fullname_to_url(project.fullname),

          'commits', branch)

  

-     email_body = '''

- The following commits were pushed to the repo "{repo}" on branch

- "{branch}", which you are following:

- {commits}

+     email_body = u'''

+ The following commits were pushed to the repo %s on branch

+ %s, which you are following:

+ %s

  

  

  

  To view more about the commits, visit:

- {commit_url}

- '''

-     email_body = email_body.format(

-         repo=project.fullname,

-         branch=branch,

-         commits=commits_string,

-         commit_url=commit_url

-     )

+ %s

+ ''' % (project.fullname,

+        branch,

+        commits_string,

+        commit_url)

      mail_to = _get_emails_for_commit_notification(project)

  

      send_email(

@@ -117,6 +117,7 @@ 

  

          # Mail text should be as expected.

          self.assertEqual(args[0], exptext)

+         self.assertTrue(isinstance(args[0], unicode))

  

          # Mail subject should be as expected.

          self.assertEqual(args[1], u'Issue #1: issue')
@@ -153,6 +154,7 @@ 

  

          # Mail text should be as expected.

          self.assertEqual(args[0], exptext)

+         self.assertTrue(isinstance(args[0], unicode))

  

          # Mail subject should be as expected.

          self.assertEqual(args[1], u'Issue #1: namespaced project issue')
@@ -184,6 +186,7 @@ 

  

          # Mail text should be as expected.

          self.assertEqual(args[0], exptext)

+         self.assertTrue(isinstance(args[0], unicode))

  

          # Mail subject should be as expected.

          self.assertEqual(args[1], u'Issue #1: forked project issue')
@@ -201,6 +204,43 @@ 

          # Mail should be from user1 (who submitted the issue).

          self.assertEqual(kwargs['user_from'], self.user2.fullname)

  

+     @mock.patch('pagure.lib.notify.send_email')

+     # for non-ASCII testing, we mock these return values

+     @mock.patch('pagure.lib.git.get_author', return_value="Cecil Cõmmîttër")

+     @mock.patch('pagure.lib.git.get_commit_subject', return_value="We love Motörhead")

+     def test_notify_new_commits(self, _, __, fakemail):  # pylint: disable=invalid-name

+         """Test for notification on new commits, especially when

+         non-ASCII text is involved.

+         """

+         exptext = u"""

+ The following commits were pushed to the repo test on branch

+ master, which you are following:

+ abcdefg    Cecil Cõmmîttër    We love Motörhead

+ 

+ 

+ 

+ To view more about the commits, visit:

+ https://pagure.org/test/commits/master

+ """

+         # first arg (abspath) doesn't matter and we can use a commit

+         # ID that doesn't actually exist, as we are mocking

+         # the get_author and get_commit_subject calls anyway

+         pagure.lib.notify.notify_new_commits('/', self.project1, 'master', ['abcdefg'])

+         (_, args, kwargs) = fakemail.mock_calls[0]

+ 

+         # Mail text should be as expected.

+         self.assertEqual(args[0], exptext)

+         self.assertTrue(isinstance(args[0], unicode))

+ 

+         # Mail subject should be as expected.

+         self.assertEqual(args[1], u'New Commits To "test" (master)')

+ 

+         # Mail doesn't actually get sent to anyone by default

+         self.assertEqual(args[2], '')

+ 

+         # Project name should be...project (full) name.

+         self.assertEqual(kwargs['project_name'], self.project1.fullname)

+ 

  # Add more tests to verify that correct mails are sent to correct people here

  

  if __name__ == '__main__':

This is an alternative to #3140. As @puiterwijk figured out,
when a new commit notification mail would contain non-ASCII
text - i.e. when the commit author's full name or the commit
topic contain non-ASCII text - sending the mail would fail. In
3.x at least, this resulted in the service that sends the mails
silently getting stuck; we don't know how the service behaves
when this happens in current git (it was changed from trollius
to celery), but the underlying bug is still there.

It fails because of unicode conversion issues in the new commit
notification function. send_email expects the body text to be
provided as a unicode instance, but notify_new_commits works
with str instances throughout (including the commit author and
commit topic values; these ultimately come in as UTF-8 encoded
strings, from the pagure.git.read_output function). Ultimately
it produces a UTF-8 encoded str which send_email then tries
to encode again, and that crashes.

This approach fixes the problem in a way that's as consistent
as possible with the other functions in the file - the body text
is produced by %s-style string formatting of a unicode, and we
ensure the things that get put into it are also unicode type,
for the ones which can contain non-ASCII text. I also added some
comments about this area, noted the expected type for text in
the send_email docstring, and added a test.

Fixes https://pagure.io/pagure/issue/1814

Signed-off-by: Adam Williamson awilliam@redhat.com

One thing that annoys OCD me here is the test doesn't actually fail in the same way as it fails 'in production', if you take the test and run it on current git master. In production, git master crashes in msg = MIMEText(text.encode('utf-8'), 'plain', 'utf-8'), but if you run the test case on current git master, it crashes when constructing commits_string. I think this means the mocked return values for get_author and get_commit_subject are not quite matching the form those functions actually return, but I can't quite figure out why or how to change it so they do match. The test should still be sufficient, it just annoys me.

Obligatory Python Unicode Question: did you check this is both py2-safe and py3-safe?

=)

No, since you told me that Pagure in general isn't Py3-safe. I suspect the interesting question for Py3 purposes would be what type the output of pagure.lib.git.read_output would be, run under Python 3.

...and the answer to that appears to be, it doesn't work in Python 3. :) Apart from the Python 2-style prints, which are trivial to patch, it fails at out = out.rstrip('\n\r') with TypeError: a bytes-like object is required, not 'str', indicating that in Python 3, the output we get from subprocess - that's out - is a bytes (bytestring), not a str (Python 3, unicode-y string).

Now I write that, I'm fairly sure I've dealt with that same thing before, probably in fedfind or something.

The patch as submitted at this moment works in prod.

Note that there is a PR open to make pagure py3 friendly, so it would be nice to have the new PR not knowingly make this harder :)

This whole area is going to be a complete minefield for Py3 conversion, I suspect. I don't think the PR makes a significant difference in either direction. If anything it would make it very slightly easier, as this PR is at least explicit about what's going on, and includes helpful comments!

edit: if you point me to that PR, I can see if its author addressed the failure of read_output yet and if so how, and see if this PR is compatible with their choice.

Alright, let's rebase and get this in :)

rebased onto a8fef88

6 years ago

Pull-Request has been merged by pingou

6 years ago