#3930 GIT commit e-mail report is not UTF-8 aware
Closed: Fixed None Opened 13 years ago by ppisar.

When a git commit is performed, the git server sends an e-mail to the package owner and to scm-commits@l.f.o. If the committer's display name contains non-ASCII characters, the Subject: header in the sent e-mail is malformed.

Example with "Marcela Mašláňová":
{{{
00000000 53 75 62 6a 65 63 74 3a 20 5b 70 65 72 6c 2d 70 |Subject: [perl-p|
00000010 69 70 5d 20 2a 20 57 65 64 20 41 75 67 20 20 34 |ip] * Wed Aug 4|
00000020 20 32 30 31 30 20 4d 61 72 63 65 6c 61 20 4d 61 | 2010 Marcela Ma|
00000030 c5 a1 6c c3 a1 c5 6f 76 c3 a1 20 3c 6d 6d 61 73 |..l...ov.. <mmas|
00000040 6c 61 6e 6f 40 72 65 64 68 61 74 2e 63 6f 6d 3e |lano@redhat.com>|
00000050 20 31 2e 31 36 2d 33 20 2d 20 72 65 6e 61 6d 65 | 1.16-3 - rename|
00000060 20 70 69 70 20 74 6f 20 70 65 72 6c 2d 70 69 70 | pip to perl-pip|
00000070 20 62 65 63 61 75 73 0a | becaus.|
00000078
}}}

Decoded as UTF-8:
{{{Subject: [perl-pip] * Wed Aug 4 2010 Marcela Mašlá�vá <mmaslano@redhat.com> 1.16-3 - rename pip to perl-pip becaus}}}

As you can see, the header value (1) does not declare a non-ASCII encoding, and (2) is not valid UTF-8 at all (broken "ň" character).
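
Both points can be checked mechanically. The following is a minimal Python 3 sketch, not part of the hook: the helper names `is_ascii` and `is_valid_utf8` are made up for illustration, the `broken` bytes are the name portion copied from the hexdump above, and `repaired` is an assumption about what the bytes should have been (the missing `88` continuation byte of "ň" restored).

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is in the US-ASCII range (0x00-0x7F)."""
    return all(byte < 0x80 for byte in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Name portion as it appears in the hexdump: "c5" is followed directly by "6f".
broken = bytes.fromhex("4d 61 c5 a1 6c c3 a1 c5 6f 76 c3 a1")
# Assumed intended bytes: "c5 88" is the UTF-8 encoding of "ň".
repaired = bytes.fromhex("4d 61 c5 a1 6c c3 a1 c5 88 6f 76 c3 a1")

print(is_ascii(broken), is_valid_utf8(broken))
print(is_ascii(repaired), is_valid_utf8(repaired))
```

Note that even the repaired bytes are not acceptable as a raw header value, because they are still not US-ASCII.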

According to the RFC defining e-mail, the header value must be US-ASCII. If other characters are to be transported, a special prologue declaring the character set must be embedded and the string must be encoded into the US-ASCII alphabet (e.g. quoted-printable).
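
As a rough illustration of what such an encoded header looks like, here is a minimal sketch using `email.header.Header` from the Python standard library. It is written for Python 3 and is not the hook's actual code; the function name `encode_subject` is hypothetical.

```python
from email.header import Header


def encode_subject(text: str) -> str:
    """Return the header value as RFC 2047 encoded-words.

    The result carries the charset declaration ("=?utf-8?...?=") and
    consists of US-ASCII characters only.
    """
    return Header(text, "utf-8").encode()


encoded = encode_subject("Marcela Mašláňová")
print(encoded)
```

The library decides between the base64 and quoted-printable forms itself, so the exact output is not shown here.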


Broken e-mail from git server
mmaslano.mbox

Hrm, it seems to only be the subject line, the body seems to render OK:

http://lists.fedoraproject.org/pipermail/scm-commits/2010-August/474840.html

We use the email hook that gimp uses, will have to take it up with them.

Does this still happen?

I don't see any examples offhand where we have that kind of thing in the subject.

It doesn't anymore. It has been working correctly for a long time.

I see, the committer names are not in subjects anymore. I will do a commit with a non-ASCII character in a package.

Can you please try again now?

I've changed the list default to utf-8 and it might help in this case.

The mail in web archive http://lists.fedoraproject.org/pipermail/scm-commits/2011-November/678428.html looks good, but this is ''NOT'' a fix. The Subject: header in the e-mail (Message-Id: <!20111101115947.C2FC921437@pkgs01.phx2.fedoraproject.org>) is still invalid:

{{{
$ grep '^Subject:' /tmp/perl-MogileFS-Utils.mail |hexdump -C
00000000 53 75 62 6a 65 63 74 3a 20 5b 70 65 72 6c 2d 4d |Subject: [perl-M|
00000010 6f 67 69 6c 65 46 53 2d 55 74 69 6c 73 5d 20 46 |ogileFS-Utils] F|
00000020 69 78 20 74 79 70 6f 20 28 c5 a0 c3 ad 6c 65 6e |ix typo (....len|
00000030 c4 20 c5 be 6c 75 c5 a5 6f 75 c4 6b c3 bd 20 6b |. ..lu..ou.k.. k|
00000040 c5 af c5 20 c3 ba 70 c4 6c 20 c4 c3 a1 62 c4 6c |... ..p.l ...b.l|
00000050 73 6b c3 a9 20 c3 b3 64 79 2e 20 e3 e3 af e3 e3 |sk.. ..dy. .....|
00000060 e3 a7 e3 e3 a9 ef bc 29 0a |.......).|
00000069
}}}

The mail client on git server ''must'' implement [http://tools.ietf.org/html/rfc2047 RFC 2047]. (In other words I still get broken e-mails to my mail box.)

Is there any improvement now?

No improvement. Tested with http://lists.fedoraproject.org/pipermail/scm-commits/2012-October/886326.html right now:

{{{
$ grep '^Subject:' /tmp/commitmail | hexdump -C
00000000 53 75 62 6a 65 63 74 3a 20 5b 70 65 72 6c 2d 4d |Subject: [perl-M|
00000010 6f 67 69 6c 65 46 53 2d 55 74 69 6c 73 5d 20 4d |ogileFS-Utils] M|
00000020 6f 64 65 72 6e 69 7a 65 20 73 70 65 63 20 66 69 |odernize spec fi|
00000030 6c 65 20 28 c5 a0 c3 ad 6c 65 6e c4 9b 20 c5 be |le (....len.. ..|
00000040 6c 75 c5 a5 6f 75 c4 8d 6b c3 bd 20 6b c5 af c5 |lu..ou..k.. k...|
00000050 88 20 c3 ba 70 c4 9b 6c 20 c4 8f c3 a1 62 c4 9b |. ..p..l ....b..|
00000060 6c 73 6b c3 a9 20 c3 b3 64 79 2e 20 20 e3 81 8a |lsk.. ..dy. ...|
00000070 e3 81 af e3 82 88 e3 83 95 e3 82 a7 e3 83 89 e3 |................|
00000080 83 a9 ef 0a |....|
00000084
}}}

It's even worse now, because the last character is damaged in the bitstream.

Pushed an update just now. Could you give it another try?

http://lists.fedoraproject.org/pipermail/scm-commits/2012-October/888075.html:

{{{
$ grep '^Subject:' /tmp/commitmail | hexdump -C
00000000 53 75 62 6a 65 63 74 3a 20 5b 70 65 72 6c 2d 4d |Subject: [perl-M|
00000010 6f 67 69 6c 65 46 53 2d 55 74 69 6c 73 5d 20 4d |ogileFS-Utils] M|
00000020 69 6e 69 6d 69 7a 65 20 63 6f 6d 6d 65 6e 74 73 |inimize comments|
00000030 20 28 c5 a0 c3 ad 6c 65 6e c4 9b 20 c5 be 6c 75 | (....len.. ..lu|
00000040 c5 a5 6f 75 c4 8d 6b c3 bd 20 6b c5 af c5 88 20 |..ou..k.. k.... |
00000050 c3 ba 70 c4 9b 6c 20 c4 8f c3 a1 62 c4 9b 6c 73 |..p..l ....b..ls|
00000060 6b c3 a9 20 c3 b3 64 79 2e 20 e3 81 8a e3 81 af |k.. ..dy. ......|
00000070 e3 82 88 e3 83 95 e3 82 a7 e3 83 89 e3 83 a9 ef |................|
00000080 bc 81 29 0a |..).|
00000084
}}}

We are back at comment #7. The bitstream is clean, but it's still an invalid e-mail header because it's not encoded.

Okay -- try again. I just fixed a problem with the logic for detecting if there were non-ascii characters in the subject.

When pushing commits to the git server, I get the following Python traceback:

{{{
$ git push && fedpkg build
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 606 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Traceback (most recent call last):
remote:   File "./hooks/post-receive-chained.d/post-receive-email", line 941, in <module>
remote:     main()
remote:   File "./hooks/post-receive-chained.d/post-receive-email", line 937, in main
remote:     change.send_emails()
remote:   File "./hooks/post-receive-chained.d/post-receive-email", line 201, in send_emails
remote:     self.send_extra_emails()
remote:   File "./hooks/post-receive-chained.d/post-receive-email", line 364, in send_extra_emails
remote:     oldrev=parent, newrev=commit.id)
remote:   File "./hooks/post-receive-chained.d/post-receive-email", line 135, in generate_header
remote:     subject = Header(to_bytes(subject), 'utf-8').encode()
remote:   File "/usr/lib64/python2.6/email/header.py", line 176, in __init__
remote:     self.append(s, charset, errors)
remote:   File "/usr/lib64/python2.6/email/header.py", line 260, in append
remote:     ustr = unicode(s, incodec, errors)
remote: UnicodeDecodeError: 'utf8' codec can't decode bytes in position 120-121: unexpected end of data
remote: Emitting a message to the fedmsg bus.
To ssh://ppisar@pkgs.fedoraproject.org/perl-MogileFS-Utils
1f07133..31c0ea0 master -> master
Building perl-MogileFS-Utils-2.26-3.fc19 for rawhide
Created task: 4620728
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=4620728
}}}

And no e-mail arrived. However the build succeeded.

Okay, fixed. Hopefully, third time's the charm.

Note, however, that traceback occurs when the log message has characters that aren't utf-8. Those characters will be mangled in the output now (there's no way to get it right as we aren't passing along any information about what encoding the bytes may be in.)
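
To make the trade-off concrete, here is a minimal Python 3 sketch of that kind of tolerant handling. It is an assumption about the general approach, not the code that was deployed: the function name `subject_header` and the sample subject are invented for illustration, and the hook's own `to_bytes` helper from the traceback is not reproduced.

```python
from email.header import Header


def subject_header(raw_subject: bytes) -> str:
    """Build an RFC 2047 encoded Subject value from raw bytes.

    Bytes that are not valid UTF-8 are replaced with U+FFFD instead of
    raising UnicodeDecodeError, so the mail is still sent, with the
    affected characters mangled.
    """
    text = raw_subject.decode("utf-8", errors="replace")
    return Header(text, "utf-8").encode()


# Hypothetical subject cut in the middle of a multi-byte sequence:
# a lone 0xEF lead byte at the end, similar to the earlier hexdump.
truncated = "Modernize spec file (Šílen".encode("utf-8") + b"\xef"
encoded_subject = subject_header(truncated)
print(encoded_subject)
```

A strict decode of the same bytes would raise the "unexpected end of data" error seen in the traceback.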

I have no idea why the log message wouldn't be utf-8 either. Looking at the latest log message in the git log for perl-MogileFS-Utils, the log message looks like proper utf-8 there.

Please reopen if there are still problems.

Great, it works http://lists.fedoraproject.org/pipermail/scm-commits/2012-October/888075.html now:

{{{
Subject: =?utf-8?q?=5Bperl-MogileFS-Utils=5D_Better_comment_=28=C5=A0=C3=ADlen?=
=?utf-8?b?xJsgxb5sdcWlb3XEjWvDvSBrxa/FiCDDunDEm2wgxI/DoWLEm2xza8OpIMOz?=
=?utf-8?b?ZHkuIOOBiuOBr+OCiOODleOCp+ODieODqe+8gSk=?=
}}}
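
For anyone who wants to verify such a header by hand, here is a minimal Python 3 sketch that decodes it with `email.header.decode_header`. The header text is copied from the block above; the leading space on the continuation lines is an assumption (folded header lines start with whitespace, which the ticket rendering appears to have dropped), and `decode_subject` is a hypothetical helper name.

```python
from email.header import decode_header

# Subject value from the comment above. The leading space on the
# continuation lines is assumed, not taken from the ticket.
FOLDED_SUBJECT = (
    "=?utf-8?q?=5Bperl-MogileFS-Utils=5D_Better_comment_=28=C5=A0=C3=ADlen?=\n"
    " =?utf-8?b?xJsgxb5sdcWlb3XEjWvDvSBrxa/FiCDDunDEm2wgxI/DoWLEm2xza8OpIMOz?=\n"
    " =?utf-8?b?ZHkuIOOBiuOBr+OCiOODleOCp+ODieODqe+8gSk=?="
)


def decode_subject(value: str) -> str:
    """Decode RFC 2047 encoded-words back into a readable string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


decoded = decode_subject(FOLDED_SUBJECT)
print(decoded)
```

The quoted-printable and base64 words mix freely here; the decoder joins adjacent encoded-words without inserting the folding whitespace between them.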

Thank you.

Metadata Update from @ppisar:
- Issue assigned to jkeating

7 years ago
