#1496 Base64 encoded e-mail headers are malformed
Closed: Fixed 7 years ago Opened 7 years ago by lsedlar.

When Pagure sends a notification, the e-mail shows up as sent by the person making the action (comment, PR, etc.).

However, if the name contains non-ASCII characters, the header is only partly encoded. This leads to odd display in some clients.

For example, this is the From header for an action made by Kamil Páral. Please note the additional @pagure.io at the end.

From: =?utf-8?q?Kamil_P=C3=A1ral_=3Cpagure=40pagure=2Eio=3E?=@pagure.io

This is how Evolution displays it (Od == From):

[Screenshot: Snimek_z_2016-11-01_10-21-27.png]

I typed up a long comment, but Pagure ate it :disappointed: #1333 strikes again. Anyway...

I wonder if this is purely a client problem. I see in Thunderbird I have an email that should exhibit this behaviour. The raw From header is From: =?utf-8?b?SmFzb24g44OG44Kj44OT44OEIDxwYWd1cmVAcGFndXJlLmlvPg==?=@pagure.io, which is definitely base64 encoded UTF-8 text:

>>> import base64
>>> x = 'SmFzb24g44OG44Kj44OT44OEIDxwYWd1cmVAcGFndXJlLmlvPg=='
>>> print(base64.b64decode(x).decode('utf-8'))
Jason ティビツ <pagure@pagure.io>

Thunderbird clearly knows how to handle this and decodes it appropriately so the UI displays the correct "From". K-9 mail on my phone doesn't and I see the raw string. I don't yet know what (if any) RFC this corresponds to so I need to do some more research, but I wonder if this should be filed against Evolution and K-9 mail. There might be a more widely-supported method to handle non-ascii "From" headers so maybe we can adjust something on the Pagure side, but I'm not sure what's happening now is wrong, per se.

Trivia: I originally used katakana for my surname in Fedora infrastructure precisely to shake out encoding bugs throughout the stack. Glad to see it still finds issues.

In any case, I'm pretty sure RFC 2047 still specifies this. It's actually an annoyingly complicated issue. Also note that any intervening agent like a mailing list might do its own mangling, just to make it a bit more fun.

@tibbs, thanks, that's exactly what I was looking for.

After reading the RFCs and re-reading the issue, it seems like the problem boils down to that extra @pagure.io.

@pingou pasted the content of both emails (https://paste.fedoraproject.org/475178/78553029/, https://paste.fedoraproject.org/475179/47855305/) as well. This is my first time digging into the guts of email, but it looks to me like the second one (with the patch) is going to make some clients very unhappy as well, since it appears to be encoded in something other than us-ascii. That seems to violate RFC822 and RFC2047. Am I interpreting these RFCs incorrectly?

After reading the RFCs and re-reading the issue, it seems like the problem boils down to that extra @pagure.io.

This confuses me a little, which extra @pagure.io?

From: =?utf-8?q?Kamil_P=C3=A1ral_=3Cpagure=40pagure=2Eio=3E?= is encoded using UTF-8 and RFC2047[0], and decodes to From: Kamil Páral <pagure@pagure.io>. That looks right to me. However, after that encoded-word there is a @pagure.io again, so the end result is From: Kamil Páral <pagure@pagure.io>@pagure.io. I think RFC2047 briefly mentions white space between encoded-words and adjacent text, so maybe this is what is making Evolution unhappy.
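
The decoding described above can be checked with the standard library. This is a minimal sketch for Python 3; `decode_mime_header` is a hypothetical helper name, not something from Pagure.

```python
from email.header import decode_header


def decode_mime_header(raw):
    """Decode RFC 2047 encoded-words in a raw header value to text."""
    pieces = []
    for value, charset in decode_header(raw):
        # Encoded-words come back as bytes plus a charset; plain text may
        # come back as str, or as bytes with charset None.
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii", errors="replace")
        pieces.append(value)
    return "".join(pieces)


raw_from = "=?utf-8?q?Kamil_P=C3=A1ral_=3Cpagure=40pagure=2Eio=3E?=@pagure.io"
decoded = decode_mime_header(raw_from)
print(decoded)
```

The decoded value should show the real address inside the encoded-word, followed by the stray suffix that sits outside it.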

It seems you can send emails with UTF-8 encoded headers[1], but I think it needs to be marked as such with the message/global MIME type[2]. I don't know how many clients will not handle this correctly.

[0] https://tools.ietf.org/html/rfc2047#section-4.2
[1] https://tools.ietf.org/html/rfc6532
[2] http://www.iana.org/assignments/media-types/message/global

Looking at the email I sent from my local instance without the patch, it contains a trailing @carmine.pingoured.fr so the last part here is something that is added automatically using the host the machine is running on. Could it be something added by the smtp server?

So what do we do with this?

We have an issue, I could replicate it, I have a proposed PR (#1520) with which I can no longer reproduce the issue. What do we want to do then?

I think the question is whether or not we're going to send RFC 6532 compliant emails (which requires an RFC 6531 compliant SMTP server). Given that the policy setting was just added to the email module in Python 3.5, I don't think we really have a choice about that.

Based on my understanding of the RFCs, if we're not doing RFC 6532 emails, we shouldn't use UTF-8 encoded headers.
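
To make the two options concrete, here is a rough sketch using the Python 3 `email` package (3.5 or later for `policy.SMTPUTF8`). It is only an illustration of the difference, not what Pagure does; the message contents are made up.

```python
from email import policy
from email.headerregistry import Address
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = Address("Kamil Páral", "pagure", "pagure.io")
msg["Subject"] = "test"
msg.set_content("test body")

# Classic SMTP: non-ASCII header text is emitted as RFC 2047 encoded-words,
# so the serialized message is pure ASCII.
ascii_wire = msg.as_bytes(policy=policy.SMTP)

# RFC 6532 style: headers carry raw UTF-8, which needs an SMTP server
# that supports the SMTPUTF8 extension (RFC 6531).
utf8_wire = msg.as_bytes(policy=policy.SMTPUTF8)

print(ascii_wire.decode("ascii"))
print(utf8_wire.decode("utf-8"))
```

In the first form only the display name is wrapped in an encoded-word and the address stays readable; the second form is what we should avoid unless the whole delivery path is known to support it.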

And so keep the situation as is? With that bug?

No, I think we need to figure out why it's appending @pagure.io to the encoded-word and fix that. I'm happy to take that on, or leave it to you.

Note that it's appending the hostname, so pagure.io on pagure, something else on your local instance.
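
One possible explanation, which is a guess and not confirmed anywhere in this ticket: when the whole `Name <address>` string is wrapped in a single encoded-word, an address parser sees one opaque token with no `@` and no domain, and some mail software qualifies such a bare address with the local hostname. The sketch below only shows the first half of that (what the standard-library parser sees); it does not prove where the suffix is added.

```python
from email.utils import parseaddr

# Whole "Name <address>" string encoded as a single encoded-word (the buggy form).
fully_encoded = "=?utf-8?q?Kamil_P=C3=A1ral_=3Cpagure=40pagure=2Eio=3E?="
# Only the display name encoded, address left in plain ASCII.
name_only = "=?utf-8?q?Kamil_P=C3=A1ral?= <pagure@pagure.io>"

for value in (fully_encoded, name_only):
    display_name, address = parseaddr(value)
    # Report whether the parser found a domain part in the address.
    print(repr(display_name), repr(address), "has domain:", "@" in address)
```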

What boggles me is why this doesn't happen when the FROM email is encoded?

I found a way to make the name email-safe using the email.Header module, without actually encoding it to utf-8, then I append the email with a space in between and from local testing it seems to behave as desired.
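
For readers who want to see the idea, here is a minimal sketch of that approach in Python 3. It is not the code from PR #1520; `build_from_header` is a hypothetical helper name used only for illustration.

```python
from email.header import Header


def build_from_header(display_name, address):
    """Encode only the display name; keep the address as plain ASCII."""
    # Header(...).encode() returns an ASCII-only RFC 2047 encoded-word string.
    encoded_name = Header(display_name, "utf-8").encode()
    # The space keeps the encoded-word separated from the <address> part.
    return "{0} <{1}>".format(encoded_name, address)


from_value = build_from_header("Kamil Páral", "pagure@pagure.io")
print(from_value)
```

On Python 3, `email.utils.formataddr((name, address))` does much the same thing for non-ASCII names, so it may be a simpler option where it is available.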

PR #1520 updated

@pingou changed the status to Closed

7 years ago
