#12356 Need help investigating cfp.fedoraproject.org email delivery failure to michel@michel-slm.name (pretalx.com works fine)
Closed: Upstream 17 days ago by kevin. Opened a month ago by salimma.


Describe what you would like us to do:


I noticed over the past few months that I never got any notifications for talk acceptances and schedule updates for the past two CentOS events (CentOS Showcase in November, CentOS Connect @ FOSDEM this January).

I tested changing my email to another domain two days ago, and sent myself a password reset request - and that worked. Today I changed it back to michel@michel-slm.name, sent a password reset request... and nothing again.

The system is set to only allow a password reset request every 24 hours, so this is obviously rather inconvenient to debug.

My email provider is hosted by mailbox.org, so I can try to change the email to a normal mailbox.org address tomorrow or the day after; if that works at least emails will go to the right inbox, but I'm not sure what's going wrong here.

When do you need this to be done by? (YYYY/MM/DD)


2025/01/24

Not urgent, but just flagging. This address works fine for receiving lists.fedoraproject.org emails, though for sending it triggers DMARC issues


So, we don't run the service, but we do provide an email gateway (so it can send as fedoraproject.org, etc).

I just looked at logs and don't see it sending anything to michel@michel-slm.name at all. ;(

So, I wonder if there's something in the app that's not accepting that as a valid email or something?

@jflory7 can you see if we can get more debugging or info from the app side?

> So, we don't run the service, but we do provide an email gateway (so it can send as fedoraproject.org, etc).
>
> I just looked at logs and don't see it sending anything to michel@michel-slm.name at all. ;(

Not in the last few months? :(

To help debugging, here are the last several emails I expect - plus an email reset from earlier today

New schedule!
Jan. 8, 2025, 5:58 p.m.

Your CentOS Connect talk
Dec. 19, 2024, 10:08 p.m.

Your proposal: From ELN to EPEL 10: tracking and bringing up packages with poi-tracker and ebranch
Dec. 9, 2024, 4:37 p.m.

> So, I wonder if there's something in the app that's not accepting that as a valid email or something?

Yeah, I wonder. Seems something unique to our pretalx instance, as I noted pretalx.com sent out DevConf.US and .CZ emails fine to the same address

> @jflory7 can you see if we can get more debugging or info from the app side?

Yes please - thanks! Let me know if I can provide any more data

@jflory7 while you're looking, there's also a bug: the copies of the emails in the "Your Email" tab are not clickable and only show the subject and timestamp, both in Firefox and in Chrome. If I highlight them and copy-paste, though, the pasted text does contain the email bodies.

I am not the right person to debug the service, but @misc and @duck help run this service in the Red Hat OSPO community infra. I hope one of them can hop into this ticket and help investigate what is going on on the Pretalx side.

FYI: @shaunm @jasonbrooks

It is not using fedora infra, we route it from mx1.osci.io: https://gitlab.com/osci/community-cage-infra-ansible/-/blob/master/playbooks/tenants/osci/openshift_dedicated_apps/cfp.fedoraproject.org.yml?ref_type=heads#L28

The errors on our server are the following:

Jan 11 05:26:51 polly postfix/smtp[934184]: warning: no MX host for michel-slm.name has a valid address record
Jan 11 05:26:51 polly postfix/smtp[934184]: 598B827AFB: to=<michel@michel-slm.name>, relay=none, delay=235697, delays=235637/0.05/60/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=mxext3.mailbox.org type=AAAA: Host not found, try again)

Seems there is a weird DNS error affecting only that domain. I worked around it for now and the mails were sent, but this needs to be checked in more detail on Monday.
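For reference, the deferred line above can be picked apart mechanically. This is a minimal Python sketch, not anything running on our servers: the regular expression and the names `parse_delivery`, `is_dns_deferral` and `queued_days` are made up for illustration and only assume the field layout visible in that one log line.

```python
import re

# Field layout assumed from the single postfix/smtp line quoted above.
LINE_RE = re.compile(
    r"postfix/smtp\[\d+\]: (?P<queue_id>[0-9A-F]+): "
    r"to=<(?P<to>[^>]*)>, relay=(?P<relay>[^,]+), "
    r"delay=(?P<delay>[\d.]+), .*?"
    r"dsn=(?P<dsn>\d\.\d+\.\d+), status=(?P<status>\w+) "
    r"\((?P<reason>.*)\)$"
)


def parse_delivery(line):
    """Return the delivery fields as a dict of strings, or None if no match."""
    match = LINE_RE.search(line.strip())
    return match.groupdict() if match else None


def is_dns_deferral(record):
    """True for a deferred delivery with enhanced status code 4.4.3."""
    return record["status"] == "deferred" and record["dsn"] == "4.4.3"


def queued_days(record):
    """Convert the 'delay' field (seconds since the mail was queued) to days."""
    return float(record["delay"]) / 86400


SAMPLE = (
    "Jan 11 05:26:51 polly postfix/smtp[934184]: 598B827AFB: "
    "to=<michel@michel-slm.name>, relay=none, delay=235697, "
    "delays=235637/0.05/60/0, dsn=4.4.3, status=deferred "
    "(Host or domain name not found. Name service error for "
    "name=mxext3.mailbox.org type=AAAA: Host not found, try again)"
)

if __name__ == "__main__":
    record = parse_delivery(SAMPLE)
    print(record["to"], record["dsn"], round(queued_days(record), 1))
```

Applied to the line above, the delay of 235697 seconds works out to roughly 2.7 days in the queue, and `dsn=4.4.3` is the enhanced status code for a directory (name) server failure, a temporary error class, which is why the mail was deferred rather than bounced.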

Seems to be DNSSEC related:

Jan 11 05:43:41 francine named[1378207]: client @0x7f21d00fde58 2620:52:3:1:5054:ff:fe5b:c14a#57299 (mxext3.mailbox.org): query failed (broken trust chain) for mxext3.mailbox.org/IN/AAAA at ../../../lib/ns/query.c:7382

Curious, because this is green on https://dnssec-analyzer.verisignlabs.com/mailbox.org

> Seems to be DNSSEC related:
>
> Jan 11 05:43:41 francine named[1378207]: client @0x7f21d00fde58 2620:52:3:1:5054:ff:fe5b:c14a#57299 (mxext3.mailbox.org): query failed (broken trust chain) for mxext3.mailbox.org/IN/AAAA at ../../../lib/ns/query.c:7382
>
> Curious, because this is green on https://dnssec-analyzer.verisignlabs.com/mailbox.org

Really curious! I definitely have MX records set, FWIW. Looking forward to hearing what you find on Monday - thanks for looking into this

Oh, and if there is something I need to file with mailbox org to ask them to fix, do let me know - as a paying customer I can probably get them to fix things. Thanks!

No, I think the issue is on our side, or at least not at the SMTP level. I suspect something either network/firewall related, or dnssec related. I am at the "get tcpdump and read dns packet" stage, shouldn't take too long after that :p

It might be related to SHA1, as that's deprecated in RHEL 9:

https://access.redhat.com/solutions/6955455

I see they still offer that in the traces I got, but they also offer SHA-512. I see dnssec error messages about it, and as soon as I add SHA-1 to the allowed crypto algos (and restart bind):

#  update-crypto-policies --set DEFAULT:SHA1

Everything works. Removing SHA-1 breaks it again.

I guess they still serve the old keys and that's hitting a bind corner case (e.g., bind fails because it cannot understand the SHA-1 response and does not fall back to the SHA-512 answer, or something like that). I have to dig further into the bind documentation; please send a rescue team if I am not back in 1 week.
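To make the workaround easier to audit later, here is a small Python sketch that checks whether a policy string carries the SHA1 subpolicy. It assumes the string has the `BASE:SUBPOLICY` form used in the command above (for example, as printed by `update-crypto-policies --show`). The names `split_policy` and `has_sha1_subpolicy` are invented for this example, and the check deliberately says nothing about what the base policy itself allows.

```python
def split_policy(spec):
    """Split a policy string such as 'DEFAULT:SHA1' into (base, subpolicies)."""
    base, *subpolicies = spec.strip().split(":")
    return base, subpolicies


def has_sha1_subpolicy(spec):
    """True if the SHA1 subpolicy is layered on top of the base policy."""
    _, subpolicies = split_policy(spec)
    return "SHA1" in subpolicies


if __name__ == "__main__":
    # Hypothetical inputs; on a real host the string would come from
    # `update-crypto-policies --show`.
    for spec in ("DEFAULT", "DEFAULT:SHA1"):
        print(spec, has_sha1_subpolicy(spec))
```

A check like this could run from a cron job or an Ansible task as a reminder that the temporary exception is still in place.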

There are plenty of warnings on another DNSSEC validation tool: https://dnsviz.net/d/mailbox.org/dnssec/

And indeed:

$ host -t DS mailbox.org
mailbox.org has DS record 38499 7 2 574F226E410BFBD86BDE1FD276EC9E88D778449A65BBDCF1F218E4D1 9F851312

The DS record uses algorithm 7: 38499 is the key tag, 7 is the algorithm, 2 is the digest algorithm (SHA-256), and the hex string is the digest itself, as explained on https://www.cloudns.net/wiki/article/365/

Algorithm 7 is RSASHA1-NSEC3-SHA1, which has been deprecated by RFC 8624 (section 3.1) since 2019, and RHEL 9 does not support it unless explicitly re-enabled, because RHEL 9 deprecated SHA-1 as insecure.
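The field breakdown above can be checked with a few lines of Python. This is only a sketch: `parse_host_ds` and `uses_sha1_signing` are hypothetical helper names, the input is assumed to be a single line in the format printed by `host -t DS`, and the lookup tables are a partial copy of the IANA DNSSEC algorithm and digest type registries.

```python
from dataclasses import dataclass

# Partial tables from the IANA DNSSEC registries (not exhaustive).
ALGORITHMS = {
    3: "DSA",
    5: "RSASHA1",
    6: "DSA-NSEC3-SHA1",
    7: "RSASHA1-NSEC3-SHA1",
    8: "RSASHA256",
    10: "RSASHA512",
    13: "ECDSAP256SHA256",
    14: "ECDSAP384SHA384",
    15: "ED25519",
    16: "ED448",
}
SHA1_BASED = {3, 5, 6, 7}  # signing algorithms that rely on SHA-1
DIGEST_TYPES = {1: "SHA-1", 2: "SHA-256", 4: "SHA-384"}


@dataclass
class DSRecord:
    owner: str
    key_tag: int
    algorithm: int
    digest_type: int
    digest: str


def parse_host_ds(line):
    """Parse one '<name> has DS record <tag> <alg> <type> <digest...>' line."""
    owner, rest = line.strip().split(" has DS record ", 1)
    key_tag, algorithm, digest_type, *digest_parts = rest.split()
    # `host` may print the digest in several space-separated chunks.
    digest = "".join(digest_parts).upper()
    return DSRecord(owner, int(key_tag), int(algorithm), int(digest_type), digest)


def uses_sha1_signing(record):
    """True if the zone's key algorithm is one of the SHA-1 based ones."""
    return record.algorithm in SHA1_BASED


if __name__ == "__main__":
    ds = parse_host_ds(
        "mailbox.org has DS record 38499 7 2 "
        "574F226E410BFBD86BDE1FD276EC9E88D778449A65BBDCF1F218E4D1 9F851312"
    )
    print(ALGORITHMS[ds.algorithm], DIGEST_TYPES[ds.digest_type], uses_sha1_signing(ds))
```

Note that the two numbers are independent: the digest type (2, SHA-256) only describes how the DS digest was computed, while the algorithm (7) is what the zone's signatures use, and that is the part a RHEL 9 resolver refuses by default.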

So I propose that:
- I enable SHA-1 for the time being (with a warning to disable in 1 or 2 months)
- you (@salimma) file a bug to tell them, because if it fails for us, it will fail for others as well, as RHEL 9 has been out since May 2022.

When that's fixed for them, we disable our workaround.

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

a month ago

Thanks for your help triaging this issue @misc! :thumbsup:

That all sounds good to me. I'll close this now... let us know if there's anything we can do from the fedora side.

Metadata Update from @kevin:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)

17 days ago

Thank you @misc ! I'll try doing a password reset again and if it fails I'll reopen

I'll file a bug with mailbox.org asking them to stop using SHA-1 too, thanks.

(apologies for the slow reply, I was on vacation last week and frantically catching up now)

OK, ticket filed with mailbox.org - I can't link here since it's private.

Update from mailbox.org support

Thank you for your message. We have already created an internal process to remove SHA-1. However, please also note that according to the NIST, SHA-1 may still be used until around 2030.
[1] https://www.nist.gov/news-events/news/2022/12/nist-retires-sha-1-cryptographic-algorithm

Unfortunately, I am unable to say when exactly we will switch off SHA-1. Please be patient for a moment longer.

So it looks like they are working on it, but with no ETA. They also think SHA-1 should be allowed until 2030 anyway, which is not great: people who deploy newer OS releases and just take the default crypto policies would increasingly be unable to send mail to mailbox.org and similar domains :(

