#8870 Mailman API unavailable
Closed: Fixed 4 years ago by kevin. Opened 5 years ago by bcotton.

When trying to approve a moderated message in the devel-announce web interface:

Mailman REST API not available. Please start Mailman core.

I gather from #fedora-admin that this has been an ongoing issue.


It has been. We got alerted on it just before this ticket came in.

I have the service back up and running now and I have added some debugging to try and see if we can tell why it drops...

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: lists

5 years ago

For another data point, I saw the same error when trying to approve a message on the freeipa-users list around 12:15am UTC.

Hitting it now (1252 UTC 30 Apr) trying to approve a devel-announce post.

And again (2029 UTC 1 May)

This has still been happening.

One issue was that web crawlers were hitting the login pages and causing a lot of traffic. I updated the robots.txt file to keep them out and that seems to have helped a good deal, but not entirely. Will continue to look into it.
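For anyone wanting to sanity-check a change like this, here is a minimal sketch using Python's standard `urllib.robotparser`. The ticket does not show the actual robots.txt, so the rules and paths below (`/accounts/`, `/admin/`) and the helper name `is_allowed` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real robots.txt is not shown in this ticket,
# and these paths are assumptions used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /accounts/
Disallow: /admin/
"""


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved crawler may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/accounts/login/", "/archives/"):
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Note that robots.txt only keeps out crawlers that choose to honor it, which may be part of why the change helped "but not entirely".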

So, this service has now been up over 5 days... I'm hoping that all the crawlers that were beating it up have now gotten the new robots.txt and thus the issue is solved.

I'm going to close this now, if it alerts again I'll reopen, or if any of you notice issues please re-open.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

I just hit this issue again when trying to "Mass subscribe" two email addresses to java-maint-sig@lists.fedoraproject.org.

It looked like the service first crashed (giving me HTTP 500 errors on all my requests after ~30 seconds) and then I got this error message:

Mailman REST API not available. Please start Mailman core

However, now the server seems to be back up again, and trying to subscribe those two email addresses worked.

Metadata Update from @bcotton:
- Issue status updated to: Open (was: Closed)

5 years ago

Me, too. The admin UI is either timing out with a 503 or failing with an internal server error. Also see 8910.

So, the problem seems to happen when you try and go to the freeotp-devel admin page or the freeotp-devel subscription requests page. :(

I don't see any tracebacks or other indications now. I think it actually does complete most of the time, but it times out on our proxies.

I'm not sure how much time I will have to dig into this, so if someone else wants to look, please feel free.

Possibly related:

May 13 03:40:44 2020 (28966) 127.0.0.1 - - "GET /3.0/lists/freeotp-devel.lists.fedorahosted.org HTTP/1.1" 200 391
May 13 03:40:49 2020 (28966) deque: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/mailman/app/workflow.py", line 69, in __next__
    return step()
  File "/usr/lib/python3.4/site-packages/mailman/app/subscriptions.py", line 227, in _step_sanity_checks
    raise SubscriptionPendingError(self.mlist, self.address.email)
mailman.interfaces.subscriptions.SubscriptionPendingError
May 13 03:40:49 2020 (28966) 127.0.0.1 - - "POST /3.0/members HTTP/1.1" 409 36
May 13 03:40:49 2020 (28966) 127.0.0.1 - - "GET /3.0/lists/freeotp-devel.lists.fedorahosted.org HTTP/1.1" 200 391
May 13 03:40:49 2020 (28966) 127.0.0.1 - - "GET /3.0/lists/freeotp-devel.lists.fedorahosted.org HTTP/1.1" 200 391
May 13 03:40:52 2020 (28966) deque: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/mailman/app/workflow.py", line 69, in __next__
    return step()
  File "/usr/lib/python3.4/site-packages/mailman/app/subscriptions.py", line 227, in _step_sanity_checks
    raise SubscriptionPendingError(self.mlist, self.address.email)
mailman.interfaces.subscriptions.SubscriptionPendingError

The admin interface for the freeotp-devel mailing list is still not working for me. I got "Error 503 Backend fetch failed" with XID: 30361677 while trying to log in. It's turning into a major issue for the team that maintains the community around FreeOTP.

Could you escalate the problem, please?

We are all pretty swamped with the datacenter move going on.

That said, perhaps @nphilipp might have some cycles to look?

Basically I see the REST interface get swamped with requests and it stops responding, but there's not much in the logs as to why.
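One way to check whether the REST interface really is being flooded is to tally requests per second and per status code from its access log. The following is only a sketch: the regular expression is inferred from the log lines quoted earlier in this ticket, not from a documented Mailman log format, and the names `ACCESS_LINE`, `SAMPLE`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the access-log lines quoted in this ticket
# (an assumption, not a documented Mailman log format).
ACCESS_LINE = re.compile(
    r'^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) \(\d+\) '
    r'\S+ - - "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \d+$'
)

# Sample lines copied from the log excerpt in this ticket.
SAMPLE = [
    'May 13 03:40:44 2020 (28966) 127.0.0.1 - - "GET /3.0/lists/freeotp-devel.lists.fedorahosted.org HTTP/1.1" 200 391',
    'May 13 03:40:49 2020 (28966) deque: ',
    'Traceback (most recent call last):',
    'May 13 03:40:49 2020 (28966) 127.0.0.1 - - "POST /3.0/members HTTP/1.1" 409 36',
    'May 13 03:40:49 2020 (28966) 127.0.0.1 - - "GET /3.0/lists/freeotp-devel.lists.fedorahosted.org HTTP/1.1" 200 391',
    'May 13 03:40:49 2020 (28966) 127.0.0.1 - - "GET /3.0/lists/freeotp-devel.lists.fedorahosted.org HTTP/1.1" 200 391',
]


def summarize(lines):
    """Count access-log requests per timestamp (one-second resolution) and per status."""
    per_second = Counter()
    per_status = Counter()
    for line in lines:
        match = ACCESS_LINE.match(line.strip())
        if match is None:
            continue  # traceback lines and other non-request entries are skipped
        per_second[match["ts"]] += 1
        per_status[match["status"]] += 1
    return per_second, per_status


if __name__ == "__main__":
    per_second, per_status = summarize(SAMPLE)
    for ts, count in per_second.most_common(5):
        print(ts, count)
    print(dict(per_status))
```

Run against the full log instead of `SAMPLE`, the busiest seconds just before an outage would show whether the service stops responding under a burst of requests or for some other reason.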

Metadata Update from @nphilipp:
- Issue assigned to nphilipp

5 years ago

I've looked into this on mailman01, but haven't found anything concrete yet. I don't think the exceptions above are related; they seem to indicate an attempt to change something on a subscription that is still pending.

@kevin have you had to restart the service lately because of this? I see one restart after the colo move, but I'm not sure if this is the reason.

I have not restarted it since the colo move...

So, I haven't seen this happen or seen any reports since the move... we have nagios back up now in the new datacenter and it shows one outage of less than a minute (it alerted, and recovered less than a minute later).

I'm going to go ahead and close this now. If anyone sees this again, please feel free to re-open...

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago
