#235 fedmsg-hub is running out of memory
Closed: Fixed 6 years ago Opened 6 years ago by kparal.

This has now happened for the second time (that I know of), so it's no coincidence. On dev, stg and prod alike, fedmsg-hub stopped triggering new jobs at roughly the same time. When you inspect memory usage, the process has consumed all available memory (~3GB) and all swap space (~2GB). Restarting the process "fixes" the problem.

There's clearly a memory leak, but it's not trivial to find out where it is: in fedmsg-hub itself or in our adjustments to it (taskotron-trigger). It could be happening for each message and growing over time, or only for some specific messages; we don't know yet.
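One way to tell the two cases apart (steady growth on every message vs. jumps on specific messages) would be to sample the process's resident memory after each handled message and look at the deltas. The following is only a rough sketch under assumptions: it is Linux-only (it reads `/proc/<pid>/statm`), and the helper names `parse_statm_rss`, `current_rss`, `rss_deltas` and `top_growers` are made up for illustration; none of them are part of fedmsg or taskotron-trigger.

```python
import os


def parse_statm_rss(statm_text, page_size):
    """Return the resident set size in bytes from the text of /proc/<pid>/statm."""
    # statm fields, all in pages: size resident shared text lib data dt
    resident_pages = int(statm_text.split()[1])
    return resident_pages * page_size


def current_rss(pid="self"):
    """Read the current RSS of a process (Linux only)."""
    with open("/proc/{}/statm".format(pid)) as f:
        return parse_statm_rss(f.read(), os.sysconf("SC_PAGE_SIZE"))


def rss_deltas(samples):
    """Turn [(label, rss_bytes), ...] samples into per-label growth.

    The first sample is the baseline; each later sample is assumed to be
    taken right after the message identified by its label was handled.
    """
    deltas = []
    for (_, before), (label, after) in zip(samples, samples[1:]):
        deltas.append((label, after - before))
    return deltas


def top_growers(samples, n=5):
    """Return the n labels with the largest RSS growth, biggest first."""
    return sorted(rss_deltas(samples), key=lambda d: d[1], reverse=True)[:n]
```

In a consumer, one could append `(msg_topic, current_rss())` to a list after each message and periodically log `top_growers(...)`. Roughly equal deltas across all topics would point to a per-message leak, while a few large outliers would point to specific messages. RSS is a coarse signal (allocators don't always return memory to the OS), so this only narrows things down.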

On taskotron-stg01, I saved a core file using gdb into /root/fedmsg-hub-leak.core. Since the issue doesn't seem to be going away, we'll need to figure out what leaks in there.


Metadata Update from @kparal:
- Issue priority set to: High
- Issue tagged with: infra

6 years ago

It seems pyopenssl is to blame here:

<bowlofeggs> jcline recently fixed some kind of memory leak in pyopenssl that had some relationship to fedmsg - i wonder if that was the cause here?
<jcline> kparal, bowlofeggs, that is indeed the pyopenssl fixes. Both leaks should be fixed in pyOpenSSL 17.3.0, and I backported one of them to older releases. I can get the other one in now it's merged upstream.
<jcline> Although it looks like the maintainer actually just closed all my PRs without accepting them so...
<jbowen> Good <time of day>
<jbowen> Is there any chance I can get membership approval for the QA group (FAS username is the same as my IRC username)
<jcline> kparal, is this consumer on f25?
<tflink> jcline: yeah, it is
<jcline> tflink, ah okay, thanks
* jcline backports patches further :(
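Since the log above says both leaks should be fixed in pyOpenSSL 17.3.0, a quick sanity check on a host is to compare the installed version against that number. This is just a sketch: `parse_version` and `has_upstream_leak_fixes` are hypothetical helper names, and because one of the fixes was backported to older releases, a `False` result does not prove that a distro package is still leaking.

```python
def parse_version(version):
    """Turn '17.3.0' into (17, 3, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_upstream_leak_fixes(version, fixed_in="17.3.0"):
    """True if `version` is at least `fixed_in` (upstream release numbers only)."""
    a, b = parse_version(version), parse_version(fixed_in)
    width = max(len(a), len(b))
    # Pad with zeros so that '17.3' compares equal to '17.3.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Possible usage on a host with pyOpenSSL installed:
#   import OpenSSL
#   print(has_upstream_leak_fixes(OpenSSL.__version__))
```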

Meeting outcome:

  * to fix the fedmsg memory issue, the plan is to use side-builds in
    the short term if needed, upgrade the master to f26 in the medium
    term and upgrade everything to f26 in the longer term  (tflink,
    14:36:52)

https://meetbot.fedoraproject.org/fedora-meeting-1/2017-09-18/fedora-qadevel.2017-09-18-14.01.log.html

The fix was deployed to prod; stg is not updated yet (due to login issues).

Deployed to stg as well, closing.

Metadata Update from @kparal:
- Issue assigned to kparal
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 years ago
