#3465 sssd_be memory leak on EL6 and EL7
Closed: wontfix 2 years ago by pbrezina. Opened 4 years ago by scoobydooxp.

I was working with lslebodn on IRC (freenode #sssd) and even after installing the latest LTM release 1.13.4 on EL6, sssd_be is leaking memory on systems with enumeration disabled and regular lookups going on. lslebodn mentioned: "quick check of the valgrind log and it seems that 33MiB are leaked in cyrus sasl code".

I have this issue on both EL6 (6.9) and EL7 (7.3). Attached are 2 hours of valgrind output from SSSD version 1.13.4 running on EL6. sssd on this server was last restarted on the evening of August 4th (less than 4 days ago) and sssd_be is currently using 125MiB of memory.
valgrind_64systems.com_5394.log


Do you also have the debug logs from sssd? One thing I'm stumped about is why everyone is not complaining about this, so I'd like to check the domain logs to see if there are maybe any failures during operation that are triggering the leak.

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.15.4

4 years ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1482252

4 years ago

I checked the logs a bit and the only thing that jumps out at me is this:

(Fri Aug  4 13:00:03 2017) [sssd[be[64systems.com]]] [sdap_parse_entry] (0x1000): OriginalDN: [OU=File Servers,OU=Production,OU=DC Computers,DC=64systems,DC=com].
(Fri Aug  4 13:00:03 2017) [sssd[be[64systems.com]]] [sdap_parse_entry] (0x1000): Entry has no attributes [0(Success)]!?

Which also correlates with valgrind complaining that the leaks start at sdap_parse_entry. But to be honest, I don't see the issue following that codepath. Anyway, it's worth a test. Could you please run sssd with:

ad_gpo_access_control = disabled

added into the domain section for a test? This would completely disable the code that is causing that error message, so we would know whether it is really the culprit.
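For reference, the suggested option goes into the [domain/...] section of /etc/sssd/sssd.conf. A minimal sketch, using the domain name that appears in the logs above:

```
[domain/64systems.com]
# Disable AD GPO-based access control for this test run;
# the default in this SSSD version evaluates GPOs during auth.
ad_gpo_access_control = disabled
```

After editing the file, sssd needs to be restarted for the change to take effect.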

I have added that to sssd.conf and restarted sssd. I will report back in a few days on memory use.

Metadata Update from @jhrozek:
- Issue priority set to: major

4 years ago

ad_gpo_access_control = disabled does not seem to be helping. sssd_be is already using > 60mb of ram in less than 48 hours.

OK, please leave ad_gpo_access_control = disabled in sssd.conf and run sssd_be with valgrind one more time. Then attach the sssd domain log with a high debug_level plus the valgrind log after a few hours (even 24+ :-). It would also be good to paste the output of "ps" for sssd_be 1 minute after start and again just before stopping sssd. I hope it will help to find the reason for the leak.

It has been a busy Monday so I just got this going. I will leave it for as long as I can.

ps -aux | grep sssd_be, 1 minute after start:
root 14913 10.6 1.5 491736 122520 ? S 13:02 0:07 /usr/bin/valgrind -v --leak-check=full --show-reachable=yes --log-file=/var/log/sssd/valgrind_64systems.com_%p.log /usr/libexec/sssd/sssd_be --domain 64systems.com --uid 0 --gid 0 --debug-to-files

ps -aux after about 18 hours:

root 14913 3.0 2.4 587376 195176 ? S Aug21 34:11 /usr/bin/valgrind -v --leak-check=full --show-reachable=yes --log-file=/var/log/sssd/valgrind_64systems.com_%p.log /usr/libexec/sssd/sssd_be --domain 64systems.com --uid 0 --gid 0 --debug-to-files

Any idea what the max upload limit is? It seems to fail with an 11MB sssd log file and a <1MB valgrind log.

Sorry, that was before my coffee. It's 120MB. I gzipped it and it's much smaller.

I haven't forgotten about this ticket, but despite trying to reproduce a similar error locally and poking around the code, I couldn't reproduce or find the issue. I asked the other sssd developers for help on our devel list: https://lists.fedorahosted.org/archives/list/sssd-devel@lists.fedorahosted.org/thread/NEZCJ54IQ77DO6FPIWGNBTJUEFVAG6UW/

Do you remember if the memory growth was as bad with GPO processing disabled as before? Does the memory consumption balloon indefinitely, or does it stabilize after some time?

In all our testing, sssd did not balloon; the memory usage increased and then stayed the same.

For now, I'm moving the ticket to 'patches welcome'. In addition to a patch, what would be welcome in this case is a way to reproduce the problem, or at least a narrowed-down reproducer, so that we can fix it.

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD Patches welcome (was: SSSD 1.15.4)

4 years ago

Thank you for taking time to submit this request for SSSD. Unfortunately this issue was not given priority and the team lacks the capacity to work on it at this time.

Given that we are unable to fulfill this request I am closing the issue as wontfix.

If the issue still persists on a recent SSSD, you can request reconsideration of this decision by reopening this issue. Please provide additional technical details about its importance to you.

Thank you for understanding.

Metadata Update from @pbrezina:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

2 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4491

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
