I was working with lslebodn on IRC (freenode #sssd), and even after installing the latest LTM release, 1.13.4, on EL6, sssd_be is still leaking memory on systems with enumeration disabled and regular lookups going on. lslebodn mentioned: "a quick check of the valgrind log suggests that 33MiB are leaked in the Cyrus SASL code."
I have this issue on both EL6 (6.9) and EL7 (7.3). Attached is 2 hours' worth of valgrind output from SSSD 1.13.4 running on EL6. sssd on this server was last restarted on the evening of August 4th (less than 4 days ago), and sssd_be is currently using 125MiB of memory.
Attachment: valgrind_64systems.com_5394.log
Do you also have the debug logs from sssd? One thing I'm stumped about is why everyone isn't complaining about this, so I'd like to check the domain logs to see whether there are any failures during operation that might be triggering the leak.
Metadata Update from @jhrozek: - Issue set to the milestone: SSSD 1.15.4
Whoops, I thought I had attached this. Here is the sssd_<domain>.log with debug_level set to 7.
Attachment: sssd_64systems.com.log
Metadata Update from @jhrozek: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1482252
Issue linked to Bugzilla: Bug 1482252
I checked the logs a bit, and the only thing that jumps out at me is this:
(Fri Aug 4 13:00:03 2017) [sssd[be[64systems.com]]] [sdap_parse_entry] (0x1000): OriginalDN: [OU=File Servers,OU=Production,OU=DC Computers,DC=64systems,DC=com].
(Fri Aug 4 13:00:03 2017) [sssd[be[64systems.com]]] [sdap_parse_entry] (0x1000): Entry has no attributes [0(Success)]!?
This also correlates with valgrind reporting that the leaks start in sdap_parse_entry. To be honest, though, I don't see the issue when following that code path. Anyway, it's worth a test. Could you please run sssd with:
ad_gpo_access_control = disabled
added to the domain section for a test? This would completely disable the code that produces that error message, so we would know whether it really is the culprit.
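For reference, the option goes into the [domain/<name>] section of the affected domain in sssd.conf (normally /etc/sssd/sssd.conf), not into the [sssd] section. The sketch below only illustrates that placement: it uses Python's configparser on a hypothetical, minimal config. The SAMPLE_CONF contents (services, id_provider, access_provider) and the helper name disable_gpo are assumptions for illustration, not the reporter's real file. Editing the file by hand and restarting sssd is the usual approach.

```python
import configparser
import io

# Hypothetical minimal sssd.conf; a real file will contain more options.
SAMPLE_CONF = """\
[sssd]
services = nss, pam
domains = 64systems.com

[domain/64systems.com]
id_provider = ad
access_provider = ad
"""


def disable_gpo(conf_text: str, domain: str) -> str:
    """Return conf_text with ad_gpo_access_control = disabled set in the
    [domain/<domain>] section. Note: configparser drops comments on write."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    section = f"domain/{domain}"
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in config")
    parser.set(section, "ad_gpo_access_control", "disabled")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(disable_gpo(SAMPLE_CONF, "64systems.com"))
```

Because configparser discards comments when it writes the file back, this approach is only suitable for throwaway test configs.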
I have added that to sssd.conf and restarted sssd. I will report back in a few days on memory use.
Metadata Update from @jhrozek: - Issue priority set to: major
Setting ad_gpo_access_control = disabled does not seem to help: sssd_be is already using more than 60MB of RAM after less than 48 hours.
OK, please leave ad_gpo_access_control = disabled in sssd.conf and run sssd_be under valgrind one more time. Then, after a few hours (even 24+ :-), attach the sssd_<domain> log with a high debug_level together with the valgrind log. It would also be good to paste the output of "ps" for sssd_be 1 minute after start and again before stopping sssd. I hope it will help find the cause of the leak.
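In addition to one-off ps snapshots, the resident set size can be sampled periodically to see the shape of the growth curve. A minimal sketch, assuming a Linux host where /proc/<pid>/status contains a "VmRSS: <n> kB" line. The helper names, the 10-minute interval and the 24-hour sample count are hypothetical choices, not anything provided by SSSD. When sssd_be runs under valgrind, the numbers also include valgrind's own overhead, so only the trend is meaningful.

```python
import re
import sys
import time
from typing import Iterator, Optional, Tuple

# Matches e.g. "VmRSS:\t  122520 kB" in /proc/<pid>/status.
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def read_vmrss_kib(pid: int) -> Optional[int]:
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as fh:
        return parse_vmrss_kib(fh.read())


def sample_rss(pid: int, interval_s: float, count: int) -> Iterator[Tuple[float, Optional[int]]]:
    """Yield (unix_time, rss_kib) pairs, sleeping interval_s between samples."""
    for i in range(count):
        if i:
            time.sleep(interval_s)
        yield time.time(), read_vmrss_kib(pid)


if __name__ == "__main__":
    # Usage: python3 rss_sampler.py <pid of sssd_be>
    for ts, rss in sample_rss(int(sys.argv[1]), interval_s=600, count=144):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
        print(f"{stamp} {rss} kB", flush=True)
```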
It has been a busy Monday, so I just got this going. I will leave it running for as long as I can.
ps -aux|grep sssd_be after 1 minute:
root 14913 10.6 1.5 491736 122520 ? S 13:02 0:07 /usr/bin/valgrind -v --leak-check=full --show-reachable=yes --log-file=/var/log/sssd/valgrind_64systems.com_%p.log /usr/libexec/sssd/sssd_be --domain 64systems.com --uid 0 --gid 0 --debug-to-files
ps -aux after about 18 hours:
root 14913 3.0 2.4 587376 195176 ? S Aug21 34:11 /usr/bin/valgrind -v --leak-check=full --show-reachable=yes --log-file=/var/log/sssd/valgrind_64systems.com_%p.log /usr/libexec/sssd/sssd_be --domain 64systems.com --uid 0 --gid 0 --debug-to-files
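The two snapshots above can be compared directly. In ps aux output the fifth column is VSZ and the sixth is RSS, both in KiB. The sketch below parses the two lines (command line shortened) and computes the RSS growth. The 18-hour interval is the reporter's approximation, and because the process runs under valgrind, the absolute values include valgrind's overhead. The helper names are hypothetical.

```python
from typing import NamedTuple


class PsSample(NamedTuple):
    pid: int
    vsz_kib: int
    rss_kib: int


def parse_ps_aux_line(line: str) -> PsSample:
    """Parse one `ps aux` line: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND."""
    fields = line.split(None, 10)  # COMMAND (the 11th field) may contain spaces
    return PsSample(pid=int(fields[1]), vsz_kib=int(fields[4]), rss_kib=int(fields[5]))


def growth(before: PsSample, after: PsSample, hours: float) -> dict:
    """Summarise RSS growth between two samples of the same process."""
    delta = after.rss_kib - before.rss_kib
    return {
        "rss_delta_kib": delta,
        "rss_delta_mib": round(delta / 1024, 1),
        "kib_per_hour": round(delta / hours),
    }


# Snapshots copied from this thread (command line shortened).
AT_1_MIN = "root 14913 10.6 1.5 491736 122520 ? S 13:02 0:07 /usr/bin/valgrind ..."
AT_18_H = "root 14913 3.0 2.4 587376 195176 ? S Aug21 34:11 /usr/bin/valgrind ..."

if __name__ == "__main__":
    print(growth(parse_ps_aux_line(AT_1_MIN), parse_ps_aux_line(AT_18_H), hours=18))
```

For these two samples, RSS grew by 72656 KiB (about 71 MiB), roughly 4 MiB per hour.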
Does anyone know the maximum upload size? The upload seems to fail with an 11MB sssd log file and a <1MB valgrind log.
Valgrind log file
Attachment: valgrind_64systems.com_14913.log
Sorry, that was before my coffee. The sssd log is actually 120MB. I gzipped it, and it's much smaller.
Attachment: sssd_64systems.com.log.gz
I haven't forgotten about this ticket, but despite trying to reproduce a similar error locally and poking around the code, I could neither reproduce nor find the issue. I asked the other sssd developers for help on our devel list: https://lists.fedorahosted.org/archives/list/sssd-devel@lists.fedorahosted.org/thread/NEZCJ54IQ77DO6FPIWGNBTJUEFVAG6UW/
Do you remember whether the memory growth was as bad with GPO processing disabled as before? Does the memory consumption balloon indefinitely, or does it stabilize after some time?
In all our testing, sssd did not balloon: the memory usage increased and then stayed the same.
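Whether memory balloons indefinitely or levels off can be judged from a series of RSS samples taken at regular intervals. A rough, hypothetical heuristic: compare the growth over the most recent sampling intervals with the total growth since the first sample; if the recent window adds almost nothing, the process has probably plateaued, while a steady leak under constant load would keep adding memory. The function name, the window of 6 intervals and the 5% threshold are arbitrary assumptions, not SSSD guidance.

```python
from typing import Sequence


def looks_plateaued(rss_kib: Sequence[int], window: int = 6, threshold: float = 0.05) -> bool:
    """Heuristic: True if the growth over the last `window` sampling intervals
    is less than `threshold` of the total growth since the first sample.

    rss_kib must be ordered oldest to newest, taken at roughly equal intervals.
    """
    if len(rss_kib) <= window:
        raise ValueError("need more samples than the window size")
    total_growth = rss_kib[-1] - rss_kib[0]
    if total_growth <= 0:
        return True  # no net growth at all
    recent_growth = rss_kib[-1] - rss_kib[-1 - window]
    return recent_growth < threshold * total_growth
```

A series that rises and then flattens returns True, while one that keeps rising at a constant rate returns False.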
For now, I'm moving the ticket to 'patches welcome'. In addition to a patch, what would be welcome in this case is a way to reproduce the problem, or a narrowed-down reproducer, so that we can fix it.
Metadata Update from @jhrozek: - Issue set to the milestone: SSSD Patches welcome (was: SSSD 1.15.4)
Thank you for taking time to submit this request for SSSD. Unfortunately this issue was not given priority and the team lacks the capacity to work on it at this time.
Given that we are unable to fulfill this request I am closing the issue as wontfix.
If the issue still persists on recent SSSD, you can request reconsideration of this decision by reopening this issue. Please provide additional technical details about its importance to you.
Thank you for understanding.
Metadata Update from @pbrezina: - Issue close_status updated to: wontfix - Issue status updated to: Closed (was: Open)
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/4491
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.