#3465 sssd_be memory leak on EL6 and EL7
Closed: wontfix 2 years ago by pbrezina. Opened 4 years ago by scoobydooxp.

I was working with lslebodn on IRC (freenode #sssd) and even after installing the latest LTM release 1.13.4 on EL6, sssd_be is leaking memory on systems with enumeration disabled and regular lookups going on. lslebodn mentioned: "quick check of the valgrind log and it seems that 33MiB are leaked in cyrus sasl code".

I have this issue on both EL6 (6.9) and EL7 (7.3). Attached are 2 hours of valgrind output from SSSD version 1.13.4 running on EL6. sssd on this server was last restarted on the evening of August 4th (less than 4 days ago) and sssd_be is currently using 125MiB of memory.
valgrind_64systems.com_5394.log


Do you also have the debug logs from sssd? One thing I'm stumped about is why everyone is not complaining about this, so I'd like to check the domain logs to see if there are maybe any failures during operation that are triggering the leak.

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.15.4

4 years ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1482252

4 years ago

I checked the logs a bit and the only thing that jumps out at me is this:

(Fri Aug  4 13:00:03 2017) [sssd[be[64systems.com]]] [sdap_parse_entry] (0x1000): OriginalDN: [OU=File Servers,OU=Production,OU=DC Computers,DC=64systems,DC=com].
(Fri Aug  4 13:00:03 2017) [sssd[be[64systems.com]]] [sdap_parse_entry] (0x1000): Entry has no attributes [0(Success)]!?

Which also correlates with valgrind complaining that the leaks start at sdap_parse_entry. But to be honest, I don't see the issue following that codepath. Anyway, it's worth a test. Could you please run sssd with:

ad_gpo_access_control = disabled

added into the domain section for a test? This would completely disable the code that is causing that error message, so we would know whether it is really the culprit.
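For reference, the suggested option goes into the [domain/...] section of /etc/sssd/sssd.conf. A minimal sketch, using the domain name that appears in the logs above:

```
[domain/64systems.com]
# Disable AD GPO-based access control for this test run;
# the default in this SSSD version evaluates GPOs during auth.
ad_gpo_access_control = disabled
```

After editing the file, sssd needs to be restarted for the change to take effect.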

I have added that to sssd.conf and restarted sssd. I will report back in a few days on memory use.

Metadata Update from @jhrozek:
- Issue priority set to: major

4 years ago

ad_gpo_access_control = disabled does not seem to be helping. sssd_be is already using > 60mb of ram in less than 48 hours.

OK, please leave ad_gpo_access_control = disabled in sssd.conf and run sssd_be with valgrind one more time. Then attach the sssd domain log with a high debug_level plus the valgrind log after a few hours (even 24+ :-). It would also be good to paste the output of "ps" for sssd_be 1 minute after start and again just before stopping sssd. I hope it will help to find the reason for the leak.

It has been a busy Monday so I just got this going. I will leave it for as long as I can.

ps -aux | grep sssd_be, 1 minute after start:
root 14913 10.6 1.5 491736 122520 ? S 13:02 0:07 /usr/bin/valgrind -v --leak-check=full --show-reachable=yes --log-file=/var/log/sssd/valgrind_64systems.com_%p.log /usr/libexec/sssd/sssd_be --domain 64systems.com --uid 0 --gid 0 --debug-to-files

ps -aux after about 18 hours:

root 14913 3.0 2.4 587376 195176 ? S Aug21 34:11 /usr/bin/valgrind -v --leak-check=full --show-reachable=yes --log-file=/var/log/sssd/valgrind_64systems.com_%p.log /usr/libexec/sssd/sssd_be --domain 64systems.com --uid 0 --gid 0 --debug-to-files

Any idea what the max upload limit is? It seems to fail with an 11MB sssd log file and a <1MB valgrind log.

Sorry, that was before my coffee. It's 120MB. I gzipped it and it's much smaller.

I haven't forgotten about this ticket, but despite trying to reproduce a similar error locally and poking around the code, I couldn't reproduce or find the issue. I asked the other sssd developers for help on our devel list: https://lists.fedorahosted.org/archives/list/sssd-devel@lists.fedorahosted.org/thread/NEZCJ54IQ77DO6FPIWGNBTJUEFVAG6UW/

Do you remember if the memory growth was as bad with GPO processing disabled as before? Does the memory consumption balloon indefinitely, or does it stabilize after some time?

In all our testing, sssd did not balloon; the memory usage increased and then stayed the same.

For now, I'm moving the ticket to 'patches welcome'. In addition to a patch, what would be welcome in this case is a way to reproduce the problem, or at least a narrowed-down reproducer, so that we can fix it.

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD Patches welcome (was: SSSD 1.15.4)

4 years ago

Thank you for taking time to submit this request for SSSD. Unfortunately this issue was not given priority and the team lacks the capacity to work on it at this time.

Given that we are unable to fulfill this request I am closing the issue as wontfix.

If the issue still persists on a recent SSSD, you can request reconsideration of this decision by reopening this issue. Please provide additional technical details about its importance to you.

Thank you for understanding.

Metadata Update from @pbrezina:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

2 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4491

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
