#3651 Socket-activated responders generate too much logs even with low debug level
Closed: Invalid a year ago Opened a year ago by fidencio.

Logging the responder activity when it's socket-activated has to be done in a more careful way in order to avoid:
- flooding the logs;
- filling up the disk of the users;

This issue has been reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c31


I am not sure what is a purpose of this ticket. It is not clear from the description.

Could you elaborate a little bit more? Do we want to enhance a documentation? Do we want to recommend logging to journald which should prevent flooding disk?

It would be also good to explain how is it related to socket activated responders.

BTW Any application (not just sssd) with very verbose logging can flood /var/log.

Looking at the BZ, I think the problem was that this message:
[sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children

Created 20GB log.

If I understand correctly this is not even an error message so It should be logged with different debug level (TRACE-something maybe?)

The problem is that the "[sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children" may be logged every n minutes when the socket-activated responder goes down.

It would be smarter from our side if we can prevent flooding the logs with that, mainly in the case where it's an expected operation.

@lslebodn, feel free to just close the issue as invalid if you feel like doing so.

Looking at the BZ, I think the problem was that this message:
[sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children
Created 20GB log.
If I understand correctly this is not even an error message so It should be logged with different debug level (TRACE-something maybe?)

That line has just 71 bytes

echo -n "[sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children" | wc -c
71

And with default setting 5 minutes you will have almost 20 Kib in a day. So you will need 1048576 days for 20GiB log files. I agree that debug level of that message can be change. But this ticket was very confusing.

@fidencio, Do you think we should something improve in our doc page for filing useful issues?
https://docs.pagure.org/SSSD.sssd/users/reporting_bugs.html

I'm closing this issue as it's been inferred to be non useful.

Metadata Update from @fidencio:
- Issue close_status updated to: Invalid

a year ago

Metadata Update from @mzidek:
- Issue status updated to: Open (was: Closed)

a year ago

I am reopening this issue.

There is at least one problematic log message that takes unnecessary space at level where logs should be as quite as possible and we should fix that. This issue is good for tracking that.

Moreover, there is a BZ associated with this issue. We should analyze the logs attached there and see if we can identify more log messages with improper level.

Metadata Update from @mzidek:
- Issue close_status updated to: Invalid

a year ago

Metadata Update from @mzidek:
- Issue status updated to: Open (was: Closed)

a year ago

Looking at the 20 MB log sample from the BZ, there is only 1262 lines with that error message (captured over 7 days). There were much bigger chunks of messages in the log generated at different levels (some quite high levels).

So the issue may not be as sever and the user could have been running at inappropriate debug level for a long time.

Still we should take a look at the messages in [sssd[secrets]] and check if there are more messages with bad debug levels.

Unfortunately the messages being logged weren't reported properly in the BZ ticket. I apologize for the confusion. Here are the actual log messages which caused the log file to grow to 20 GB over the weekend.

(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client

This message is repeated continuously and causes the disk to fill up when the sssd_kcm.log file grows too large. I am also seeing the same behavior on a system running Fedora 27 with sssd-kcm.socket enabled.

[root@dev23 sssd]# du -hs sssd_kcm.log-20180227
8.2G sssd_kcm.log-20180227

This file contains entries from February 26th as follows.

[root@dev23 sssd]# head sssd_kcm.log-20180227
(Mon Feb 26 07:20:40 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client

@wattersmt, okay, then I do believe we can close this issue and focus on the bug reported on Fedora (which does seem to be the same).

Please, provide as much details as possible there wrt how to reproduce the issue and we'll get to that.

@mzidek. do you agree on closing this issue?

@fidencio Ok, fine by me, you can close this.

Or, I can close this :)

Metadata Update from @mzidek:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

a year ago

I'm closing this issue as it's been inferred to be non useful.

Closing bug with wrong assumption is usually not the best solution. The question were asked because it was not clear what was the intention of bug report and it was not clear what is proposed solution for fixing this issue. Moreover, I could not see in any comment that this bug report was not useful. Next time it would be good to elaborate little bit more when closing some ticket.

Login to comment on this ticket.

Metadata