https://bugzilla.redhat.com/show_bug.cgi?id=878111 (Red Hat Enterprise Linux 6)
Description of problem: 389-ds segfaults if it cannot rename the logs (due to wrong permission on the parent directory). /etc/init.d/dirsrv start PKI-IPA Starting dirsrv: ns-slapd: Failed to rename errors log file, Netscape Portable Runtime error -5966 (Access Denied.). Exiting... ns-slapd: Failed to rename errors log file, Netscape Portable Runtime error -5966 (Access Denied.). Exiting... rsyslogd-2177: imuxsock begins to drop messages from pid 3554 due to rate-limiting kernel: ns-slapd[3554]: segfault at 7fffe5205adc ip 0000003aa02725b4 sp 00007fffe5205ac0 error 6 in libslapd.so.0.0.0[3aa0200000+f0000] Version-Release number of selected component (if applicable): 389-ds-base-1.2.10.2-20.el6_3.x86_64 How reproducible: Always. Steps to Reproduce: 1. chown root.root /var/log/dirsrv-<instance> 2. try to start slapd. Actual results: slapd segfaults Expected results: No segfaults Additional info: As I understood, slapd fails & segfault only if it requires log rotation, otherwise it starts normally (provided the files under /var/log/dirsrv-<instance> is writable).
I can not reproduce this on any version of 389. Here is what I am doing:
[1] Create a DS instance to run as user "dirman" [2] Turn on verbose logging to fill up error log (above 1 mb) [3] Stop DS [4] Change the max log size to 1 mb - this will force rotation at startup [5] chown root:root /var/log/dirsrv/slapd-localhost [6] chown root:root /var/log/dirsrv/slapd-localhost/* [7] start ds: service dirsrv start [8] No crash or segfault [9] Error log is properly rotated, and ownership is restored to "dirman" for all the files under /var/log/dirsrv/slapd-localhost
Reproduced issue. We are getting into an infinite loop which leads to a stack overflow.
When we go to write to the error we see that it needs to be rotated, but since the permission are wrong we fail to rename the file. So we try and log a message that we can not rename the log, but trying to log this new message triggers the log rotation again, which fails, and we just keep looping and overflow the stack.
The fix is to log a message, without doing the rotation check.
attachment 0001-Ticket-527-ns-slapd-segfaults-if-it-cannot-rename-th.patch
git merge ticket527 Updating e3aac66..4e9aab8 Fast-forward ldap/servers/slapd/log.c | 63 ++++++++++++++++++++++++++++++++++++++------- 1 files changed, 53 insertions(+), 10 deletions(-)
git push origin master Counting objects: 11, done. Delta compression using up to 4 threads. Compressing objects: 100% (6/6), done. Writing objects: 100% (6/6), 1.14 KiB, done. Total 6 (delta 4), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git e3aac66..4e9aab8 master -> master
git checkout 389-ds-base-1.2.11 Switched to branch '389-ds-base-1.2.11'
git cherry-pick -x master Finished one cherry-pick. [389-ds-base-1.2.11 85261ef] Ticket 527 - ns-slapd segfaults if it cannot rename the logs 1 files changed, 53 insertions(+), 10 deletions(-)
git push origin 389-ds-base-1.2.11 Counting objects: 11, done. Delta compression using up to 4 threads. Compressing objects: 100% (6/6), done. Writing objects: 100% (6/6), 1.18 KiB, done. Total 6 (delta 4), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 39b0938..85261ef 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
Metadata Update from @mreynolds: - Issue assigned to mreynolds - Issue set to the milestone: 1.2.11.17
I have several brand new installations that are experiencing this behavior. Using 389-ds-base 1.3.9.1.10.el7. Is there any corrective action associated with this behavior?
Are you actually seeing this error: "Failed to rename errors log file" and then the server crashes?
This bug was fixed 6 years ago, and no one has complained about it since. If it's crashing at startup please follow this doc on how to get as stack trace so we can see what is going on:
http://www.port389.org/docs/389ds/FAQ/faq.html#debug-startup-crashes
Thanks!
Metadata Update from @mreynolds: - Issue close_status updated to: None (was: Fixed) - Issue set to the milestone: None (was: 1.2.11.17)
I will check logs for that specific error. They're definitely dying at log rotation with "access denied" error message. The only way I can start the server is to delete the error log first, otherwise it just hangs at "starting up".
status-dirsrv shows a number of these errors:
Oct 12 00:00:00 patchqaldap02s.eng.c4e.ops ns-slapd[22846]: Failed to open errors log file /var/log/dirsrv/slapd-patchqaldap02s/errors: error 13 (Permission denied); Exiting... Oct 12 00:00:00 patchqaldap02s.eng.c4e.ops ns-slapd[22846]: Failed to reopen errors log file, Netscape Portable Runtime error -5966 (Access Denied.)
Thanks for the link. I'll follow it accordingly.
I will check logs for that specific error. They're definitely dying at log rotation with "access denied" error message. The only way I can start the server is to delete the error log first, otherwise it just hangs at "starting up". status-dirsrv shows a number of these errors: Oct 12 00:00:00 patchqaldap02s.eng.c4e.ops ns-slapd[22846]: Failed to open errors log file /var/log/dirsrv/slapd-patchqaldap02s/errors: error 13 (Permission denied); Exiting... Oct 12 00:00:00 patchqaldap02s.eng.c4e.ops ns-slapd[22846]: Failed to reopen errors log file, Netscape Portable Runtime error -5966 (Access Denied.)
A few more questions:
What are the permissions of the logs directory and all the logs? Can you grab a quick "ls -la"? Is the server running as "dirsrv"? What platform are you running on? If you disable selinux does it work? Does it fail to start after every single restart?
I will check logs for that specific error. They're definitely dying at log rotation with "access denied" error message. The only way I can start the server is to delete the error log first, otherwise it just hangs at "starting up". status-dirsrv shows a number of these errors: Oct 12 00:00:00 patchqaldap02s.eng.c4e.ops ns-slapd[22846]: Failed to open errors log file /var/log/dirsrv/slapd-patchqaldap02s/errors: error 13 (Permission denied); Exiting... Oct 12 00:00:00 patchqaldap02s.eng.c4e.ops ns-slapd[22846]: Failed to reopen errors log file, Netscape Portable Runtime error -5966 (Access Denied.) A few more questions: What are the permissions of the logs directory and all the logs? Can you grab a quick "ls -la"? Is the server running as "dirsrv"? What platform are you running on? If you disable selinux does it work? Does it fail to start after every single restart? Thanks!
A few more questions: What are the permissions of the logs directory and all the logs? Can you grab a quick "ls -la"? Is the server running as "dirsrv"? What platform are you running on? If you disable selinux does it work? Does it fail to start after every single restart? Thanks!
drwxr-x---. 4 root root 46 Oct 2 14:04 dirsrv
[root@devldap02s]# ls -la /var/log/dirsrv/ total 8 drwxr-x---. 4 dirsrv dirsrv 46 Oct 2 14:04 . drwxr-x---. 15 root log 4096 Oct 13 03:43 .. drwx------. 2 dirsrv dirsrv 31 Sep 23 14:10 admin-serv drwxr-xr-x. 2 dirsrv dirsrv 4096 Oct 14 12:02 slapd-devldap02s
[root@devldap02s]# ls -la /var/log/dirsrv/slapd-devldap02s/ total 1568 drwxr-xr-x. 2 dirsrv dirsrv 4096 Oct 14 12:02 . drwxr-x---. 4 dirsrv dirsrv 46 Oct 2 14:04 .. -rw-------. 1 dirsrv dirsrv 1534406 Oct 14 14:14 access -rw-------. 1 dirsrv dirsrv 63 Oct 14 12:02 access.rotationinfo -rw-------. 1 dirsrv dirsrv 0 Oct 14 12:02 audit -rw-------. 1 dirsrv dirsrv 63 Oct 14 12:02 audit.rotationinfo -rw-------. 1 dirsrv dirsrv 8438 Oct 14 12:02 errors -rw-------. 1 dirsrv dirsrv 63 Oct 14 12:02 errors.rotationinfo
Okay, can you get the errors log from the failed startup (before it has to be deleted)? I need to see what it's saying because everything looks fine so I have no idea how it could be inconsistently failing.
You can also try removing errors and errors.rotationinfo. Perhaps they are out of sync. They will get rewritten at a successful startup
I did save the log files on one of the servers and there's nothing in the error pertaining to the crash, just this: [10/Oct/2019:00:00:00.052636174 -0400] - ERR - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=OECAgreement" (genldap01s:389): Event update_window_opened should not occur in state wait_for_changes
It's true that it will start after deleting only the error log, I'd just been deleting everything in case it happens with the other logs aswell.
I will work on getting the core dump and will see if I can find any additional info from the error logs.
I came in today and had three of 12 servers down in the lab and one of 16 down in production. I'm having some trouble getting the core dump configured. The instructions might be a bit outdated. I'll press on and see what I can do with it. This time, I deleted only the errors.rotationinfo file and it came up on restart. There are no log entries at all at the time of the incident but that's probably because it can't open the file.
[root@genldap02s slapd-genldap02s]# ls -l total 15672 -rw-------. 1 dirsrv dirsrv 16000447 Oct 12 12:40 access -rw-------. 1 dirsrv dirsrv 63 Oct 11 12:40 access.rotationinfo -rw-------. 1 dirsrv dirsrv 0 Oct 11 12:40 audit -rw-------. 1 dirsrv dirsrv 63 Oct 11 12:40 audit.rotationinfo -rw-rw-rw-. 1 dirsrv dirsrv 28595 Oct 18 00:00 errors -rw-------. 1 dirsrv dirsrv 63 Oct 11 12:40 errors.rotationinfo
[root@genldap02s slapd-genldap02s]# cat errors.rotationinfo LOGINFO:Log file created at: 11/Oct/2019:12:40:26 (1570812026)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/527
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix
Log in to comment on this ticket.