On Aug 8th I run installed FreeIPA server with 389-ds-base-1.3.2.21-1.fc20 which finished OK. The instance was left running when it suddenly crashed over the weekend:
# systemctl status dirsrv@MKOSEK-FEDORA20-TEST.service dirsrv@MKOSEK-FEDORA20-TEST.service - 389 Directory Server MKOSEK-FEDORA20-TEST. Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled) Active: failed (Result: signal) since Sun 2014-08-10 11:43:40 CEST; 1 day 21h ago Main PID: 7656 (code=killed, signal=SEGV) Aug 10 11:13:11 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 2 Aug 10 11:13:11 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 3 Aug 10 11:28:11 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 1 Aug 10 11:28:12 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 2 Aug 10 11:28:12 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 3 Aug 10 11:43:12 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 1 Aug 10 11:43:13 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 2 Aug 10 11:43:13 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 3 Aug 10 11:43:40 ipa.mkosek-fedora20.test systemd[1]: dirsrv@MKOSEK-FEDORA20-TEST.service: main pro...EGV Aug 10 11:43:40 ipa.mkosek-fedora20.test systemd[1]: Unit dirsrv@MKOSEK-FEDORA20-TEST.service ente...te. Hint: Some lines were ellipsized, use -l to show in full.
Stacktrace attached.
attachment stacktrace.1407828638.txt.gz
it seems the crash is happening because getting the entry for a changenumber returns a corrputed entry, changelog trimming doesn't check for errors in that place. Is the error log still available, does the server start and could you perform a search on cn=changelog ?
No, in the error log, I only see this last entry in /var/log/dirsrv/slapd-MKOSEK-FEDORA20-TEST/errors.20140804-102138: {{{ [08/Aug/2014:11:53:02 +0200] ipapwd_pre_bind - [file prepost.c, line 1437]: kerberos principal in uid=tuser1,cn=users,cn=accounts,dc=mkosek-fedora20,dc=test is expired [08/Aug/2014:11:53:13 +0200] - Entry "uid=tuser1,cn=users,cn=accounts,dc=mkosek-fedora20,dc=test" -- attribute "random" not allowed }}}
/var/log/dirsrv/slapd-MKOSEK-FEDORA20-TEST/errors.20140804-102138
and then just starting the new DS in /var/log/dirsrv/slapd-MKOSEK-FEDORA20-TEST/errors, though with a database error: {{{ 389-Directory/1.3.2.21 B2014.219.1850 ipa.mkosek-fedora20.test:636 (/etc/dirsrv/slapd-MKOSEK-FEDORA20-TEST)
/var/log/dirsrv/slapd-MKOSEK-FEDORA20-TEST/errors
[12/Aug/2014:10:41:36 +0200] - 389-Directory/1.3.2.21 B2014.219.1850 starting up [12/Aug/2014:10:41:36 +0200] - WARNING: changelog: entry cache size 2097152B is less than db size 9732096B; We recommend to increase the entry cache size nsslapd-cachememsize. [12/Aug/2014:10:41:36 +0200] - Detected Disorderly Shutdown last time Directory Server was running, recovering database. [12/Aug/2014:10:41:38 +0200] schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=mkosek-fedora20,dc=test [12/Aug/2014:10:41:38 +0200] schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat,dc=mkosek-fedora20,dc=test [12/Aug/2014:10:41:38 +0200] schema-compat-plugin - warning: no entries set up under ou=sudoers,dc=mkosek-fedora20,dc=test [12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target ou=sudoers,dc=mkosek-fedora20,dc=test does not exist [12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=ad,cn=etc,dc=mkosek-fedora20,dc=test does not exist [12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=mkosek-fedora20,dc=test does not exist [12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=mkosek-fedora20,dc=test does not exist [12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=automember rebuild membership,cn=tasks,cn=config does not exist [12/Aug/2014:10:41:38 +0200] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=mkosek-fedora20,dc=test--no CoS Templates found, which should be added before the CoS Definition. [12/Aug/2014:10:41:38 +0200] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=mkosek-fedora20,dc=test--no CoS Templates found, which should be added before the CoS Definition. [12/Aug/2014:10:41:38 +0200] - slapd started. Listening on All Interfaces port 389 for LDAP requests [12/Aug/2014:10:41:38 +0200] - Listening on All Interfaces port 636 for LDAPS requests [12/Aug/2014:10:41:38 +0200] - Listening on /var/run/slapd-MKOSEK-FEDORA20-TEST.socket for LDAPI requests [12/Aug/2014:10:42:09 +0200] - database error -30993 [12/Aug/2014:10:42:09 +0200] - database error -30993 }}}
DS did not crash during ldapsearch: {{{
...
dn: changenumber=7837,cn=changelog objectClass: top objectClass: changelogentry objectClass: extensibleObject targetuniqueid: 72c4d509-1edf11e4-9b00fe62-30505311 changeNumber: 7837 targetDn: fqdn=ipa.mkosek-fedora20.test,cn=computers,cn=accounts,dc=mkosek-fed ora20,dc=test changeTime: 20140812084259Z changeType: modify changes:: cmVwbGFjZToga3JiTGFzdFN1Y2Nlc3NmdWxBdXRoCmtyYkxhc3RTdWNjZXNzZnVsQXV0 aDogMjAxNDA4MTIwODQzMDBaCi0KcmVwbGFjZTogbW9kaWZpZXJzbmFtZQptb2RpZmllcnNuYW1lO iBjbj1JUEEgTG9ja291dCxjbj1wbHVnaW5zLGNuPWNvbmZpZwotCnJlcGxhY2U6IG1vZGlmeXRpbW VzdGFtcAptb2RpZnl0aW1lc3RhbXA6IDIwMTQwODEyMDg0MjU5WgotCnJlcGxhY2U6IGVudHJ5dXN uCmVudHJ5dXNuOiAyMjI4MAotCgA=
search: 2 result: 0 Success
}}}
I wonder if there is not a bug in handle_getchangerecord_search. This callback function stores the entry return by the search into a temporary structure (crt->crt_entry). When returning from the internal search, get_changerecord returns that entry.
According to https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8.1/html/Plug-in_Guide/Plugin_Programming_Guide-Function_Reference-slapi_search_internal_callback_pb.html, if we want to access the entry outside of the callback we need to do slapi_entry_dup.
attachment 0001-Ticket-47871-389-ds-base-1.3.2.21-1.fc20-crashed-ove.patch
attachment 0002-Ticket-47871-Test-case-389-ds-base-1.3.2.21-1.fc20-c.patch
I set up 3 freeipa instances on F20, two with the fix, one without.
Ran freeipa unit tests, then let them idle (this was the conditions when the crashed occurred). To increase the probability of crash, it trims the retro changelog each 30s and it evaluates ~ 5000 entries at each trimming.
{{{ dn: cn=Retro Changelog Plugin,cn=plugins,cn=config nsslapd-changelogmaxage: 8d nsslapd-changelog-trim-interval: 30s
The tests started Friday 22nd. No crash after 3 days.
Looks good to me.
Push to '''master'''
git merge ticket47871 Updating 1c888bf..495151b Fast-forward dirsrvtests/tickets/ticket47871_test.py | 306 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ldap/servers/plugins/retrocl/retrocl.h | 2 +- ldap/servers/plugins/retrocl/retrocl_trim.c | 163 +++++++++++++++++++++++-------------------- 3 files changed, 393 insertions(+), 78 deletions(-) create mode 100644 dirsrvtests/tickets/ticket47871_test.py
git push origin master Counting objects: 20, done. Delta compression using up to 4 threads. Compressing objects: 100% (11/11), done. Writing objects: 100% (11/11), 5.29 KiB, done. Total 11 (delta 7), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 1c888bf..495151b master -> master
commit 495151b Author: Thierry bordaz (tbordaz) tbordaz@redhat.com Date: Thu Aug 21 11:49:18 2014 +0200
Push to '''1.3.2'''
git cherry-pick 495151b [389-ds-base-1.3.2 47fbe25] Ticket 47871 - 389-ds-base-1.3.2.21-1.fc20 crashed over the weekend 3 files changed, 393 insertions(+), 78 deletions(-) create mode 100644 dirsrvtests/tickets/ticket47871_test.py
git push origin 389-ds-base-1.3.2 Counting objects: 20, done. Delta compression using up to 4 threads. Compressing objects: 100% (10/10), done. Writing objects: 100% (11/11), 5.17 KiB, done. Total 11 (delta 7), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git f237812..47fbe25 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 47fbe25 Author: Thierry bordaz (tbordaz) tbordaz@redhat.com Date: Thu Aug 21 11:49:18 2014 +0200
Metadata Update from @nhosoi: - Issue assigned to tbordaz - Issue set to the milestone: 1.3.2.23
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1202
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Log in to comment on this ticket.