#47311 segfault in db2ldif
Closed: wontfix. Opened 11 years ago by rcritten.

ns-slapd segfaults when I try to do an offline backup using db2ldif.

I'm not entirely sure how my instance got into this state. I believe it had a replica at one point, which I deleted AFTER it was already gone. This was done in the context of IPA, so we created a CLEANALLRUV task to clean up after the removed replica.

Listing all the tasks after startup returns nothing.

# gdb /usr/sbin/ns-slapd
...
(gdb) run  db2ldif -D /etc/dirsrv/slapd-GREYOAK-COM -r -n userRoot -a /tmp/test.ldif
Starting program: /usr/sbin/ns-slapd db2ldif -D /etc/dirsrv/slapd-GREYOAK-COM -r -n userRoot -a /tmp/test.ldif
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[02/Apr/2013:13:27:13 -0400] - /etc/dirsrv/slapd-GREYOAK-COM/dse.ldif: nsslapd-maxdescriptors: nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 4096 (the current process limit).  Server will use a setting of 4096.
[02/Apr/2013:13:27:13 -0400] - Config Warning: - nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 4096 (the current process limit).  Server will use a setting of 4096.
[02/Apr/2013:13:27:13 -0400] - userRoot: entry cache size: 10485760B; db size: 344064B
[02/Apr/2013:13:27:13 -0400] - ipaca: entry cache size: 10485760B; db size: 319488B
[02/Apr/2013:13:27:13 -0400] - Total entry cache size: 20971520B; dbcache size: 10000000B; available memory size: 1001832448B
[02/Apr/2013:13:27:13 -0400] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[New Thread 0x7f6a2f9a5700 (LWP 24949)]
[New Thread 0x7f6a2f1a4700 (LWP 24950)]
[New Thread 0x7f6a2e9a3700 (LWP 24951)]
[New Thread 0x7f6a2e1a2700 (LWP 24952)]
[02/Apr/2013:13:27:13 -0400] ldbm_usn_init - backend: userRoot (global mode)
[02/Apr/2013:13:27:13 -0400] ldbm_usn_init - backend: ipaca (global mode)
[02/Apr/2013:13:27:13 -0400] schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=greyoak,dc=com
[02/Apr/2013:13:27:13 -0400] schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat,dc=greyoak,dc=com
[02/Apr/2013:13:27:13 -0400] schema-compat-plugin - warning: no entries set up under ou=sudoers,dc=greyoak,dc=com
[New Thread 0x7f6a2d9a1700 (LWP 24953)]
[02/Apr/2013:13:27:13 -0400] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=greyoak,dc=com--no CoS Templates found, which should be added before the CoS Definition.
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: cleanAllRUV task found, resuming the cleaning of rid(3)...
[New Thread 0x7f6a2d1a0700 (LWP 24954)]
[New Thread 0x7f6a2c99f700 (LWP 24955)]
[New Thread 0x7f6a2c97e700 (LWP 24956)]
[New Thread 0x7f6a27fff700 (LWP 24957)]
[New Thread 0x7f6a277fe700 (LWP 24958)]
ldiffile: /tmp/test.ldif
[02/Apr/2013:13:27:13 -0400] - export userRoot: Processed 245 entries (100%).
[Thread 0x7f6a277fe700 (LWP 24958) exited]
[Thread 0x7f6a2c97e700 (LWP 24956) exited]
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (3)...
[Thread 0x7f6a27fff700 (LWP 24957) exited]
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
[Thread 0x7f6a2c99f700 (LWP 24955) exited]
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Sending cleanAllRUV task to all the replicas...
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning local ruv's...
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be cleaned...
[New Thread 0x7f6a2c99f700 (LWP 24959)]
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - changelog program - _cl5AddThread: invalid changelog state - 0
[02/Apr/2013:13:27:13 -0400] NSMMReplicationPlugin - changelog program - trigger_cl_trimming: failed to increment thread count NSPR error - 0

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6a2c99f700 (LWP 24959)]
__GI___pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50
50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
Missing separate debuginfos, use: debuginfo-install audit-libs-2.2.2-2.fc18.x86_64 cyrus-sasl-gssapi-2.1.23-36.fc18.x86_64 cyrus-sasl-lib-2.1.23-36.fc18.x86_64 cyrus-sasl-md5-2.1.23-36.fc18.x86_64 cyrus-sasl-plain-2.1.23-36.fc18.x86_64 keyutils-libs-1.5.5-3.fc18.x86_64 krb5-libs-1.10.3-5.fc18.x86_64 libcom_err-1.42.5-1.fc18.x86_64 libgcc-4.7.2-8.fc18.x86_64 libicu-49.1.1-5.fc18.x86_64 libstdc++-4.7.2-8.fc18.x86_64 libuuid-2.22.1-2.4.fc18.x86_64 nspr-4.9.4-1.fc18.x86_64 nss-3.14.1-3.fc18.x86_64 nss-softokn-3.14.1-5.fc18.x86_64 nss-softokn-freebl-3.14.1-5.fc18.x86_64 nss-util-3.14.1-2.fc18.x86_64 openldap-2.4.33-3.fc18.x86_64 openssl-libs-1.0.1c-7.fc18.x86_64 pam-1.1.6-3.fc18.1.x86_64 slapi-nis-0.44-1.fc18.x86_64 sqlite-3.7.13-2.fc18.x86_64 svrcore-4.0.4-8.fc18.x86_64
(gdb) where
#0  __GI___pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50
#1  0x00007f6a390f06f9 in PR_Lock () from /lib64/libnspr4.so
#2  0x00007f6a3356265a in _cl5DoTrimming (rid=rid@entry=3)
    at ldap/servers/plugins/replication/cl5_api.c:3435
#3  0x00007f6a33562c51 in trigger_cl_trimming_thread (arg=<optimized out>)
    at ldap/servers/plugins/replication/cl5_api.c:6591
#4  0x00007f6a390f5e23 in _pt_root () from /lib64/libnspr4.so
#5  0x0000003ce4e07d15 in start_thread (arg=0x7f6a2c99f700)
    at pthread_create.c:308
#6  0x0000003ce46f246d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114

After restart the error log shows that CLEANALLRUV was attempted:

[02/Apr/2013:13:37:02 -0400] - Listening on /var/run/slapd-GREYOAK-COM.socket for LDAPI requests
[02/Apr/2013:13:37:05 -0400] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 107 (Transport endpoint is not connected)
[02/Apr/2013:13:37:11 -0400] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 107 (Transport endpoint is not connected)
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (3)...
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Sending cleanAllRUV task to all the replicas...
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning local ruv's...
[02/Apr/2013:13:37:11 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be cleaned...
[02/Apr/2013:13:37:12 -0400] NSMMReplicationPlugin - CleanAllRUV Task: failed to remove replica config (16), rid (3)
[02/Apr/2013:13:37:12 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to finish cleaning...
[02/Apr/2013:13:37:12 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Successfully cleaned rid(3).
[02/Apr/2013:13:37:23 -0400] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 107 (Transport endpoint is not connected)
[02/Apr/2013:13:37:47 -0400] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 107 (Transport endpoint is not connected)
[02/Apr/2013:13:38:35 -0400] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 107 (Transport endpoint is not connected)

I'm assuming the server was stopped when you ran db2ldif. I think the issue is that the cleanallruv task should not start up when db2ldif is run (that is, when the server is stopped). It tries to trigger changelog trimming at the end of the cleaning phase, but the changelog is not initialized, which matches the NULL mutex passed to PR_Lock() from _cl5DoTrimming in the backtrace.
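To illustrate the failure mode described above, here is a minimal sketch in C. It is not the patch for this ticket, and none of the names below (changelog, fake_lock, trim_changelog, resume_cleaning_task) come from 389-ds-base; they are hypothetical stand-ins. It assumes the lock pointer stays NULL until the changelog is opened, and shows two guards: one in the trimming routine itself, and one in the caller that skips the task entirely when running as an offline utility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names come from 389-ds-base. */
typedef enum { CL_STATE_NONE = 0, CL_STATE_OPEN = 1 } cl_state;

typedef struct {
    int held;              /* 1 while the lock is held */
} fake_lock;

typedef struct {
    cl_state state;        /* 0 mirrors "invalid changelog state - 0" */
    fake_lock *trim_lock;  /* assumed to stay NULL until the changelog is opened */
    int trims_done;        /* counts completed trimming passes */
} changelog;

/* Returns 0 if trimming ran, -1 if it was skipped. */
int trim_changelog(changelog *cl)
{
    /* Guard: never touch the lock of an uninitialized changelog. */
    if (cl == NULL || cl->state != CL_STATE_OPEN || cl->trim_lock == NULL) {
        fprintf(stderr, "changelog not initialized, skipping trim\n");
        return -1;
    }
    assert(cl->trim_lock->held == 0);
    cl->trim_lock->held = 1;   /* take the lock */
    cl->trims_done++;          /* the trimming work would happen here */
    cl->trim_lock->held = 0;   /* release the lock */
    return 0;
}

/* Caller-side guard: do not resume replication tasks in offline mode. */
int resume_cleaning_task(changelog *cl, int server_is_online)
{
    if (!server_is_online) {
        return -1;             /* offline utility such as an export: skip the task */
    }
    return trim_changelog(cl);
}
```

With a changelog whose state is CL_STATE_NONE and whose trim_lock is NULL, both functions return -1 instead of dereferencing the NULL lock. With an opened changelog and server_is_online set, resume_cleaning_task returns 0 and trims_done is incremented.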

Correct. I've stopped the server so I can do a backup using db2ldif.

sending patch out for review...

git merge ticket47311
Updating 7d26ba1..17d0158
Fast-forward
ldap/servers/plugins/replication/repl5.h | 1 +
ldap/servers/plugins/replication/repl5_init.c | 6 ++++++
ldap/servers/plugins/replication/repl5_replica.c | 3 ++-
3 files changed, 9 insertions(+), 1 deletions(-)

git push origin master
Counting objects: 17, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 1.19 KiB, done.
Total 9 (delta 7), reused 0 (delta 0)
Auto packing the repository for optimum performance.
To ssh://git.fedorahosted.org/git/389/ds.git
7d26ba1..17d0158 master -> master

commit 17d0158

Metadata Update from @mreynolds:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.1

7 years ago

389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in 389-ds-base's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/389ds/389-ds-base/issues/648

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
