Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 6): Bug 1022500
Created attachment 815376 Stacktraces Description of problem: Directory server segfaults when synchronization agreement is set up with Active Directory. Crash occurs when DS instance cannot contact AD and goes into incremental backoff. Version-Release number of selected component (if applicable): 389-ds-base-1.2.11.15-22.el6_4.i686 How reproducible: always Steps to Reproduce: 0. (run under gdb) 1. Set up winsync agreement via STARTTLS but dont import AD certificate (DS will go into incremental backoff since certificate cannot be verified and thus connection cannot be established) 2. Wait for a few minutes, then delete agreement 3. Set up new agreement using SSL/TLS Error log: [23/Oct/2013:13:39:16 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:39:16 +0200] NSMMReplicationPlugin - agmt="cn=M1toAD" (winsync:389): Replication bind with SIMPLE auth failed: LDAP error -11 (Connect error) (TLS error -8179:Peer's Certificate issuer is not recognized.) [23/Oct/2013:13:39:16 +0200] - slapd started. Listening on All Interfaces port 389 for LDAP requests [23/Oct/2013:13:39:16 +0200] - Listening on All Interfaces port 636 for LDAPS requests [23/Oct/2013:13:39:16 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:39:20 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:39:27 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:39:38 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:40:02 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:40:50 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:42:26 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:44:16 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:44:19 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:44:25 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:44:37 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:45:01 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:45:38 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:45:49 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:47:25 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:49:16 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:49:20 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:49:26 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:49:38 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:50:02 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:50:37 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:50:50 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:52:26 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:54:16 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:54:19 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:54:25 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:54:37 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:55:01 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:55:37 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:55:38 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:55:49 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:57:25 +0200] slapi_ldap_bind - Error: could not send startTLS request: error -11 (Connect error) errno 0 (Success) [23/Oct/2013:13:59:05 +0200] NSMMReplicationPlugin - agmt_delete: begin [23/Oct/2013:14:00:21 +0200] slapi_ldap_bind - Error: could not send bind request for id [cn=administrator,cn=users,dc=adrelm,dc=com] mech [SIMPLE]: error -1 (Can't contact LDAP server) -8179 (Peer's Certificate issuer is not recognized.) 115 (Operation now in progress) [23/Oct/2013:14:00:21 +0200] NSMMReplicationPlugin - agmt="cn=M1toAD" (winsync:636): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) (TLS error -8179:Peer's Certificate issuer is not recognized.) [23/Oct/2013:14:00:24 +0200] slapi_ldap_bind - Error: could not send bind request for id [cn=administrator,cn=users,dc=adrelm,dc=com] mech [SIMPLE]: error -1 (Can't contact LDAP server) -8179 (Peer's Certificate issuer is not recognized.) 115 (Operation now in progress) [23/Oct/2013:14:00:30 +0200] slapi_ldap_bind - Error: could not send bind request for id [cn=administrator,cn=users,dc=adrelm,dc=com] mech [SIMPLE]: error -1 (Can't contact LDAP server) -8179 (Peer's Certificate issuer is not recognized.) 115 (Operation now in progress) Output from GDB: (gdb) cont Continuing. [Thread 0xb20feb70 (LWP 6986) exited] [New Thread 0xb20feb70 (LWP 10246)] Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xadaf7b70 (LWP 6993)] __pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50 50 unsigned int type = PTHREAD_MUTEX_TYPE (mutex); (gdb) bt #0 __pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50 #1 0x04752e63 in PR_Lock (lock=0x0) at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:174 #2 0x00623dd6 in event_notify (prp=0x89e1508, event=<value optimized out>) at ldap/servers/plugins/replication/windows_inc_protocol.c:1069 #3 0x006240a0 in windows_inc_backoff_expired (timer_fire_time=1382529637, arg=0x89e1508) at ldap/servers/plugins/replication/windows_inc_protocol.c:1656 #4 0x00c12f11 in eq_call_all (arg=0x0) at ldap/servers/slapd/eventq.c:312 #5 eq_loop (arg=0x0) at ldap/servers/slapd/eventq.c:359 #6 0x04759342 in _pt_root (arg=0x87d4f60) at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:156 #7 0x009f4a49 in start_thread (arg=0xadaf7b70) at pthread_create.c:301 #8 0x00930aae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
git patch file (master) 0001-Ticket-47581-Winsync-plugin-segfault-during-incremen.patch
git patch file (389-ds-base-1.2.11) 0001-Ticket-47581-Winsync-plugin-segfault-during-incremen.2.patch
ack
do we have the same problem with backoff and regular replication, not just winsync?
Replying to [comment:2 rmeggins]:
ack do we have the same problem with backoff and regular replication, not just winsync?
A good question... repl5_inc_protocol.c has the same coding. We'd better patch it, as well. :) {{{ 619 repl5_inc_run(Private_Repl_Protocol prp) 839 / Set up the backoff timer to wake us up at the appropriate time / 840 if (use_busy_backoff_timer){ 841 / we received a busy signal from the consumer, wait for a while */ 842 if (!busywaittime){ 843 busywaittime = repl5_get_backoff_min(prp); 844 } 845 prp_priv->backoff = backoff_new(BACKOFF_FIXED, busywaittime, busywaittime); 846 } else { 847 prp_priv->backoff = backoff_new(BACKOFF_EXPONENTIAL, repl5_get_backoff_min(prp), 848 repl5_get_backoff_max(prp)); 849 } }}}
{{{ 510 repl5_inc_delete(Private_Repl_Protocol prpp) 526 slapi_ch_free((void )&(*prpp)->private); }}}
git patch file (master) -- phase 2 (applied the WinSync change to the core replication) 0001-Ticket-47581-Winsync-plugin-segfault-during-incremen.3.patch
Reviewed by Rich (Thank you!!)
Pushed to master: 7272dbd..8eecc43 master -> master commit d70e66b commit 8eecc43
Pushed to 389-ds-base-1.3.2: 1056a62..962fc45 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit 59846f7 commit 962fc45
Pushed to 389-ds-base-1.3.1: 00b19f3..0df4c66 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit be0e470 commit 0df4c66
Pushed to 389-ds-base-1.2.11: 85a99e9..8bfe5f5 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 91d45ff commit 8bfe5f5
Metadata Update from @nhosoi: - Issue assigned to nhosoi - Issue set to the milestone: 1.2.11.25
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/918
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.