#47581 Winsync plugin segfault during incremental backoff
Closed: wontfix None Opened 10 years ago by nhosoi.

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 6): Bug 1022500

Created attachment 815376
Stacktraces

Description of problem:
Directory server segfaults when synchronization agreement is set up with Active
Directory. Crash occurs when DS instance cannot contact AD and goes into
incremental backoff.

Version-Release number of selected component (if applicable):
389-ds-base-1.2.11.15-22.el6_4.i686

How reproducible:
always

Steps to Reproduce:
0. (run under gdb)
1. Set up winsync agreement via STARTTLS but dont import AD certificate (DS
will go into incremental backoff since certificate cannot be verified and thus
connection cannot be established)
2. Wait for a few minutes, then delete agreement
3. Set up new agreement using SSL/TLS

Error log:

[23/Oct/2013:13:39:16 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:39:16 +0200] NSMMReplicationPlugin - agmt="cn=M1toAD"
(winsync:389): Replication bind with SIMPLE auth failed: LDAP error -11
(Connect error) (TLS error -8179:Peer's Certificate issuer is not recognized.)
[23/Oct/2013:13:39:16 +0200] - slapd started.  Listening on All Interfaces port
389 for LDAP requests
[23/Oct/2013:13:39:16 +0200] - Listening on All Interfaces port 636 for LDAPS
requests
[23/Oct/2013:13:39:16 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:39:20 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:39:27 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:39:38 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:40:02 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:40:50 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:42:26 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:44:16 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:44:19 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:44:25 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:44:37 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:45:01 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:45:38 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:45:49 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:47:25 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:49:16 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:49:20 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:49:26 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:49:38 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:50:02 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:50:37 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:50:50 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:52:26 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:54:16 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:54:19 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:54:25 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:54:37 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:55:01 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:55:37 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:55:38 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:55:49 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:57:25 +0200] slapi_ldap_bind - Error: could not send startTLS
request: error -11 (Connect error) errno 0 (Success)
[23/Oct/2013:13:59:05 +0200] NSMMReplicationPlugin - agmt_delete: begin
[23/Oct/2013:14:00:21 +0200] slapi_ldap_bind - Error: could not send bind
request for id [cn=administrator,cn=users,dc=adrelm,dc=com] mech [SIMPLE]:
error -1 (Can't contact LDAP server) -8179 (Peer's Certificate issuer is not
recognized.) 115 (Operation now in progress)
[23/Oct/2013:14:00:21 +0200] NSMMReplicationPlugin - agmt="cn=M1toAD"
(winsync:636): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't
contact LDAP server) (TLS error -8179:Peer's Certificate issuer is not
recognized.)
[23/Oct/2013:14:00:24 +0200] slapi_ldap_bind - Error: could not send bind
request for id [cn=administrator,cn=users,dc=adrelm,dc=com] mech [SIMPLE]:
error -1 (Can't contact LDAP server) -8179 (Peer's Certificate issuer is not
recognized.) 115 (Operation now in progress)
[23/Oct/2013:14:00:30 +0200] slapi_ldap_bind - Error: could not send bind
request for id [cn=administrator,cn=users,dc=adrelm,dc=com] mech [SIMPLE]:
error -1 (Can't contact LDAP server) -8179 (Peer's Certificate issuer is not
recognized.) 115 (Operation now in progress)

Output from GDB:

(gdb) cont
Continuing.
[Thread 0xb20feb70 (LWP 6986) exited]
[New Thread 0xb20feb70 (LWP 10246)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xadaf7b70 (LWP 6993)]
__pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50
50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50
#1  0x04752e63 in PR_Lock (lock=0x0)
    at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:174
#2  0x00623dd6 in event_notify (prp=0x89e1508, event=<value optimized out>)
    at ldap/servers/plugins/replication/windows_inc_protocol.c:1069
#3  0x006240a0 in windows_inc_backoff_expired (timer_fire_time=1382529637,
arg=0x89e1508)
    at ldap/servers/plugins/replication/windows_inc_protocol.c:1656
#4  0x00c12f11 in eq_call_all (arg=0x0) at ldap/servers/slapd/eventq.c:312
#5  eq_loop (arg=0x0) at ldap/servers/slapd/eventq.c:359
#6  0x04759342 in _pt_root (arg=0x87d4f60)
    at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:156
#7  0x009f4a49 in start_thread (arg=0xadaf7b70) at pthread_create.c:301
#8  0x00930aae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133

ack

do we have the same problem with backoff and regular replication, not just winsync?

Replying to [comment:2 rmeggins]:

ack

do we have the same problem with backoff and regular replication, not just winsync?

A good question... repl5_inc_protocol.c has the same coding. We'd better patch it, as well. :)
{{{
619 repl5_inc_run(Private_Repl_Protocol prp)
839 /
Set up the backoff timer to wake us up at the appropriate time /
840 if (use_busy_backoff_timer){
841 /
we received a busy signal from the consumer, wait for a while */
842 if (!busywaittime){
843 busywaittime = repl5_get_backoff_min(prp);
844 }
845 prp_priv->backoff = backoff_new(BACKOFF_FIXED, busywaittime, busywaittime);
846 } else {
847 prp_priv->backoff = backoff_new(BACKOFF_EXPONENTIAL, repl5_get_backoff_min(prp),
848 repl5_get_backoff_max(prp));
849 }
}}}

{{{
510 repl5_inc_delete(Private_Repl_Protocol prpp)
526 slapi_ch_free((void
)&(*prpp)->private);
}}}

git patch file (master) -- phase 2 (applied the WinSync change to the core replication)
0001-Ticket-47581-Winsync-plugin-segfault-during-incremen.3.patch

Reviewed by Rich (Thank you!!)

Pushed to master:
7272dbd..8eecc43 master -> master
commit d70e66b
commit 8eecc43

Pushed to 389-ds-base-1.3.2:
1056a62..962fc45 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 59846f7
commit 962fc45

Pushed to 389-ds-base-1.3.1:
00b19f3..0df4c66 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit be0e470
commit 0df4c66

Pushed to 389-ds-base-1.2.11:
85a99e9..8bfe5f5 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 91d45ff
commit 8bfe5f5

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.11.25

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/918

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata