#47787 A replicated MOD fails (Unwilling to perform) if it targets a tombstone
Closed: Fixed None Opened 5 years ago by tbordaz.

The test case is the following:
1- Create M1-M2
2- Create an entry
3- Disable replication M1<-->M2
4- Delete the entry on M1
5- Modify the entry on M2
6- Enable replication

The MOD (5) fails when replicated on M1, so the test entry differs on M1 and M2.
This is likely a side effect of Ticket 47396.

Test case attached


* testing this fix with freeIPA unit tests (see https://fedorahosted.org/freeipa/ticket/4279#comment:6 for freeIPA tests) * those tests are immediately failing because of the need of 47721 so I need to cumulate the fixes {{{ 85106f0 Ticket 47721 - Schema Replication Issue (follow up) 96ab39f Ticket 47721 - Schema Replication Issue (follow up + cleanup) 6ebea73 Ticket 47721 - Schema Replication Issue 30f47c5 Ticket 47676 : (cont.) Replication of the schema fails 'master branch' -> 1.2.11 or 1.3.1 9558f55 Ticket 47676 : Replication of the schema fails 'master branch' -> 1.2.11 or 1.3.1 0ba131b Ticket 47541 - Fix Jenkins errors adce8c6 Ticket 47541 - Replication of the schema may overwrite consumer 'attributetypes' even if consumer definition 0d344e3 Bump version 1.3.2.17 4799be9 Ticket 47787: A replicated MOD fails (Unwilling to perform) if it targets a tombstone }}} * to do the test I backported the fixes in 1.3.2 * I hit a hang the first time I ran the test. I do not know if the hang is systematic or not. With the help of Ludwig (thanks for your tremendous help), here are the detail of the hang: The Thread 33 is waiting on the lock 115 because the page where the lock is implemented is hold by transaction 800058ad {{{ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by lockers: Locker Mode Count Status ----------------- Object --------------- ... 115 dd=122 locks held 0 write locks 0 pid/thread 12326/139830151333632 flags 0 priority 100 115 READ 1 WAIT changelog/id2entry.db page 2 ... 120 dd=111 locks held 0 write locks 0 pid/thread 12326/139830151333632 flags 0 priority 100 121 dd=110 locks held 0 write locks 0 pid/thread 12326/139830151333632 flags 0 priority 100 ... 149 dd=70 locks held 1 write locks 0 pid/thread 12326/139830151333632 flags 10 priority 100 149 READ 1 HELD userRoot/numsubordinates.db handle 0 ... =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Locks grouped by object: Locker Mode Count Status ----------------- Object --------------- ... 800058ad WRITE 1 HELD changelog/id2entry.db page 2 115 READ 1 WAIT changelog/id2entry.db page 2 ... 800058ad dd= 1 locks held 15 write locks 11 pid/thread 12326/139829938759424 flags 0 priority 100 800058ad READ 2 HELD userRoot/member.db page 23 800058ad WRITE 1 HELD changelog/numsubordinates.db page 1 800058ad WRITE 1 HELD changelog/id2entry.db page 2 800058ad WRITE 1 HELD changelog/entryrdn.db page 33 800058ad WRITE 2 HELD changelog/entryrdn.db page 35 800058ad WRITE 1 HELD changelog/ancestorid.db page 1 800058ad WRITE 1 HELD changelog/entryusn.db page 4 800058ad WRITE 1 HELD changelog/parentid.db page 1 800058ad WRITE 1 HELD changelog/nsuniqueid.db page 10 800058ad WRITE 1 HELD changelog/changenumber.db page 4 800058ad WRITE 3 HELD changelog/objectclass.db page 1 800058ad WRITE 1 HELD changelog/id2entry.db page 147 800058ad READ 2 HELD changelog/entryrdn.db page 35 800058ad READ 3 HELD changelog/entryrdn.db page 33 800058ad READ 1 HELD changelog/nsuniqueid.db page 10 }}} The transaction 800058ad is triggered by thread 8 {{{ Thread 8 (Thread 0x7f2cb1d42700 (LWP 12366)): #0 0x00007f2ce0025c7d in __db_tas_mutex_unlock () from /lib64/libdb-5.3.so #1 0x00007f2ce00e7f06 in __dbc_close () from /lib64/libdb-5.3.so #2 0x00007f2ce00e8c98 in __dbc_cleanup () from /lib64/libdb-5.3.so #3 0x00007f2ce00e8f5a in __dbc_iget () from /lib64/libdb-5.3.so #4 0x00007f2ce00f80b4 in __dbc_get_pp () from /lib64/libdb-5.3.so #5 0x00007f2cdcc811be in _entryrdn_get_elem (cursor=cursor@entry=0x7f2ce9c120a0, key=key@entry=0x7f2cb1d321d0, data=data@entry=0x7f2cb1d32200, comp_key=0x7f2ce9f14240 "dc=idm,dc=lab,dc=bos,dc=redhat,dc=com", elem=elem@entry=0x7f2cb1d342e8) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:1783 #6 0x00007f2cdcc837e9 in _entryrdn_index_read (be=be@entry=0x7f2ce98f65f0, cursor=0x7f2ce9c120a0, srdn=srdn@entry=0x7f2cb1d342f0, elem=elem@entry=0x7f2cb1d342e8, parentelem=parentelem@entry=0x0, childelems=childelems@entry=0x0, flags=flags@entry=0, db_txn=0x7f2cea0d7b10) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:3158 #7 0x00007f2cdcc856b2 in entryrdn_index_read_ext (be=be@entry=0x7f2ce98f65f0, sdn=sdn@entry=0x7f2cea121e50, id=id@entry=0x7f2cb1d34384, flags=flags@entry=0, txn=txn@entry=0x7f2cb1d34560) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:439 #8 0x00007f2cdcc57d79 in dn2entry_ext (be=be@entry=0x7f2ce98f65f0, sdn=sdn@entry=0x7f2cea121e50, txn=txn@entry=0x7f2cb1d34560, flags=flags@entry=0, err=err@entry=0x7f2cb1d3443c) at ldap/servers/slapd/back-ldbm/dn2entry.c:92 #9 0x00007f2cdcc5aa51 in find_entry_internal_dn (flags=0, txn=0x7f2cb1d34560, lock=0, sdn=0x7f2cea121e50, be=0x7f2ce98f65f0, pb=0x7f2cea0daa60) at ldap/servers/slapd/back-ldbm/findentry.c:130 #10 find_entry_internal (pb=pb@entry=0x7f2cea0daa60, be=0x7f2ce98f65f0, lock=lock@entry=0, txn=txn@entry=0x7f2cb1d34560, flags=flags@entry=0, addr=<optimized out="">, addr=<optimized out="">) at ldap/servers/slapd/back-ldbm/findentry.c:293 #11 0x00007f2cdcc5ad3e in find_entry (pb=pb@entry=0x7f2cea0daa60, be=<optimized out="">, addr=<optimized out="">, txn=txn@entry=0x7f2cb1d34560) at ldap/servers/slapd/back-ldbm/findentry.c:313 #12 0x00007f2cdcc92183 in ldbm_back_search (pb=0x7f2cea0daa60) at ldap/servers/slapd/back-ldbm/ldbm_search.c:592 #13 0x00007f2ce83ad040 in op_shared_search (pb=pb@entry=0x7f2cea0daa60, send_result=send_result@entry=1) at ldap/servers/slapd/opshared.c:803 #14 0x00007f2ce83ba97e in search_internal_callback_pb (pb=0x7f2cea0daa60, callback_data=<optimized out="">, prc=0x0, psec=0x7f2cdaedd080 <backend_shr_note_entry_sdn_cb>, prec=0x0) at ldap/servers/slapd/plugin_internal_op.c:812 #15 0x00007f2cdaedec01 in backend_shr_update_references_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #16 0x00007f2cdaeec17f in map_data_foreach_map () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #17 0x00007f2cdaedca2b in backend_shr_update_references () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #18 0x00007f2cdaeddb86 in backend_shr_add_cb.part.13 () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #19 0x00007f2cdaeddce1 in backend_shr_betxn_post_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #20 0x00007f2ce83b6515 in plugin_call_func (list=0x7f2ce9adc0c0, operation=operation@entry=560, pb=pb@entry=0x7f2cea0ae700, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1489 #21 0x00007f2ce83b66c8 in plugin_call_list (pb=0x7f2cea0ae700, operation=560, list=<optimized out="">) at ldap/servers/slapd/plugin.c:1451 #22 plugin_call_plugins (pb=pb@entry=0x7f2cea0ae700, whichfunction=whichfunction@entry=560) at ldap/servers/slapd/plugin.c:413 #23 0x00007f2cdcc73b10 in ldbm_back_add (pb=0x7f2cea0ae700) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1035 #24 0x00007f2ce836157a in op_shared_add (pb=pb@entry=0x7f2cea0ae700) at ldap/servers/slapd/add.c:735 #25 0x00007f2ce8361dd3 in add_internal_pb (pb=pb@entry=0x7f2cea0ae700) at ldap/servers/slapd/add.c:434 #26 0x00007f2ce8362aa3 in slapi_add_internal_pb (pb=pb@entry=0x7f2cea0ae700) at ldap/servers/slapd/add.c:356 #27 0x00007f2cdb50bc81 in write_replog_db (newsuperior=0x0, modrdn_mods=0x0, newrdn=0x0, log_e=0x0, curtime=1398856739, flag=0, log_m=0x7f2cea0d3b80, dn=0x7f2cea0de7a0 "ou=ca,ou=requests,o=ipaca", optype=<optimized out="">, pb=<optimized out="">) at ldap/servers/plugins/retrocl/retrocl_po.c:371 #28 retrocl_postob (pb=<optimized out="">, optype=<optimized out="">) at ldap/servers/plugins/retrocl/retrocl_po.c:668 #29 0x00007f2ce83b6515 in plugin_call_func (list=0x7f2ce9ac90e0, operation=operation@entry=561, pb=pb@entry=0x7f2cb1d41ae0, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1489 #30 0x00007f2ce83b66c8 in plugin_call_list (pb=0x7f2cb1d41ae0, operation=561, list=<optimized out="">) at ldap/servers/slapd/plugin.c:1451 #31 plugin_call_plugins (pb=pb@entry=0x7f2cb1d41ae0, whichfunction=whichfunction@entry=561) at ldap/servers/slapd/plugin.c:413 #32 0x00007f2cdcc8c89e in ldbm_back_modify (pb=<optimized out="">) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:823 #33 0x00007f2ce83a6601 in op_shared_modify (pb=pb@entry=0x7f2cb1d41ae0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1081 #34 0x00007f2ce83a7953 in do_modify (pb=pb@entry=0x7f2cb1d41ae0) at ldap/servers/slapd/modify.c:419 #35 0x00007f2ce887cdd1 in connection_dispatch_operation (pb=0x7f2cb1d41ae0, op=0x7f2cea018df0, conn=0x7f2cbdd60950) at ldap/servers/slapd/connection.c:660 #36 connection_threadmain () at ldap/servers/slapd/connection.c:2534 #37 0x00007f2ce678be1b in _pt_root () from /lib64/libnspr4.so #38 0x00007f2ce612bf33 in start_thread () from /lib64/libpthread.so.0 #39 0x00007f2ce5e59ded in clone () from /lib64/libc.so.6 }}} This thread is doing a MOD {{{ [30/Apr/2014:07:18:59 -0400] conn=5 op=59 MOD dn="ou=ca,ou=requests,o=ipaca" }}} but deeper in the stack it ends into an internal search {{{ (gdb) frame 10 #10 0x00007f2cdcc5aa51 in find_entry_internal_dn (flags=0, txn=0x7f2cb1d34560, lock=0, sdn=0x7f2cea121e50, be=0x7f2ce98f65f0, pb=0x7f2cea0daa60) at ldap/servers/slapd/back-ldbm/findentry.c:130 130 (e = dn2entry_ext( be, sdn, txn, flags & TOMBSTONE_INCLUDED, &err )) (gdb) print *sdn $7 = {flag = 15 '\017', udn = 0x7f2cea14a2e0 "cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com", dn = 0x7f2cea121ed0 "cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com", ndn = 0x7f2ce9f141f0 "cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com", ndn_len = 69} }}} An interesting point is that the entry "cn=group1" was a tombstone and a replicated update modify it. {{{ [30/Apr/2014:07:18:13 -0400] conn=171 op=5 DEL dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com" ... [30/Apr/2014:07:18:13 -0400] conn=171 op=5 RESULT err=0 tag=107 nentries=0 etime=0 csn=5360dbf7000f00060000 ... [30/Apr/2014:07:18:13 -0400] conn=23 op=85 MOD dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com" ... [30/Apr/2014:07:18:13 -0400] conn=23 op=85 RESULT err=0 tag=103 nentries=0 etime=0 csn=5360dbf7001100040000 }}} The purpose of this ticket (47787) is to allow successful update on a tombstone if the update comes from replication. A good point is that we see the update successful.. a bad point is that we have this hang (sigh !) Possibly fix for 47787 introduces this hang... or just reveal it. * At the time of the hang, some error msg about DEADLOCK {{{ [30/Apr/2014:07:14:13 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests [30/Apr/2014:07:14:13 -0400] - Listening on All Interfaces port 636 for LDAPS requests [30/Apr/2014:07:14:13 -0400] - Listening on /var/run/slapd-IDM-LAB-BOS-REDHAT-COM.socket for LDAPI requests [30/Apr/2014:07:18:59 -0400] - database error -30993 [30/Apr/2014:07:18:59 -0400] - database error -30993 }}} (attaching the backstack, DB env dump, logs)

The thread 8 is looping in _entryrdn_get_elem, where it continuously gets DB_DEADLOCK from each attempt to cursor->c_get. The key->data = "dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"

The hang is looking as a separated issue. I created https://fedorahosted.org/389/ticket/47797 for it.

The hang being not systematic, I was able to test successfully the fix with freeipa unit tests.

{{{

Ran 1744 tests in 794.788s

FAILED (SKIP=64, errors=11, failures=1)

}}}

During that run, the fix works as it was able to apply successfully a MOD on a tombstone (if coming from replication) and replication continue without problem.

{{{
[06/May/2014:06:36:05 -0400] conn=175 fd=97 slot=97 connection from 10.16.78.61 to 10.16.78.61
...
[06/May/2014:06:36:06 -0400] conn=175 op=5 DEL dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
...
[06/May/2014:06:36:06 -0400] conn=27 op=88 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[06/May/2014:06:36:06 -0400] conn=27 op=88 RESULT err=0 tag=120 nentries=0 etime=0
[06/May/2014:06:36:06 -0400] conn=27 op=89 MOD dn="cn=ipausers,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
...
[06/May/2014:06:36:06 -0400] conn=175 op=5 RESULT err=0 tag=107 nentries=0 etime=0 csn=5368bb1b000200090000
[06/May/2014:06:36:06 -0400] conn=27 op=89 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000200040000
[06/May/2014:06:36:06 -0400] conn=27 op=90 MOD dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
[06/May/2014:06:36:06 -0400] conn=27 op=90 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000300040000
[06/May/2014:06:36:06 -0400] conn=27 op=91 MOD dn="cn=ipausers,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
[06/May/2014:06:36:06 -0400] conn=27 op=91 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000400040000

}}}

'''Push to master'''

git merge ticketfreeipa_4279

Updating 24d44ba..804d923
Fast-forward
dirsrvtests/tickets/ticket47787_test.py | 630 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ldap/servers/slapd/back-ldbm/ldbm_modify.c | 2 +-
2 files changed, 631 insertions(+), 1 deletion(-)
create mode 100644 dirsrvtests/tickets/ticket47787_test.py

git push origin master

Counting objects: 18, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 6.17 KiB, done.
Total 10 (delta 6), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
24d44ba..804d923 master -> master

commit 804d923
Author: Thierry bordaz (tbordaz) tbordaz@redhat.com
Date: Fri Apr 25 15:07:08 2014 +0200

git cherry-pick 804d923
[389-ds-base-1.3.2 c13feee] Ticket 47787 - A replicated MOD fails (Unwilling to perform) if it targets a tombstone
2 files changed, 631 insertions(+), 1 deletion(-)
create mode 100644 dirsrvtests/tickets/ticket47787_test.py

git push origin '''389-ds-base-1.3.2'''

Counting objects: 18, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (10/10), 6.17 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
86b34ca..c13feee 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

git cherry-pick 804d923
[389-ds-base-1.3.1 1a9e1d8] Ticket 47787 - A replicated MOD fails (Unwilling to perform) if it targets a tombstone
2 files changed, 631 insertions(+), 1 deletion(-)
create mode 100644 dirsrvtests/tickets/ticket47787_test.py

git push origin '''389-ds-base-1.3.1'''

Counting objects: 16, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (10/10), 6.12 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
efa23ce..1a9e1d8 389-ds-base-1.3.1 -> 389-ds-base-1.3.1

git push origin '''389-ds-base-1.2.11'''

Counting objects: 16, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (10/10), 6.13 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
ca407f8..0ee8496 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

git push origin master

Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 815 bytes, done.
Total 5 (delta 3), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
3b5f3fa..52e57de master -> master

Metadata Update from @tbordaz:
- Issue assigned to tbordaz
- Issue set to the milestone: 1.2.11.30

2 years ago

Login to comment on this ticket.

Metadata