#47787 A replicated MOD fails (Unwilling to perform) if it targets a tombstone
Closed: wontfix. Opened 6 years ago by tbordaz.

The test case is the following:
1- Create M1-M2
2- Create an entry
3- Disable replication M1<-->M2
4- Delete the entry on M1
5- Modify the entry on M2
6- Enable replication
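
The divergence these steps produce can be sketched as a toy Python model (names and data structures are invented for illustration; this is not 389-ds code):

```python
# Toy model (invented names, not 389-ds code) of how steps 1-6 make the
# replicated MOD target a tombstone: while replication is disabled each
# master applies its own change, so when M2's MOD is replayed on M1 the
# target entry is already a tombstone there.

DN = "cn=test,dc=example,dc=com"

m1 = {DN: {"description": "v1"}}   # steps 1-2: same entry on both masters
m2 = {DN: {"description": "v1"}}
m2_pending = []                    # step 3: replication disabled, M2 queues ops

m1[DN] = {"tombstone": True}       # step 4: DEL on M1 leaves a tombstone

m2[DN]["description"] = "v2"       # step 5: MOD on M2
m2_pending.append(("MOD", DN, {"description": "v2"}))

# step 6: replication re-enabled -- M2's queued MOD is replayed on M1
op, dn, mods = m2_pending[0]
target_is_tombstone = m1[dn].get("tombstone", False)
print(op, "targets tombstone on M1:", target_is_tombstone)
# MOD targets tombstone on M1: True
```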

The MOD (step 5) fails when it is replicated to M1, so after step 6 the test entry differs between M1 and M2.
This is likely a side effect of Ticket 47396.
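
The behavior the fix introduces can be modeled roughly as follows. The function name `check_mod_target` is a hypothetical stand-in, not the actual ldbm_modify.c symbol; err=53 is the LDAP "Unwilling to perform" result code from the ticket title:

```python
# Rough model of the relaxed check (invented function name, not the actual
# ldbm_modify.c code): a MOD on a tombstone is refused with err=53
# "Unwilling to perform" unless the operation arrived via replication.
# Before the fix, replicated MODs hit the same refusal as client MODs.

LDAP_UNWILLING_TO_PERFORM = 53

def check_mod_target(is_tombstone, is_replicated_op):
    """Return the LDAP result code for a MOD against this target."""
    if is_tombstone and not is_replicated_op:
        return LDAP_UNWILLING_TO_PERFORM  # direct client MODs stay refused
    return 0                              # replicated MODs are let through

print(check_mod_target(True, False))  # 53: client MOD on a tombstone
print(check_mod_target(True, True))   # 0: replicated MOD now succeeds
print(check_mod_target(False, True))  # 0: normal entry, no restriction
```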

Test case attached


* testing this fix with freeIPA unit tests (see https://fedorahosted.org/freeipa/ticket/4279#comment:6 for freeIPA tests)
* those tests immediately fail because they also need the fix for 47721, so I had to cumulate the fixes:

{{{
85106f0 Ticket 47721 - Schema Replication Issue (follow up)
96ab39f Ticket 47721 - Schema Replication Issue (follow up + cleanup)
6ebea73 Ticket 47721 - Schema Replication Issue
30f47c5 Ticket 47676 : (cont.) Replication of the schema fails 'master branch' -> 1.2.11 or 1.3.1
9558f55 Ticket 47676 : Replication of the schema fails 'master branch' -> 1.2.11 or 1.3.1
0ba131b Ticket 47541 - Fix Jenkins errors
adce8c6 Ticket 47541 - Replication of the schema may overwrite consumer 'attributetypes' even if consumer definition
0d344e3 Bump version 1.3.2.17
4799be9 Ticket 47787: A replicated MOD fails (Unwilling to perform) if it targets a tombstone
}}}

* to run the test I backported the fixes to 1.3.2
* I hit a hang the first time I ran the test. I do not know whether the hang is systematic. With the help of Ludwig (thanks for your tremendous help), here are the details of the hang.

Thread 33 is waiting on lock 115 because the page that lock targets is held by transaction 800058ad:

{{{
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker Mode Count Status ----------------- Object ---------------
...
115 dd=122 locks held 0 write locks 0 pid/thread 12326/139830151333632 flags 0 priority 100
115 READ 1 WAIT changelog/id2entry.db page 2
...
120 dd=111 locks held 0 write locks 0 pid/thread 12326/139830151333632 flags 0 priority 100
121 dd=110 locks held 0 write locks 0 pid/thread 12326/139830151333632 flags 0 priority 100
...
149 dd=70 locks held 1 write locks 0 pid/thread 12326/139830151333632 flags 10 priority 100
149 READ 1 HELD userRoot/numsubordinates.db handle 0
...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
...
800058ad WRITE 1 HELD changelog/id2entry.db page 2
115 READ 1 WAIT changelog/id2entry.db page 2
...
800058ad dd= 1 locks held 15 write locks 11 pid/thread 12326/139829938759424 flags 0 priority 100
800058ad READ 2 HELD userRoot/member.db page 23
800058ad WRITE 1 HELD changelog/numsubordinates.db page 1
800058ad WRITE 1 HELD changelog/id2entry.db page 2
800058ad WRITE 1 HELD changelog/entryrdn.db page 33
800058ad WRITE 2 HELD changelog/entryrdn.db page 35
800058ad WRITE 1 HELD changelog/ancestorid.db page 1
800058ad WRITE 1 HELD changelog/entryusn.db page 4
800058ad WRITE 1 HELD changelog/parentid.db page 1
800058ad WRITE 1 HELD changelog/nsuniqueid.db page 10
800058ad WRITE 1 HELD changelog/changenumber.db page 4
800058ad WRITE 3 HELD changelog/objectclass.db page 1
800058ad WRITE 1 HELD changelog/id2entry.db page 147
800058ad READ 2 HELD changelog/entryrdn.db page 35
800058ad READ 3 HELD changelog/entryrdn.db page 33
800058ad READ 1 HELD changelog/nsuniqueid.db page 10
}}}

Transaction 800058ad is triggered by thread 8:

{{{
Thread 8 (Thread 0x7f2cb1d42700 (LWP 12366)):
#0 0x00007f2ce0025c7d in __db_tas_mutex_unlock () from /lib64/libdb-5.3.so
#1 0x00007f2ce00e7f06 in __dbc_close () from /lib64/libdb-5.3.so
#2 0x00007f2ce00e8c98 in __dbc_cleanup () from /lib64/libdb-5.3.so
#3 0x00007f2ce00e8f5a in __dbc_iget () from /lib64/libdb-5.3.so
#4 0x00007f2ce00f80b4 in __dbc_get_pp () from /lib64/libdb-5.3.so
#5 0x00007f2cdcc811be in _entryrdn_get_elem (cursor=cursor@entry=0x7f2ce9c120a0, key=key@entry=0x7f2cb1d321d0, data=data@entry=0x7f2cb1d32200, comp_key=0x7f2ce9f14240 "dc=idm,dc=lab,dc=bos,dc=redhat,dc=com", elem=elem@entry=0x7f2cb1d342e8) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:1783
#6 0x00007f2cdcc837e9 in _entryrdn_index_read (be=be@entry=0x7f2ce98f65f0, cursor=0x7f2ce9c120a0, srdn=srdn@entry=0x7f2cb1d342f0, elem=elem@entry=0x7f2cb1d342e8, parentelem=parentelem@entry=0x0, childelems=childelems@entry=0x0, flags=flags@entry=0, db_txn=0x7f2cea0d7b10) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:3158
#7 0x00007f2cdcc856b2 in entryrdn_index_read_ext (be=be@entry=0x7f2ce98f65f0, sdn=sdn@entry=0x7f2cea121e50, id=id@entry=0x7f2cb1d34384, flags=flags@entry=0, txn=txn@entry=0x7f2cb1d34560) at ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:439
#8 0x00007f2cdcc57d79 in dn2entry_ext (be=be@entry=0x7f2ce98f65f0, sdn=sdn@entry=0x7f2cea121e50, txn=txn@entry=0x7f2cb1d34560, flags=flags@entry=0, err=err@entry=0x7f2cb1d3443c) at ldap/servers/slapd/back-ldbm/dn2entry.c:92
#9 0x00007f2cdcc5aa51 in find_entry_internal_dn (flags=0, txn=0x7f2cb1d34560, lock=0, sdn=0x7f2cea121e50, be=0x7f2ce98f65f0, pb=0x7f2cea0daa60) at ldap/servers/slapd/back-ldbm/findentry.c:130
#10 find_entry_internal (pb=pb@entry=0x7f2cea0daa60, be=0x7f2ce98f65f0, lock=lock@entry=0, txn=txn@entry=0x7f2cb1d34560, flags=flags@entry=0, addr=<optimized out>, addr=<optimized out>) at ldap/servers/slapd/back-ldbm/findentry.c:293
#11 0x00007f2cdcc5ad3e in find_entry (pb=pb@entry=0x7f2cea0daa60, be=<optimized out>, addr=<optimized out>, txn=txn@entry=0x7f2cb1d34560) at ldap/servers/slapd/back-ldbm/findentry.c:313
#12 0x00007f2cdcc92183 in ldbm_back_search (pb=0x7f2cea0daa60) at ldap/servers/slapd/back-ldbm/ldbm_search.c:592
#13 0x00007f2ce83ad040 in op_shared_search (pb=pb@entry=0x7f2cea0daa60, send_result=send_result@entry=1) at ldap/servers/slapd/opshared.c:803
#14 0x00007f2ce83ba97e in search_internal_callback_pb (pb=0x7f2cea0daa60, callback_data=<optimized out>, prc=0x0, psec=0x7f2cdaedd080 <backend_shr_note_entry_sdn_cb>, prec=0x0) at ldap/servers/slapd/plugin_internal_op.c:812
#15 0x00007f2cdaedec01 in backend_shr_update_references_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#16 0x00007f2cdaeec17f in map_data_foreach_map () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#17 0x00007f2cdaedca2b in backend_shr_update_references () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#18 0x00007f2cdaeddb86 in backend_shr_add_cb.part.13 () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#19 0x00007f2cdaeddce1 in backend_shr_betxn_post_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#20 0x00007f2ce83b6515 in plugin_call_func (list=0x7f2ce9adc0c0, operation=operation@entry=560, pb=pb@entry=0x7f2cea0ae700, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1489
#21 0x00007f2ce83b66c8 in plugin_call_list (pb=0x7f2cea0ae700, operation=560, list=<optimized out>) at ldap/servers/slapd/plugin.c:1451
#22 plugin_call_plugins (pb=pb@entry=0x7f2cea0ae700, whichfunction=whichfunction@entry=560) at ldap/servers/slapd/plugin.c:413
#23 0x00007f2cdcc73b10 in ldbm_back_add (pb=0x7f2cea0ae700) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1035
#24 0x00007f2ce836157a in op_shared_add (pb=pb@entry=0x7f2cea0ae700) at ldap/servers/slapd/add.c:735
#25 0x00007f2ce8361dd3 in add_internal_pb (pb=pb@entry=0x7f2cea0ae700) at ldap/servers/slapd/add.c:434
#26 0x00007f2ce8362aa3 in slapi_add_internal_pb (pb=pb@entry=0x7f2cea0ae700) at ldap/servers/slapd/add.c:356
#27 0x00007f2cdb50bc81 in write_replog_db (newsuperior=0x0, modrdn_mods=0x0, newrdn=0x0, log_e=0x0, curtime=1398856739, flag=0, log_m=0x7f2cea0d3b80, dn=0x7f2cea0de7a0 "ou=ca,ou=requests,o=ipaca", optype=<optimized out>, pb=<optimized out>) at ldap/servers/plugins/retrocl/retrocl_po.c:371
#28 retrocl_postob (pb=<optimized out>, optype=<optimized out>) at ldap/servers/plugins/retrocl/retrocl_po.c:668
#29 0x00007f2ce83b6515 in plugin_call_func (list=0x7f2ce9ac90e0, operation=operation@entry=561, pb=pb@entry=0x7f2cb1d41ae0, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1489
#30 0x00007f2ce83b66c8 in plugin_call_list (pb=0x7f2cb1d41ae0, operation=561, list=<optimized out>) at ldap/servers/slapd/plugin.c:1451
#31 plugin_call_plugins (pb=pb@entry=0x7f2cb1d41ae0, whichfunction=whichfunction@entry=561) at ldap/servers/slapd/plugin.c:413
#32 0x00007f2cdcc8c89e in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:823
#33 0x00007f2ce83a6601 in op_shared_modify (pb=pb@entry=0x7f2cb1d41ae0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1081
#34 0x00007f2ce83a7953 in do_modify (pb=pb@entry=0x7f2cb1d41ae0) at ldap/servers/slapd/modify.c:419
#35 0x00007f2ce887cdd1 in connection_dispatch_operation (pb=0x7f2cb1d41ae0, op=0x7f2cea018df0, conn=0x7f2cbdd60950) at ldap/servers/slapd/connection.c:660
#36 connection_threadmain () at ldap/servers/slapd/connection.c:2534
#37 0x00007f2ce678be1b in _pt_root () from /lib64/libnspr4.so
#38 0x00007f2ce612bf33 in start_thread () from /lib64/libpthread.so.0
#39 0x00007f2ce5e59ded in clone () from /lib64/libc.so.6
}}}

This thread is doing a MOD:

{{{
[30/Apr/2014:07:18:59 -0400] conn=5 op=59 MOD dn="ou=ca,ou=requests,o=ipaca"
}}}

but deeper in the stack it ends up in an internal search:

{{{
(gdb) frame 10
#10 0x00007f2cdcc5aa51 in find_entry_internal_dn (flags=0, txn=0x7f2cb1d34560, lock=0, sdn=0x7f2cea121e50, be=0x7f2ce98f65f0, pb=0x7f2cea0daa60) at ldap/servers/slapd/back-ldbm/findentry.c:130
130    (e = dn2entry_ext( be, sdn, txn, flags & TOMBSTONE_INCLUDED, &err ))
(gdb) print *sdn
$7 = {flag = 15 '\017',
  udn = 0x7f2cea14a2e0 "cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com",
  dn = 0x7f2cea121ed0 "cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com",
  ndn = 0x7f2ce9f141f0 "cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com",
  ndn_len = 69}
}}}

An interesting point is that the entry "cn=group1" was a tombstone, and a replicated update modified it:

{{{
[30/Apr/2014:07:18:13 -0400] conn=171 op=5 DEL dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
...
[30/Apr/2014:07:18:13 -0400] conn=171 op=5 RESULT err=0 tag=107 nentries=0 etime=0 csn=5360dbf7000f00060000
...
[30/Apr/2014:07:18:13 -0400] conn=23 op=85 MOD dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
...
[30/Apr/2014:07:18:13 -0400] conn=23 op=85 RESULT err=0 tag=103 nentries=0 etime=0 csn=5360dbf7001100040000
}}}

The purpose of this ticket (47787) is to allow a successful update on a tombstone if the update comes from replication. The good news is that we see the update succeed; the bad news is that we have this hang (sigh!). Possibly the fix for 47787 introduces this hang... or merely reveals it.

* At the time of the hang, error messages about a DEADLOCK appear:

{{{
[30/Apr/2014:07:14:13 -0400] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[30/Apr/2014:07:14:13 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[30/Apr/2014:07:14:13 -0400] - Listening on /var/run/slapd-IDM-LAB-BOS-REDHAT-COM.socket for LDAPI requests
[30/Apr/2014:07:18:59 -0400] - database error -30993
[30/Apr/2014:07:18:59 -0400] - database error -30993
}}}

(attaching the backtrace, DB env dump, logs)
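
As a side note, a waiter/holder relationship like the one in the lock dump can be extracted mechanically. The helper below is my own rough sketch (the `find_blockers` function and the simplified line format are invented, not a 389-ds or BDB tool):

```python
import re

# Hypothetical helper to cross-reference a db_stat-style "locks grouped by
# object" dump: for every lock in WAIT state, find which locker(s) HOLD
# the same database page. The line format is simplified from the dump above.

dump = """\
800058ad WRITE 1 HELD changelog/id2entry.db page 2
115 READ 1 WAIT changelog/id2entry.db page 2
800058ad WRITE 1 HELD changelog/entryrdn.db page 33
"""

LOCK_RE = re.compile(
    r"^(?P<locker>\w+)\s+(?P<mode>READ|WRITE)\s+\d+\s+"
    r"(?P<status>HELD|WAIT)\s+(?P<obj>\S+)\s+page\s+(?P<page>\d+)$")

def find_blockers(text):
    held, waits = {}, []
    for line in text.splitlines():
        m = LOCK_RE.match(line.strip())
        if not m:
            continue
        key = (m["obj"], m["page"])
        if m["status"] == "HELD":
            held.setdefault(key, []).append(m["locker"])
        else:
            waits.append((m["locker"], key))
    # waiter -> lockers holding the page it is queued on
    return {w: held.get(key, []) for w, key in waits}

print(find_blockers(dump))  # {'115': ['800058ad']}
```

On the dump above this reports exactly the situation described: locker 115 waits on a page held by transaction 800058ad.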

Thread 8 is looping in _entryrdn_get_elem, where it continuously gets DB_DEADLOCK from each attempt at cursor->c_get. The key->data is "dc=idm,dc=lab,dc=bos,dc=redhat,dc=com".
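
Why such a loop never terminates is easy to illustrate: if the cursor get is retried inside the same transaction, the conflicting lock is never released, so every retry gets DB_DEADLOCK again. The sketch below is illustrative only (the function names are invented, not the ldbm_entryrdn.c code); a bounded retry that gives up and lets the caller abort the whole transaction avoids spinning forever:

```python
# Illustration (invented names, not 389-ds code) of the livelock: retrying
# a cursor get on DB_DEADLOCK within the same transaction fails identically
# every time, because the conflicting transaction still holds the page.

DB_DEADLOCK = -30993  # the BerkeleyDB error code seen in the error log

def cursor_get(conflicting_txn_active):
    """Stand-in for cursor->c_get: deadlocks while the other txn holds the page."""
    return DB_DEADLOCK if conflicting_txn_active else 0

def get_with_retry(max_retries=3, conflicting_txn_active=True):
    for _ in range(max_retries):
        rc = cursor_get(conflicting_txn_active)
        if rc != DB_DEADLOCK:
            return rc
    return DB_DEADLOCK  # give up; caller should abort and retry the whole txn

print(get_with_retry())  # -30993 after 3 attempts instead of looping forever
```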

The hang looks like a separate issue. I created https://fedorahosted.org/389/ticket/47797 to track it.

Since the hang is not systematic, I was able to successfully test the fix with the freeIPA unit tests.

{{{

Ran 1744 tests in 794.788s

FAILED (SKIP=64, errors=11, failures=1)

}}}

During that run the fix worked: a MOD coming from replication was successfully applied to a tombstone, and replication continued without problem.

{{{
[06/May/2014:06:36:05 -0400] conn=175 fd=97 slot=97 connection from 10.16.78.61 to 10.16.78.61
...
[06/May/2014:06:36:06 -0400] conn=175 op=5 DEL dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
...
[06/May/2014:06:36:06 -0400] conn=27 op=88 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[06/May/2014:06:36:06 -0400] conn=27 op=88 RESULT err=0 tag=120 nentries=0 etime=0
[06/May/2014:06:36:06 -0400] conn=27 op=89 MOD dn="cn=ipausers,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
...
[06/May/2014:06:36:06 -0400] conn=175 op=5 RESULT err=0 tag=107 nentries=0 etime=0 csn=5368bb1b000200090000
[06/May/2014:06:36:06 -0400] conn=27 op=89 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000200040000
[06/May/2014:06:36:06 -0400] conn=27 op=90 MOD dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
[06/May/2014:06:36:06 -0400] conn=27 op=90 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000300040000
[06/May/2014:06:36:06 -0400] conn=27 op=91 MOD dn="cn=ipausers,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
[06/May/2014:06:36:06 -0400] conn=27 op=91 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000400040000

}}}
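
The key pair in the extract above is conn=27 op=90: the MOD on the deleted cn=group1 entry and its err=0 RESULT. A quick (illustrative, home-grown) way to pair MOD lines with their RESULT lines in such an access-log extract:

```python
import re

# Pair each "MOD dn=..." access-log line with the RESULT line for the
# same conn/op, to confirm the replicated MOD on the tombstone got err=0.
# (Ad-hoc parsing for this extract, not a 389-ds tool.)

log = """\
[06/May/2014:06:36:06 -0400] conn=27 op=90 MOD dn="cn=group1,cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
[06/May/2014:06:36:06 -0400] conn=27 op=90 RESULT err=0 tag=103 nentries=0 etime=0 csn=5368bb1a000300040000
"""

mods, results = {}, {}
for line in log.splitlines():
    key = re.search(r"conn=(\d+) op=(\d+)", line).groups()
    if ' MOD dn="' in line:
        mods[key] = re.search(r'dn="([^"]+)"', line).group(1)
    elif " RESULT " in line:
        results[key] = int(re.search(r"err=(\d+)", line).group(1))

for key, dn in mods.items():
    print(dn, "-> err=%d" % results[key])
# the MOD on the tombstone entry comes back err=0
```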

'''Push to master'''

git merge ticketfreeipa_4279

Updating 24d44ba..804d923
Fast-forward
dirsrvtests/tickets/ticket47787_test.py | 630 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ldap/servers/slapd/back-ldbm/ldbm_modify.c | 2 +-
2 files changed, 631 insertions(+), 1 deletion(-)
create mode 100644 dirsrvtests/tickets/ticket47787_test.py

git push origin master

Counting objects: 18, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 6.17 KiB, done.
Total 10 (delta 6), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
24d44ba..804d923 master -> master

commit 804d923
Author: Thierry bordaz (tbordaz) <tbordaz@redhat.com>
Date: Fri Apr 25 15:07:08 2014 +0200

git cherry-pick 804d923
[389-ds-base-1.3.2 c13feee] Ticket 47787 - A replicated MOD fails (Unwilling to perform) if it targets a tombstone
2 files changed, 631 insertions(+), 1 deletion(-)
create mode 100644 dirsrvtests/tickets/ticket47787_test.py

git push origin '''389-ds-base-1.3.2'''

Counting objects: 18, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (10/10), 6.17 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
86b34ca..c13feee 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

git cherry-pick 804d923
[389-ds-base-1.3.1 1a9e1d8] Ticket 47787 - A replicated MOD fails (Unwilling to perform) if it targets a tombstone
2 files changed, 631 insertions(+), 1 deletion(-)
create mode 100644 dirsrvtests/tickets/ticket47787_test.py

git push origin '''389-ds-base-1.3.1'''

Counting objects: 16, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (10/10), 6.12 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
efa23ce..1a9e1d8 389-ds-base-1.3.1 -> 389-ds-base-1.3.1

git push origin '''389-ds-base-1.2.11'''

Counting objects: 16, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (10/10), 6.13 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
ca407f8..0ee8496 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

git push origin master

Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 815 bytes, done.
Total 5 (delta 3), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
3b5f3fa..52e57de master -> master

Metadata Update from @tbordaz:
- Issue assigned to tbordaz
- Issue set to the milestone: 1.2.11.30

3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1118

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

2 months ago
