#47764 Problem with deletion while replicated
Closed: wontfix None Opened 7 years ago by chatfield.

Hi, this ticket looks to be very similar to what's being seen in https://fedorahosted.org/389/ticket/47696 , which is still open at this time. Perhaps the logs and traces here can help that ticket. If this is the wrong process for creating associated tickets, then I apologise.

We have a pair of 389 DS masters replicating with each other. Very often while deleting a subtree containing around 10000 entries the ldap client will receive error 66 (LDAP_NOT_ALLOWED_ON_NONLEAF) partway through. In the errors log we see messages like:

[02/Apr/2014:11:53:08 +0100] entryrdn-index - _entryrdn_delete_key: Failed to remove cn=fax; has children
[02/Apr/2014:11:53:08 +0100] - database index operation failed BAD 1031, err=-1 Unknown error: -1

Afterwards, looking at the parent entry with an ldap browser we see no (apparent) child entries. If the DS is restarted the entry (in this case cn=Fax,..) reappears. If you try to delete that entry you get an Operations error. If you then try to delete it a second time it is successful.

I've managed to reproduce this reliably in my test environment; however, it contains sensitive data, and we have also added items to the schema. When the tree is imported back into the DS, the delete fails at the same entry. Narrowing it down, it appears that deleting just that single entry triggers the error.

I managed to capture the moment of deletion with full error logging enabled; this is attached. The DNs of interest in this log are tpUuid=a219426d-0add-482c-a405-2cfd1e137a9b,cn=Fax,gcUID=00036448,ou=Subscribers,gcUID=00000067,o=teamphone.com and its parent.

What also might be useful is that I have taken a dbscan trace between each delete operation (on another item with the same behaviour). The start (clean) state is:

..
id 11001
    rdn: cn=Fax
    nsUniqueId: e2d0ebb0-ba8511e3-b8dab01a-d8064075
    cn: Fax
    gcCOS: unrestricted
    gcLanguage: 1
    gcProgramNbr: Fax
    gcViewerClass: FaxRenderer
    objectClass: gcService
    objectClass: top
    objectClass: tpParameters
    owner: gcUID=0001ab1c,ou=Subscribers,gcUID=00000067,o=Teamphone.com
    tpNumberParameter0: 0
    tpNumberParameter1: 0
    tpNumberParameter10: 0
    tpNumberParameter11: 0
    tpNumberParameter12: 0
    tpNumberParameter13: 0
    tpNumberParameter14: 0
    tpNumberParameter15: 0
    tpNumberParameter2: 0
    tpNumberParameter3: 0
    tpNumberParameter4: 0
    tpNumberParameter5: 0
    tpNumberParameter6: 0
    tpNumberParameter7: 0
    tpNumberParameter8: 0
    tpNumberParameter9: 0
    creatorsName: cn=directory manager
    modifiersName: cn=directory manager
    createTimestamp: 20140402164347Z
    modifyTimestamp: 20140402164347Z
    parentid: 11000
    entryid: 11001
    numSubordinates: 1
..
id 11863
    rdn: tpUuid=89510da7-b042-47de-8b83-13ad878220fa
    nsUniqueId: e2d0ef0e-ba8511e3-b8dab01a-d8064075
    objectClass: gcRouting
    objectClass: top
    telephoneNumber: +442031625004
    tpTargetService: cn=Fax,gcUID=0001ab1c,ou=Subscribers,gcUID=00000067,o=Teampho
     ne.com
    tpUuid: 89510da7-b042-47de-8b83-13ad878220fa
    creatorsName: cn=directory manager
    modifiersName: cn=directory manager
    createTimestamp: 20140402164355Z
    modifyTimestamp: 20140402164355Z
    parentid: 11001
    entryid: 11863
..

After tpUuid=89510da7-b042-47de-8b83-13ad878220fa,cn=Fax,gcUID=0001ab1c,ou=Subscribers,gcUID=00000067,o=Teamphone.com is deleted the dbscan trace difference is:

298383c298383
<       numSubordinates: 1
---
>       tombstoneNumSubordinates: 1
326932c326932,326933
<       rdn: tpUuid=89510da7-b042-47de-8b83-13ad878220fa
---
>       rdn: nsuniqueid=e2d0ef0e-ba8511e3-b8dab01a-d8064075,tpUuid=89510da7-b042-47de-
>        8b83-13ad878220fa
326935a326937
>       objectClass;vucsn-533d622b000000010000: nsTombstone
326945a326948,326950
>       nsParentUniqueId: e2d0ebb0-ba8511e3-b8dab01a-d8064075
>       nscpEntryDN: tpuuid=89510da7-b042-47de-8b83-13ad878220fa,cn=fax,gcuid=0001ab1c
>        ,ou=subscribers,gcuid=00000067,o=teamphone.com
374010c374015,374016
<       nsds50ruv: {replica 1 ldap://chrisds.ds.com:489}
---
>       nsds50ruv: {replica 1 ldap://chrisds.ds.com:489} 533d622b000000010000 533d622b
>        000000010000
374011a374018
>       nsruvReplicaLastModified: {replica 1 ldap://chrisds.ds.com:489} 533d6228

After cn=Fax,gcUID=0001ab1c,ou=Subscribers,gcUID=00000067,o=Teamphone.com is deleted (via the two delete attempts described above) the trace difference is:

298347c298347,298348
<       numSubordinates: 5
---
>       numSubordinates: 4
>       tombstoneNumSubordinates: 1
298350c298351,298352
<       rdn: cn=Fax
---
>       rdn: nsuniqueid=e2d0ebb0-ba8511e3-b8dab01a-d8064075,nsuniqueid=e2d0ebb0-ba8511
>        e3-b8dab01a-d8064075
298359a298362
>       objectClass;vucsn-533d6274000000010000: nsTombstone
298383a298387,298389
>       nsParentUniqueId: e2d0ebaf-ba8511e3-b8dab01a-d8064075
>       nscpEntryDN: nsuniqueid=e2d0ebb0-ba8511e3-b8dab01a-d8064075,cn=fax,gcuid=0001a
>        b1c,ou=subscribers,gcuid=00000067,o=teamphone.com
374015c374021
<       nsds50ruv: {replica 1 ldap://chrisds.ds.com:489} 533d622b000000010000 533d622b
---
>       nsds50ruv: {replica 1 ldap://chrisds.ds.com:489} 533d622b000000010000 533d6274
374018c374024
<       nsruvReplicaLastModified: {replica 1 ldap://chrisds.ds.com:489} 533d6228
---
>       nsruvReplicaLastModified: {replica 1 ldap://chrisds.ds.com:489} 533d6271

Note how the RDN contains the entry's own nsUniqueId twice, rather than the parent's nsUniqueId.

The version is 389-ds-base-1.2.11.15-32.el6_5.x86_64 on CentOS 6.5.


I've been doing some investigating on this, and I think I've found the area that might be causing the problem.

I think the problem occurs when strlen('cn=fax') <= strlen('C11001'), i.e. when the length of the normalised RDN is no longer than the RDN-CHILD-INDEX key.

Notice how the way that this index key is created is inconsistent in the size/ulen members:
{{{
keybuf = slapi_ch_smprintf("%c%u", RDN_INDEX_CHILD, targetid);
key.data = keybuf;
key.size = key.ulen = strlen(keybuf) + 1;
}}}
And
{{{
keybuf = slapi_ch_smprintf("%c%u", RDN_INDEX_CHILD, id);
key.data = (void *)keybuf;
key.size = key.ulen = strlen(nrdn) + 1;
}}}

If I apply this patch then that fixes my problem:
{{{
diff -Npru 389-ds-base-1.2.11.15.orig/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c 389-ds-base-1.2.11.15/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c
--- 389-ds-base-1.2.11.15.orig/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c 2014-04-08 12:00:40.006725972 +0100
+++ 389-ds-base-1.2.11.15/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c 2014-04-11 16:06:48.892476801 +0100
@@ -2616,7 +2616,7 @@ _entryrdn_insert_key(backend *be,
     /* E.g., C1 */
     keybuf = slapi_ch_smprintf("%c%u", RDN_INDEX_CHILD, workid);
     key.data = keybuf;
-    key.size = key.ulen = strlen(keybuf) + 1;
+    key.size = key.ulen = strlen(nrdn) + 1;
     key.flags = DB_DBT_USERMEM;

     tmpsrdn = srdn;

}}}
Note that this is not the full fix as this breaks ldif2db import - however this might be a helpful pointer to finding the real solution.

Looking at the latest version of 1.2.11, I only see "nrdn" being used for the key length in _entryrdn_delete_key() - not in _entryrdn_insert_key().

Anyway, using the strlen of nrdn is wrong. We should never use nrdn for the key length, but instead use the keybuf generated by slapi_ch_smprintf(): key.size = key.ulen = strlen(keybuf) + 1;

I cannot reproduce this issue (I've been trying), but if you can, could you please try using "key.size = key.ulen = strlen(keybuf) + 1;" for all cases where we are building the rdn child key? In my code base, 1.2.11, I only see it being used in _entryrdn_delete_key(), but if it's in other places in your code, then it should be changed there as well. Please let me know if it helps - thanks.

I tried it with this patch, but it still failed - in fact it made the problem worse as now more elements in the tree exhibit the problem:
{{{
diff -Npru 389-ds-base-1.2.11.15.orig/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c 389-ds-base-1.2.11.15/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c
--- 389-ds-base-1.2.11.15.orig/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c 2014-04-08 12:00:40.006725972 +0100
+++ 389-ds-base-1.2.11.15/ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c 2014-04-11 17:47:34.947470699 +0100
@@ -2826,7 +2826,7 @@ _entryrdn_delete_key(backend *be,
     /* check if the target element has a child or not */
     keybuf = slapi_ch_smprintf("%c%u", RDN_INDEX_CHILD, id);
     key.data = (void *)keybuf;
-    key.size = key.ulen = strlen(nrdn) + 1;
+    key.size = key.ulen = strlen(keybuf) + 1;
     key.flags = DB_DBT_USERMEM;

     memset(&data, 0, sizeof(data));

}}}

I'm not familiar enough (yet) with the Berkeley DB API - I'm wondering whether one of the calls involving RDN_INDEX_CHILD keys is making modifications to the memory pointed to by {{{key.data}}}. I'll keep investigating.

In the original report, this is definitely a bug.
298350c298351,298352
<       rdn: cn=Fax
---
>       rdn: nsuniqueid=e2d0ebb0-ba8511e3-b8dab01a-d8064075,nsuniqueid=e2d0ebb0-ba8511
>        e3-b8dab01a-d8064075

I should change the subject; this is a dup of the ticket (tombstoned tombstone) which I'm working on now...
https://fedorahosted.org/389/ticket/47696

I ran this with Valgrind, in case it showed anything. Comparing between a working case and the failing case I get this trace when deleting the leaf entry:
{{{
==8782== Thread 44:
==8782== Conditional jump or move depends on uninitialised value(s)
==8782== at 0xC50DDA8: __log_putr (log_put.c:731)
==8782== by 0xC50EB88: __log_put (log_put.c:463)
==8782== by 0xC4C4E68: __db_addrem_log (db_auto.c:257)
==8782== by 0xC4D2147: __db_pitem (db_dup.c:130)
==8782== by 0xC42A701: __bam_iitem (bt_put.c:376)
==8782== by 0xC425237: __bamc_put (bt_cursor.c:1960)
==8782== by 0xC4CEA82: __dbc_put (db_cam.c:1520)
==8782== by 0xC4D3FC0: __dbc_put_pp (db_iface.c:2456)
==8782== by 0x100BB950: _entryrdn_put_data (ldbm_entryrdn.c:1948)
==8782== by 0x100C1B1A: _entryrdn_insert_key (ldbm_entryrdn.c:2102)
==8782== by 0x100C23A0: entryrdn_index_entry (ldbm_entryrdn.c:295)
==8782== by 0x100BA286: ldbm_back_delete (ldbm_delete.c:808)
==8782== by 0x4E7B522: ??? (in /usr/lib64/dirsrv/libslapd.so.0.0.0)
==8782== by 0x4E7B860: do_delete (in /usr/lib64/dirsrv/libslapd.so.0.0.0)
==8782== by 0x4141D3: ??? (in /usr/sbin/ns-slapd)
==8782== by 0x6A929A5: _pt_root (ptthread.c:204)
==8782== by 0x70CB9D0: start_thread (pthread_create.c:301)
==8782== by 0x73C9B6C: clone (clone.S:115)
}}}
I'm not sure whether the above is useful, but it led me to the thought in comment 1.

It seems to be in the area where the leaf is converted into a tombstone, and the child index for this tombstone is added to its parent. Perhaps in the case where the parent rdn is short it causes some sort of corruption of the index that eventually leads to the tombstoned tombstone issue when the parent (now itself a leaf) is deleted.

Replying to [comment:5 chatfield]:

I ran this with Valgrind, in case it showed anything. Comparing between a working case and the failing case I get this trace when deleting the leaf entry:
{{{
(valgrind trace quoted verbatim from comment 5)
}}}
I'm not sure whether the above is useful, but it led me to the thought in comment 1.

This trace is benign. This is not the droid you're looking for. Unfortunately, valgrind does not have a high signal to noise ratio - there can be some noise like this that you have to ignore.

It seems to be in the area where the leaf is converted into a tombstone, and the child index for this tombstone is added to its parent. Perhaps in the case where the parent rdn is short it causes some sort of corruption of the index that eventually leads to the tombstoned tombstone issue when the parent (now itself a leaf) is deleted.

I've just checked the entryrdn db files (using dbscan -R) after deleting the original leaf entry (id 11863), and it appears that everything is being updated as it should.

Maybe this theory explains why we're getting the tombstoned tombstone issue:
When the client deletes cn=Fax above, we get the "has children" error (cause currently unknown) in the log, and the error code back to the client. The transaction is then aborted, so the on-disk view is correct. Checking the client view at this point (by reloading the subtree), the cn=Fax entry is gone. However, if the client hadn't refreshed at this point, it could attempt to delete the same entry again. If you do this you get the following in the errors log:
{{{
[14/Apr/2014:16:03:49 +0100] entryrdn-index - _entryrdn_delete_key: Failed to remove cn=fax; has children
[14/Apr/2014:16:03:49 +0100] - database index operation failed BAD 1031, err=-1 Unknown error: -1
[14/Apr/2014:16:03:53 +0100] entryrdn-index - _entryrdn_delete_key: Failed to remove nsuniqueid=e2d0ebb0-ba8511e3-b8dab01a-d8064075; has children
[14/Apr/2014:16:03:53 +0100] - database index operation failed BAD 1031, err=-1 Unknown error: -1
}}}
This suggests the cache/memory view wasn't fully reset when the previous transaction was aborted. It seems the entry in the cache is a version that has been partially converted to a tombstone.
Note also that if the server is restarted after the first delete of cn=Fax the entry becomes visible to client searches again.

As an addition to comment 7: this cached, partially converted tombstone is in a state where a delete can be attempted on it if the client knows the original DN. This means it may be converted into a tombstone a second time, hence the double tombstone RDN.

Another theory, this time a guess at the cause of the "has children" error:
It looks like the change in comment 3 is a valid one. However, making that change has revealed something that might previously have been working "accidentally". Normally the length of the RDN is greater than the length of the index key. When the size/ulen fields were set from strlen(nrdn) while the data field pointed at keybuf, Berkeley DB was probably searching for a child index whose key was actually "C12345\0xyz" rather than "C12345". That lookup returns DB_NOTFOUND, which allows the delete to continue. In the case above, however, the nrdn is the same length as the key, so the lookup does match the stored child index entry, and the no-child check aborts the transaction.

Ok, I think I have the steps to recreate:
* Set up a pair of servers, ldap1 and ldap2
* Set up replication between them
* Fill the tree so that there are just over 100 entries in the complete tree (i.e. we want the subsequent entry ids to be 3 digits long)
* Create an entry within the tree o=xy (it's important that its rdn is 4 characters long)
* Create an entry under this (it doesn't matter in this case what kind you create)
* from your ldap client delete the subtree starting at o=xy
* you should get the "has children" error
* theoretically, if you then delete the o=xy entry again you should get the double tombstone

If you'd like me to prepare and upload an example tree that demonstrates the problem, just let me know. Hopefully that would be useful for you.

Replying to [comment:11 chatfield]:

If you'd like me to prepare and upload an example tree that demonstrates the problem, just let me know. Hopefully that would be useful for you.

Yes, it'd help our debugging a lot. We've been working on the related issue #47750 etc., and I'd like to run your test against the debugged version. Could you please provide us the test ldif file(s) you are using?
. Fill the tree so that there are just over 100 entries
. Create an entry within the tree o=xy
. Create an entry under this

Thank you, in advance.

Here is a simpler version of the file described in the recreation steps. To recreate, follow these steps:
* Prepare a fresh 389DS with a base DN of o=a
* Enable changelog and (multi master) replication for o=a (you don't need a second DS, nor an agreement set up)
* Import the sample ldif and restart the DS
* Using a client tool, delete "ou=child,o=x,o=a"
* Using a client tool, delete "o=x,o=a"
You should see the second delete fail, and warnings in the errors log.
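For reference, a hypothetical sketch of what such an ldif could look like (the actual attached file may differ; the filler entries are abbreviated and the objectClasses are chosen for illustration only):

```ldif
dn: o=a
objectClass: top
objectClass: organization
o: a

# ~100 filler entries so that later entry IDs reach 3 digits (one shown)
dn: cn=filler001,o=a
objectClass: top
objectClass: nsContainer
cn: filler001

dn: o=x,o=a
objectClass: top
objectClass: organization
o: x

dn: ou=child,o=x,o=a
objectClass: top
objectClass: organizationalUnit
ou: child
```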

Thank you for the reproducer! I could duplicate the bug.

Bug description: When checking for a child entry on a node, only
the first position was checked, which was normally "deleted" if there
were no more children. But in some cases a tombstoned child was
placed there. When that occurred, even though there were no live
children any more, _entryrdn_delete_key returned "has children" and
the delete operation failed.

Fix description: This patch checks all the children of the to-be-
deleted node; if there is no child, or all of them are tombstones,
it proceeds to the next step. It also fixes a typo reported by
chatfield (Thank you!!)

Thanks for the patch! I've tried it on my test box (CentOS 6.5/EPEL with all the current updates), applying the patch against the source rpm. I had to change one line to make it work for me:
{{{if (!slapi_is_special_rdn(childelem->rdn_elem_nrdn_rdn, RDN_IS_TOMBSTONE)) {}}}
becomes:
{{{if (NULL == strchr((char *)childelem->rdn_elem_nrdn_rdn, ',')) {}}}
Good news - it solves the case I'm seeing!

While playing around with this I saw a potential leak in Valgrind:
{{{
==26812== 60,394 bytes in 13,709 blocks are definitely lost in loss record 1,625 of 1,676
==26812== at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==26812== by 0x4E77BBA: slapi_ch_malloc (ch_malloc.c:155)
==26812== by 0x4E85A8F: slapi_entry_attr_get_charptr (entry.c:2730)
==26812== by 0x100B97E7: ldbm_back_delete (ldbm_delete.c:329)
==26812== by 0x4E7B522: op_shared_delete (delete.c:364)
==26812== by 0x4E7B860: do_delete (delete.c:128)
==26812== by 0x4141D3: connection_threadmain (connection.c:584)
==26812== by 0x6A929A5: _pt_root (ptthread.c:204)
==26812== by 0x70CB9D0: start_thread (pthread_create.c:301)
==26812== by 0x73C9B6C: clone (clone.S:115)
}}}
It looks like in the line in ldbm_delete.c:
{{{char *pid_str = slapi_entry_attr_get_charptr(e->ep_entry, LDBM_PARENTID_STR);}}}
the pid_str is not freed. Having said that, I don't think this patch causes it; it may also have already been fixed in the Git tree.

Anyway, great work!

Thank you for reviewing and trying the patch! Yes, this patch was created on top of this fix, in which slapi_is_special_rdn was added (please note that both patches are waiting for reviews...)
https://fedorahosted.org/389/ticket/47750
As you can see, the patch for #47750 is huge and invasive... I ran my tests with the two patches applied via valgrind and thought it was leak-clean, but I might have missed this one. I'm definitely going to take a look at the code you pointed out.
Thanks!
--noriko

I had fixed the pid_str leak in a previous patch for 1.2.11. So this patch is good to go, ack.

No wonder I didn't see it. Thanks a lot, Mark!

Reviewed by chatfield and Mark (Thank you, both of you!!)

Pushed to master:
160cb3f..832253e master -> master
commit 832253e

Pushed to 389-ds-base-1.3.2:
3f933fe..86b34ca 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 86b34ca

Pushed to 389-ds-base-1.3.1:
68b52b7..efa23ce 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit efa23ced2a6e3de3389d9b801329066f511bc38c

Pushed to 389-ds-base-1.2.11:
9e5b333..ca407f8 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit ca407f8

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.11.30

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1096

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

11 months ago
