If a db deadlock error occurs during a MODRDN, the operation is retried, but on the second pass things go wrong on that same operation.
To reproduce: do a modrdn that moves an entry to a new superior, then try to move it back to the original subtree. Note - I instrumented the code to always trigger a single deadlock error. When I try to move the entry back to the original subtree/superior I get an error 68!
ldapsearch shows the entry was not moved, which is expected since we got an error:
ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=accoun* * +
dn: cn=Accounting Managers,ou=MyOU,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
entrydn: cn=accounting managers,ou=myou,ou=groups,dc=example,dc=com
If I restart the server, the entry is now in the original subtree (even though the move back reported a failure):
ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=accoun* * +
dn: cn=Accounting Managers,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
...
Performing ldapsearch using various scopes also gives inconsistent results for this entry:
[root@localhost BUILD]# ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s one -xLLL cn=account*
---> no results
[root@localhost BUILD]# ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=account*
dn: cn=Accounting Managers,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
entrydn: cn=accounting managers,ou=groups,dc=example,dc=com
dbscan shows that the entry's parentid is still pointing to the old subtree:
rdn: cn=Accounting Managers
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
parentid should be 3 (not 10) in this case. Perhaps that is messing up the scoped search?
If I export and reimport the ldif, the parentid is adjusted to the correct value of 3, and the entry is found under the original subtree.
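To make the suspicion about the scoped search concrete, here is a minimal sketch of how a stale parentid could produce exactly these results. It assumes, purely for illustration, that a one-level search matches on the stored parentid while a subtree search matches on the entry's DN; the struct and both helper functions are hypothetical and are not the server's actual search code.

```c
#include <string.h>

/* Hypothetical, simplified entry record; fields mirror the dbscan output. */
struct entry {
    unsigned id;
    unsigned parentid;
    const char *dn; /* normalized DN */
};

/* One-level scope modelled as a parentid comparison (an assumption). */
int matches_onelevel(const struct entry *e, unsigned base_id)
{
    return e->parentid == base_id;
}

/* Subtree scope modelled as a DN suffix comparison (an assumption). */
int matches_subtree(const struct entry *e, const char *base_dn)
{
    size_t dn_len = strlen(e->dn);
    size_t base_len = strlen(base_dn);

    if (dn_len < base_len) {
        return 0;
    }
    if (strcmp(e->dn + (dn_len - base_len), base_dn) != 0) {
        return 0;
    }
    /* Either the entry is the base itself, or the suffix starts after a comma. */
    return dn_len == base_len || e->dn[dn_len - base_len - 1] == ',';
}
```

Under this model, the entry with entryid 6 and parentid 10 is skipped by a one-level search under the parent with id 3, yet is still returned by a subtree search on "ou=groups,dc=example,dc=com", which matches the ldapsearch output above.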
So we are seeing db & entry cache corruption when a db deadlock occurs on modrdn operations. That being said the way I am forcing the db deadlock condition might be flawed. So this needs more investigation...
This is how I'm triggering the db deadlock in ldbm_modrdn.c, line ~1070:

    }
    /* Push out the db modifications from the new parent entry */
    else /* retval == 0 */
    {
        if (MARK == 0) {
            slapi_log_err(SLAPI_LOG_CRIT, "MARK", "Retry txn.....\n");
            MARK = 1;
            continue;
        }
        retval = modify_update_all(be, pb, &newparent_modify_context, &txn);
        slapi_log_err(SLAPI_LOG_BACKLDBM, "ldbm_back_modrdn",
                      "conn=%lu op=%d modify_update_all: old_entry=0x%p, new_entry=0x%p, rc=%d\n",
                      conn_id, op_id, parent_modify_context.old_entry,
                      parent_modify_context.new_entry, retval);
        if (DB_LOCK_DEADLOCK == retval) {
            /* Retry txn */
            continue;
        }
Metadata Update from @mreynolds: - Custom field component adjusted to None - Custom field origin adjusted to None - Custom field reviewstatus adjusted to None - Custom field type adjusted to None - Custom field version adjusted to None
The error 68 that is caused by moving an entry back to its original superior is coming from:
ldbm_modrdn.c: ~480
    /* Check that an entry with the same DN doesn't already exist. */
    {
        Slapi_Entry *entry;
        slapi_pblock_get(pb, SLAPI_MODRDN_EXISTING_ENTRY, &entry);
        if ((entry != NULL) &&
            /* allow modrdn even if the src dn and dest dn are identical */
            (0 != slapi_sdn_compare((const Slapi_DN *)&dn_newdn, (const Slapi_DN *)sdn))) {
            ldap_result_code = LDAP_ALREADY_EXISTS;
            slapi_log_err(SLAPI_LOG_CRIT, "MARK",
                          "Already exists 1 (%p) new dn (%s) old dn (%s)\n",
                          entry,
                          slapi_sdn_get_dn((const Slapi_DN *)&dn_newdn),
                          slapi_sdn_get_dn((const Slapi_DN *)sdn));
            goto error_return;
        }
    }
"entry" is not NULL, and the two DNs are different --> this triggers ALREADY_EXISTS. During a working MODRDN the existing entry is NULL.
Hi Mark, don't you need to set retval=DB_LOCK_DEADLOCK to really simulate the deadlock?
No, it just loops and that variable gets reset.
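To illustrate why the bare `continue` is enough, here is a minimal, self-contained model of a retry loop with a one-shot injected failure. It is only a sketch: `run_with_retry`, `fake_db_update`, and `MAX_RETRIES` are hypothetical names, and the assumption that the result variable is reassigned on every pass is taken from the comment above rather than from the full ldbm_back_modrdn source.

```c
#define MAX_RETRIES 5 /* hypothetical limit, not the server's value */

/* Hypothetical stand-in for the real database update; always succeeds here. */
static int fake_db_update(void)
{
    return 0;
}

/*
 * Runs the simulated operation. If inject_once is non-zero, the first pass
 * is abandoned with `continue` before the update, the same way the MARK
 * flag does in the instrumented code. Returns the number of passes made
 * through the loop, or -1 if the retry limit is exhausted.
 */
int run_with_retry(int inject_once)
{
    int injected = 0;
    int retval = 0;
    int passes = 0;

    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        passes++;
        retval = 0; /* reset at the top of every pass */

        if (inject_once && !injected) {
            injected = 1;
            continue; /* abandon this pass; the loop restarts */
        }

        retval = fake_db_update();
        if (retval != 0) {
            continue; /* a real failure would take the same path */
        }
        return passes;
    }
    return -1;
}
```

In this model `run_with_retry(1)` takes two passes and `run_with_retry(0)` takes one. The value of `retval` at the moment of the injected `continue` never matters, because the second pass overwrites it.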
Metadata Update from @mreynolds: - Issue set to the milestone: 1.4.0
This is still a problem even with all the latest modrdn and entry cache fixes (tested on 1.4.1).
Metadata Update from @mreynolds: - Issue assigned to mreynolds
Metadata Update from @mreynolds: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1744623
Issue linked to Bugzilla: Bug 1744623
https://pagure.io/389-ds-base/pull-request/50556
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: None)
Commit b5d9627 relates to this ticket
2672461..ab03ec6 389-ds-base-1.4.0 -> 389-ds-base-1.4.0
95acf7a..52df7d9 389-ds-base-1.3.10 -> 389-ds-base-1.3.10
2c1bd9a..40c9a5a 389-ds-base-1.3.9 -> 389-ds-base-1.3.9
f4133b7..2855482 389-ds-base-1.3.8 -> 389-ds-base-1.3.8
Metadata Update from @mreynolds: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2683
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)