#49624 DB Deadlock on modrdn appears to corrupt database and entry cache
Closed: wontfix 2 years ago by mreynolds. Opened 4 years ago by mreynolds.

Issue Description

If a db deadlock error occurs during a MODRDN, the operation is retried, but on the second pass that same operation goes wrong.
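The general hazard can be illustrated with a toy model in C. Everything in it is hypothetical and is not 389-ds code: an entry has a "disk" parent that changes only on commit and a "cache" parent that is updated in memory before the write, and a simulated deadlock aborts the pass so the loop retries. If the in-memory update from the aborted pass is not rolled back, the retry's pre-check sees the entry as already moved and fails:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not a 389-ds API. */
enum { TOY_OK = 0, TOY_ALREADY_EXISTS = 68, TOY_MAX_RETRIES = 5 };

typedef struct {
    char disk_parent[64];  /* parent as stored in the database; changes only on commit */
    char cache_parent[64]; /* parent as seen through the in-memory cache */
} toy_entry;

/* Moves e under new_parent, failing the first `deadlocks` passes with a
 * simulated deadlock. When restore_on_retry is false, the cache update made
 * by an aborted pass is still present when the loop tries again. */
static int toy_modrdn(toy_entry *e, const char *new_parent, int deadlocks,
                      bool restore_on_retry)
{
    char snapshot[sizeof e->cache_parent];

    assert(strlen(new_parent) < sizeof e->disk_parent);
    memcpy(snapshot, e->cache_parent, sizeof snapshot);

    for (int pass = 0; pass < TOY_MAX_RETRIES; pass++) {
        if (pass > 0 && restore_on_retry) {
            /* Undo the in-memory side effect of the aborted pass. */
            memcpy(e->cache_parent, snapshot, sizeof snapshot);
        }
        /* Toy stand-in for the target existence check, done against the cache. */
        if (strcmp(e->cache_parent, new_parent) == 0) {
            return TOY_ALREADY_EXISTS;
        }
        strcpy(e->cache_parent, new_parent); /* in-memory side effect */
        if (deadlocks > 0) {
            deadlocks--;
            continue; /* txn aborted: disk_parent is left untouched */
        }
        strcpy(e->disk_parent, new_parent); /* commit */
        return TOY_OK;
    }
    return -1; /* retries exhausted */
}
```

With one injected deadlock and no rollback, toy_modrdn() returns 68 while disk_parent still holds the old parent; with restore_on_retry set, the same call succeeds. The toy does not reproduce the exact sequence reported here (the 68 appeared on the later move back), only the pattern of state leaking across a retry.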

So: do a modrdn that moves an entry to a new superior, then try to move it back to the original subtree. Note: I instrumented the code to always trigger a single deadlock error. When I try to move the entry back to the original subtree/superior, I get error 68 (LDAP_ALREADY_EXISTS)!

ldapsearch shows that, as expected given the error, the entry was not moved:

ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=accoun* * +
dn: cn=Accounting Managers,ou=MyOU,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
entrydn: cn=accounting managers,ou=myou,ou=groups,dc=example,dc=com

After restarting the server, the entry is now in the original subtree (even though the move returned an error):

ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=accoun* * +

dn: cn=Accounting Managers,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
...

Performing ldapsearch using various scopes also gives inconsistent results for this entry:

[root@localhost BUILD]# ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s one -xLLL cn=account*

---> no results

[root@localhost BUILD]# ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=account*
dn: cn=Accounting Managers,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
entrydn: cn=accounting managers,ou=groups,dc=example,dc=com

dbscan shows that the entry's parentid is still pointing to the old subtree:

rdn: cn=Accounting Managers
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10

parentid should be 3 (not 10) in this case. Perhaps that is what breaks the one-level (-s one) search?
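One way to see how a stale parentid could produce the mismatch above: if a one-level search finds children through parentid while a subtree search effectively goes by DN, the same entry can be invisible at one level yet visible in the subtree. The sketch below is a toy model under exactly that assumption (hypothetical names; real 389-ds index handling is more involved), using the ids from the dbscan output:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy index record; field names echo the dbscan output, but the lookup logic
 * below is an assumption, not how 389-ds actually resolves search scopes. */
typedef struct {
    unsigned id;
    unsigned parentid;
    const char *dn; /* normalized (lower-case) DN, like entrydn */
} toy_entry;

/* True if dn lies strictly below base (suffix match on an RDN boundary). */
static int dn_is_under(const char *dn, const char *base)
{
    size_t ld = strlen(dn), lb = strlen(base);
    return ld > lb && dn[ld - lb - 1] == ',' && strcmp(dn + ld - lb, base) == 0;
}

/* Rough stand-in for a cn=<prefix>* filter. */
static int rdn_matches(const toy_entry *e, const char *prefix)
{
    return strncmp(e->dn, prefix, strlen(prefix)) == 0;
}

/* One-level scope resolved through parentid alone. */
static size_t search_onelevel(const toy_entry *db, size_t n, unsigned base_id,
                              const char *prefix)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (db[i].parentid == base_id && rdn_matches(&db[i], prefix))
            hits++;
    return hits;
}

/* Subtree scope resolved through the DN alone. */
static size_t search_subtree(const toy_entry *db, size_t n, const char *base_dn,
                             const char *prefix)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (dn_is_under(db[i].dn, base_dn) && rdn_matches(&db[i], prefix))
            hits++;
    return hits;
}
```

With entry 6 carrying parentid 10 but a DN under ou=groups (id 3), search_onelevel() for base id 3 finds nothing while search_subtree() finds the entry; setting parentid to 3 makes both agree.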

If I export and reimport the ldif, the parentid is adjusted to the correct value of 3, and the entry is found under the original subtree.
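That export/reimport repairs the value is consistent with parentid being re-derived from each entry's DN at import time. A toy version of that kind of repair, assuming normalized DNs and a linear lookup (hypothetical names; this is not the import code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned id;
    unsigned parentid; /* 0 means "no parent" in this toy */
    const char *dn;    /* normalized DN */
} toy_entry;

/* Parent DN: everything after the first ','. Escaped commas ("\,") are NOT
 * handled; a real implementation must parse RDNs properly. */
static const char *parent_dn(const char *dn)
{
    const char *comma = strchr(dn, ',');
    return comma ? comma + 1 : NULL;
}

/* Linear lookup of an entry id by DN; returns 0 when not found. */
static unsigned lookup_id(const toy_entry *db, size_t n, const char *dn)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(db[i].dn, dn) == 0)
            return db[i].id;
    return 0;
}

/* Re-derives every parentid from the DN; returns how many were changed. */
static size_t rebuild_parentids(toy_entry *db, size_t n)
{
    size_t fixed = 0;
    assert(db != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *pdn = parent_dn(db[i].dn);
        unsigned pid = pdn ? lookup_id(db, n, pdn) : 0;
        if (pid != 0 && pid != db[i].parentid) {
            db[i].parentid = pid;
            fixed++;
        }
    }
    return fixed;
}
```

Run over the entries from this ticket, only entry 6 changes (10 becomes 3), and a second pass changes nothing.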

So we are seeing db and entry cache corruption when a db deadlock occurs during modrdn operations. That said, the way I am forcing the db deadlock condition might be flawed, so this needs more investigation...


This is how I'm triggering the db deadlock in ldbm_modrdn.c

line: ~1070
}
        /* Push out the db modifications from the new parent entry */
        else /* retval == 0 */
        {

            if (MARK == 0 ){
                slapi_log_err(SLAPI_LOG_CRIT, "MARK", "Retry txn.....\n");
                MARK = 1;
                continue;
            }

            retval = modify_update_all(be, pb, &newparent_modify_context, &txn);
            slapi_log_err(SLAPI_LOG_BACKLDBM, "ldbm_back_modrdn",
                          "conn=%lu op=%d modify_update_all: old_entry=0x%p, new_entry=0x%p, rc=%d\n",
                          conn_id, op_id, parent_modify_context.old_entry, parent_modify_context.new_entry, retval);
            if (DB_LOCK_DEADLOCK == retval) {
                /* Retry txn */
                continue;
            }

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

4 years ago

The error 68 (LDAP_ALREADY_EXISTS) returned when moving an entry back to its original superior comes from:

ldbm_modrdn.c: ~480

        /* Check that an entry with the same DN doesn't already exist. */
        {
            Slapi_Entry *entry;
            slapi_pblock_get(pb, SLAPI_MODRDN_EXISTING_ENTRY, &entry);

            if ((entry != NULL) &&
                /* allow modrdn even if the src dn and dest dn are identical */
                (0 != slapi_sdn_compare((const Slapi_DN *)&dn_newdn,
                                        (const Slapi_DN *)sdn))) {
                ldap_result_code = LDAP_ALREADY_EXISTS;
                slapi_log_err(SLAPI_LOG_CRIT, "MARK", "Already exists 1 (%p) new dn (%s) old dn (%s)\n",
                        entry, slapi_sdn_get_dn((const Slapi_DN *)&dn_newdn), slapi_sdn_get_dn((const Slapi_DN *)sdn) );
                goto error_return;
            }
        }

Here "entry" is not NULL and the two DNs differ, which triggers LDAP_ALREADY_EXISTS. During a working MODRDN the existing entry is NULL.

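To isolate the decision itself: the check returns 68 whenever a lookup result is present and the DNs differ, so anything that leaves a non-NULL result in place (for example a leftover result from an aborted pass, which is only a hypothesis here) is enough to produce the error. A minimal C model with hypothetical names; toy_reset_for_retry is purely illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LDAP_SUCCESS 0
#define TOY_LDAP_ALREADY_EXISTS 68 /* LDAP result code 68 */

/* Hypothetical per-operation context; existing_entry plays the role of the
 * SLAPI_MODRDN_EXISTING_ENTRY pblock slot. Not the real Slapi_PBlock API. */
typedef struct {
    const void *existing_entry; /* entry found at the target DN, or NULL */
} toy_op_ctx;

/* Same shape as the check in ldbm_modrdn.c. DNs are assumed normalized, so
 * strcmp stands in for slapi_sdn_compare(). */
static int toy_check_target(const toy_op_ctx *ctx, const char *new_dn,
                            const char *old_dn)
{
    assert(ctx != NULL && new_dn != NULL && old_dn != NULL);
    if (ctx->existing_entry != NULL && strcmp(new_dn, old_dn) != 0)
        return TOY_LDAP_ALREADY_EXISTS;
    return TOY_LDAP_SUCCESS; /* no conflict, or a same-DN modrdn */
}

/* Hypothetical hygiene step: forget any lookup result from an aborted pass. */
static void toy_reset_for_retry(toy_op_ctx *ctx)
{
    ctx->existing_entry = NULL;
}
```

A same-DN modrdn passes even with a non-NULL result, matching the "allow modrdn even if the src dn and dest dn are identical" comment; clearing the result makes a genuine move pass again.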
Hi Mark, don't you need to set retval=DB_LOCK_DEADLOCK to really simulate the deadlock ?

No, it just loops and that variable gets reset.

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.0

4 years ago

This is still a problem even with all the latest modrdn and entry cache fixes (tested on 1.4.1).

Metadata Update from @mreynolds:
- Issue assigned to mreynolds

2 years ago

Metadata Update from @mreynolds:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1744623

2 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: None)

2 years ago

Commit b5d9627 relates to this ticket

2672461..ab03ec6 389-ds-base-1.4.0 -> 389-ds-base-1.4.0

95acf7a..52df7d9 389-ds-base-1.3.10 -> 389-ds-base-1.3.10

2c1bd9a..40c9a5a 389-ds-base-1.3.9 -> 389-ds-base-1.3.9

f4133b7..2855482 389-ds-base-1.3.8 -> 389-ds-base-1.3.8

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2683

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

2 years ago
