#47771 Performing deletes during tombstone purging results in operation errors
Closed: Fixed None Opened 5 years ago by mreynolds.

There is a race condition when looking up the parent of a deleted entry, where the cached entry is replaced by tombstone purging(updating the parents tombstonenumsubordinate attribute). The cached entry gets removed just after retrieving the entry from the cache, but before it can be locked (cache_lock_entry). This results in an operations error when trying to lock the entry that was just retrieved from the cache.


Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

Replying to [comment:1 nhosoi]:

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

We have other retry "magic numbers" - would prefer to use one of those.

Also, does this actually work without doing a sleep or thread yield between loop iterations? If so, it must be because of the db I/O and/or mutexes that are called in id2entry. Seems bad to have a tight loop without a yield to let other threads continue.

Replying to [comment:2 rmeggins]:

Replying to [comment:1 nhosoi]:

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

I'll look into this.

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

We have other retry "magic numbers" - would prefer to use one of those.

There is a LDBM CACHE RETRY define set to 1000 I can use.

Also, does this actually work without doing a sleep or thread yield between loop iterations? If so, it must be because of the db I/O and/or mutexes that are called in id2entry. Seems bad to have a tight loop without a yield to let other threads continue.

In all of my tests the very next pass succeeds - it's never gone past one iteration. id2entry does take the cache lock, but I can add a short sleep. I'll work on a new patch...

Replying to [comment:3 mreynolds]:

Replying to [comment:2 rmeggins]:

Replying to [comment:1 nhosoi]:

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

I'll look into this.

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

We have other retry "magic numbers" - would prefer to use one of those.

There is a LDBM CACHE RETRY define set to 1000 I can use.

Also, does this actually work without doing a sleep or thread yield between loop iterations? If so, it must be because of the db I/O and/or mutexes that are called in id2entry. Seems bad to have a tight loop without a yield to let other threads continue.

In all of my tests the very next pass succeeds - it's never gone past one iteration. id2entry does take the cache lock, but I can add a short sleep. I'll work on a new patch...

New patch attached...

git merge ticket47771
Updating ce23d5d..844d09d
Fast-forward
ldap/servers/slapd/back-ldbm/back-ldbm.h | 1 +
ldap/servers/slapd/back-ldbm/cache.c | 6 ++-
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 46 +++++++++++++++++++--------
3 files changed, 37 insertions(+), 16 deletions(-)

git push origin master
ce23d5d..844d09d master -> master

commit 844d09d
Author: Mark Reynolds mreynolds@redhat.com
Date: Tue Apr 8 14:39:47 2014 -0400

0ddf772..48844b2 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 48844b2

203970e..44ecbbe 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 44ecbbec4490b012e4ea5d02dfe9d77ec5526acf

48e2a96..c5f22dd 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit c5f22dd

Fixed Cherry-pick issue on 1.2.11

git merge ticket47771
Updating 5e45f45..b5cd247
Fast-forward
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

git push origin 389-ds-base-1.2.11
5e45f45..b5cd247 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

commit b5cd247
Author: Mark Reynolds mreynolds@redhat.com
Date: Mon Apr 21 11:32:55 2014 -0400

You have my ack, but I liked your previous way putting slapi_sdn_done in the clean-up area. Another way -- initializing parentsdn with NULL does not help?

Replying to [comment:11 nhosoi]:

You have my ack, but I liked your previous way putting slapi_sdn_done in the clean-up area. Another way -- initializing parentsdn with NULL does not help?

It's a local variable not a pointer, but I moved the initialization to the top of the function. So it should be okay now.

Attaching new patch...

Yep, that's better. Thanks!

git merge ticket47771
Updating 06a3d0a..991984f
Fast-forward
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

git push origin master
06a3d0a..991984f master -> master

commit 991984f
Author: Mark Reynolds mreynolds@redhat.com
Date: Mon Apr 21 12:43:09 2014 -0400

97412d6..bdb6962 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit bdb6962

eb6ea36..4b986ee 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 4b986ee0536114a568c30b808663ee590f0c1a86

b5cd247..00a7594 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 00a7594

Metadata Update from @nhosoi:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.2.11.30

2 years ago

Login to comment on this ticket.

Metadata