Issue #47771: Performing deletes during tombstone purging results in operation errors - 389-ds-base

389-ds-base

#47771 Performing deletes during tombstone purging results in operation errors

Closed: wontfix None Opened 10 years ago by mreynolds.

There is a race condition when looking up the parent of a deleted entry, where the cached entry is replaced by tombstone purging(updating the parents tombstonenumsubordinate attribute). The cached entry gets removed just after retrieving the entry from the cache, but before it can be locked (cache_lock_entry). This results in an operations error when trying to lock the entry that was just retrieved from the cache.

nhosoi commented 10 years ago

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

rmeggins commented 10 years ago

Replying to [comment:1 nhosoi]:

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

We have other retry "magic numbers" - would prefer to use one of those.

Also, does this actually work without doing a sleep or thread yield between loop iterations? If so, it must be because of the db I/O and/or mutexes that are called in id2entry. Seems bad to have a tight loop without a yield to let other threads continue.

mreynolds commented 10 years ago

Replying to [comment:2 rmeggins]:

Replying to [comment:1 nhosoi]:

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

I'll look into this.

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

We have other retry "magic numbers" - would prefer to use one of those.

There is a LDBM CACHE RETRY define set to 1000 I can use.

Also, does this actually work without doing a sleep or thread yield between loop iterations? If so, it must be because of the db I/O and/or mutexes that are called in id2entry. Seems bad to have a tight loop without a yield to let other threads continue.

In all of my tests the very next pass succeeds - it's never gone past one iteration. id2entry does take the cache lock, but I can add a short sleep. I'll work on a new patch...

mreynolds commented 10 years ago

revision
0001-Ticket-47771-Performing-deletes-during-tombstone-pur.patch

mreynolds commented 10 years ago

Replying to [comment:3 mreynolds]:

Replying to [comment:2 rmeggins]:

Replying to [comment:1 nhosoi]:

Now I see the cause of the problem... A great catch!

It looks parentsdn was leaking in the error case? Probably, it'd be better to call "slapi_sdn_done(&parentsdn);" once after the common_return label?

I'll look into this.

And I know this is always hard to determine, but why you think 10 is good instead of 3 or 10000? ;)

We have other retry "magic numbers" - would prefer to use one of those.

There is a LDBM CACHE RETRY define set to 1000 I can use.

Also, does this actually work without doing a sleep or thread yield between loop iterations? If so, it must be because of the db I/O and/or mutexes that are called in id2entry. Seems bad to have a tight loop without a yield to let other threads continue.

In all of my tests the very next pass succeeds - it's never gone past one iteration. id2entry does take the cache lock, but I can add a short sleep. I'll work on a new patch...

New patch attached...

rmeggins commented 10 years ago

ack

mreynolds commented 10 years ago

git merge ticket47771
Updating ce23d5d..844d09d
Fast-forward
ldap/servers/slapd/back-ldbm/back-ldbm.h | 1 +
ldap/servers/slapd/back-ldbm/cache.c | 6 ++-
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 46 +++++++++++++++++++--------
3 files changed, 37 insertions(+), 16 deletions(-)

git push origin master
ce23d5d..844d09d master -> master

commit 844d09d
Author: Mark Reynolds mreynolds@redhat.com
Date: Tue Apr 8 14:39:47 2014 -0400

0ddf772..48844b2 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 48844b2

203970e..44ecbbe 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 44ecbbec4490b012e4ea5d02dfe9d77ec5526acf

48e2a96..c5f22dd 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit c5f22dd

mreynolds commented 10 years ago

Fixed Cherry-pick issue on 1.2.11

git merge ticket47771
Updating 5e45f45..b5cd247
Fast-forward
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

git push origin 389-ds-base-1.2.11
5e45f45..b5cd247 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

commit b5cd247
Author: Mark Reynolds mreynolds@redhat.com
Date: Mon Apr 21 11:32:55 2014 -0400

mreynolds commented 10 years ago

Fixed double slapi_sdn_done() call - only on 1.2.11
0001-Ticket-47771-Cherry-pick-issue-parentsdn-freed-twice.patch

nhosoi commented 10 years ago

You have my ack, but I liked your previous way putting slapi_sdn_done in the clean-up area. Another way -- initializing parentsdn with NULL does not help?

mreynolds commented 10 years ago

Replying to [comment:11 nhosoi]:

You have my ack, but I liked your previous way putting slapi_sdn_done in the clean-up area. Another way -- initializing parentsdn with NULL does not help?

It's a local variable not a pointer, but I moved the initialization to the top of the function. So it should be okay now.

Attaching new patch...

mreynolds commented 10 years ago

Revision
0001-Ticket-47771-Move-parentsdn-initialization-to-avoid-.patch

nhosoi commented 10 years ago

Yep, that's better. Thanks!

mreynolds commented 10 years ago

git merge ticket47771
Updating 06a3d0a..991984f
Fast-forward
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

git push origin master
06a3d0a..991984f master -> master

commit 991984f
Author: Mark Reynolds mreynolds@redhat.com
Date: Mon Apr 21 12:43:09 2014 -0400

97412d6..bdb6962 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit bdb6962

eb6ea36..4b986ee 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 4b986ee0536114a568c30b808663ee590f0c1a86

b5cd247..00a7594 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 00a7594

Metadata Update from @nhosoi:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.2.11.30

7 years ago

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1103

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Metadata

Assignee

mreynolds

Tags

None

Blocking

None

Depending on

None

Priority

major

Milestone

1.2.11.30

reviewstatus

ack

rhbz

https://bugzilla.redhat.com/show_bug.cgi?id=1086907
https://bugzilla.redhat.com/show_bug.cgi?id=1086908

origin

389-ds-base

Source Code

#47771 Performing deletes during tombstone purging results in operation errors Closed: wontfix None Opened 10 years ago by mreynolds.

Metadata

#47771 Performing deletes during tombstone purging results in operation errors

Closed: wontfix None Opened 10 years ago by mreynolds.