#290 server hangs during shutdown if betxn pre/post op fails
Closed: wontfix None Opened 9 years ago by rmeggins.

If using betxn pre/post operation plugins, and a betxn pre/post op plugin returns a failure code, the dblayer_env_lock is not released. During shutdown, or backend deletion, the code attempts to acquire a write lock, and the write lock will block forever because the previous read locks were not released.


I think this unnecessary result code checking accidentally skipped the dblayer_txn_abort... The problematic code is found only in ldbm_modify. (Other ops -- add, delete, modrdn -- do not check ldap_result_code before calling dblayer_txn_abort.) I'd like to run some more tests to verify my theory... Could you share your test case?
Thanks!

diff --git a/ldap/servers/slapd/back-ldbm/ldbm_modify.c b/ldap/servers/slapd/bac
index f4f1263..10c38cf 100644
--- a/ldap/servers/slapd/back-ldbm/ldbm_modify.c
+++ b/ldap/servers/slapd/back-ldbm/ldbm_modify.c
@@ -614,7 +614,7 @@ error_return:

    if (disk_full) {
        rc= return_on_disk_full(li);
  • } else if (ldap_result_code != LDAP_SUCCESS) {
  • } else {
    if (retry_count > 0) {
    / It is safer not to abort when the transaction is not
    dblayer_txn_abort(li,&txn); /
    abort crashes in case dis

I found this running the Managed Entry and Linked Attributes tests with the plugin type changed to betxnpreoperation. The tests would hang during the cleanup phase. I have a fix that I'm testing - the fix just does this:
{{{
- if (retry_count > 0) {
+ if (txn.back_txn_txn && (txn.back_txn_txn != parent_txn)) {
}}}

So far it seems to be working fine - no hangs.

Argh.. You are right. Stupid bugs. :(

Please add this change (in ldbm_back_modify) to your fix, too. The check is not needed.
- } else if (ldap_result_code != LDAP_SUCCESS) {
+ } else {

Thanks!

ok - but is there a case where ldap_result_code == LDAP_SUCCESS and we have to abort the transaction?

Replying to [comment:4 rmeggins]:

ok - but is there a case where ldap_result_code == LDAP_SUCCESS and we have to abort the transaction?

I don't (want to) think there is such a case. But if an error occurs in one of the plugins (at other than the LDAP operations) and the plugin could slip setting ldap_result_code but just return error (which is set to retval only.)

1) 1.2.11 will have a different way to set the txn from plugins - so just remove those linked attrs changes
2) in the diskfull return case, I think the txn should be aborted too
3) the value sdn free patch looks good, but could we have that in a separate commit?

Replying to [comment:7 rmeggins]:

1) 1.2.11 will have a different way to set the txn from plugins - so just remove those linked attrs changes

That's what I thought. All right. I'll remove them.

2) in the diskfull return case, I think the txn should be aborted too

Theoretically, yes. But most likely, it crashes in txn_abort. (And the crash cannot be captured...):

3) the value sdn free patch looks good, but could we have that in a separate commit?

Yeah, that'd be better. Do we open a new ticket for such a case?

Replying to [comment:8 nhosoi]:

Replying to [comment:7 rmeggins]:

1) 1.2.11 will have a different way to set the txn from plugins - so just remove those linked attrs changes

That's what I thought. All right. I'll remove them.
Thanks

2) in the diskfull return case, I think the txn should be aborted too

Theoretically, yes. But most likely, it crashes in txn_abort. (And the crash cannot be captured...):
ok - then what you have is fine

3) the value sdn free patch looks good, but could we have that in a separate commit?

Yeah, that'd be better. Do we open a new ticket for such a case?

No, just a different commit - will make it easier to track and cherry-pick if necessary

Thanks to Rich for his reviews and comments.

I've removed the changes on linked_attrs.c. And separated the fixes into 2 parts as follows:

commit 69087b4
Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com
Date: Tue Feb 21 13:22:07 2012 -0800

Trac Ticket #290 - server hangs during shutdown if betxn pre/post op fails

https://fedorahosted.org/389/ticket/290

Fix description: If a dn type modify value is invalid, the modify
operation could crash the server.  This patch fixes it.

commit 4495cf3
Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com
Date: Tue Feb 21 13:20:12 2012 -0800

Trac Ticket #290 - server hangs during shutdown if betxn pre/post op fails

https://fedorahosted.org/389/ticket/290

Fix description: The logic to call dblayer_txn_abort in the
ldbm_back update functions was wrong.  If the operations fail,
the abort function has to be called if dblayer_txn_begin is
already called and dblayer_txn_commit is not.  This patch
checks the txn value set by dblayer_txn_commit to determine
to call dblayer_txn_abort or not.

Pushed to master.

$ git merge trac290
Updating 2e5ee4d..69087b4
Fast-forward
ldap/servers/slapd/back-ldbm/ldbm_add.c | 4 ++--
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 4 ++--
ldap/servers/slapd/back-ldbm/ldbm_modify.c | 6 +++---
ldap/servers/slapd/back-ldbm/ldbm_modrdn.c | 5 +++--
ldap/servers/slapd/value.c | 3 ++-
5 files changed, 12 insertions(+), 10 deletions(-)

$ git push
Counting objects: 26, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), 1.85 KiB, done.
Total 16 (delta 13), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
2e5ee4d..69087b4 master -> master

this fix was cherry-picked to 1.2.10
commit changeset:b197245/389-ds-base
Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com
Date: Tue Feb 21 13:22:07 2012 -0800
Fix description: If a dn type modify value is invalid, the modify
operation could crash the server. This patch fixes it.
(cherry picked from commit changeset:69087b4/389-ds-base)

commit changeset:b197245/389-ds-base
Author: Noriko Hosoi nhosoi@ponyo.sjc.redhat.com
Date: Tue Feb 21 13:22:07 2012 -0800
Fix description: If a dn type modify value is invalid, the modify
operation could crash the server. This patch fixes it.
(cherry picked from commit changeset:69087b4/389-ds-base)
1.2.10 branch

Added initial screened field value.

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.11.a1

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/290

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

9 months ago

Login to comment on this ticket.

Metadata