#47936 Create a global lock to serialize write operations over several backends
Closed: Fixed. Opened 4 years ago by tbordaz.

The origin of this ticket is https://fedorahosted.org/freeipa/ticket/4635

This IPA ticket is a deadlock where two threads acquire locks in different orders.
One reason for the deadlock is that a plugin accesses pages on different backends for which it does not hold the backend lock.

This ticket is to evaluate whether a global lock would prevent such deadlocks.

This global lock needs to cover the 'ldbm database' backend but also other backends like 'cn=config' or 'cn=schema'...


This occurs under the following conditions:
* A betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
* The betxn_post plugin performs internal operations (read or write) while holding its own lock
* The betxn_post plugin performs internal operations (read or write) on backends for which it does not hold the backend lock

The betxn_post plugin is called during a transaction, so it also holds write locks on some database pages.

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2.
A thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired.
This scenario should not occur during post_op or be_postop because the transaction was already committed and the database page locks released.
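The T1/T2 interleaving above is a classic lock-order inversion. As a minimal illustration (Python sketch, not 389-ds code; the lock names just mirror L1/L2 from the description), forcing every thread to acquire the two locks in the same global order lets both threads complete:

```python
import threading

page_lock_L1 = threading.Lock()    # models the database page lock L1
plugin_lock_L2 = threading.Lock()  # models the plugin's own lock L2

results = []

def worker(name):
    # Fix: every thread acquires L1 before L2 (one global order).
    # The deadlock in the ticket arises when one thread takes L1 -> L2
    # while another takes L2 -> L1.
    with page_lock_L1:
        with plugin_lock_L2:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # both threads completed: ['T1', 'T2']
```

This is exactly the ordering rule proposed later in the thread (backend/transaction lock first, plugin lock second).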

Replying to [comment:1 tbordaz]:

This occurs under the following conditions:
* A betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
* The betxn_post plugin performs internal operations (read or write) while holding its own lock
* The betxn_post plugin performs internal operations (read or write) on backends for which it does not hold the backend lock

The betxn_post plugin is called during a transaction, so it also holds write locks on some database pages.

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2.
A thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired.
This scenario should not occur during post_op or be_postop because the transaction was already committed and the database page locks released.

Thierry, if the deadlock is caused by the backend serial lock and a global lock in the individual plugin, I also ran into the issue multiple times, and fixed it by starting the transaction before taking the plugin's global lock. The backend transaction is re-entrant, so you can call slapi_back_transaction_begin multiple times in one thread.

Could you search for slapi_back_transaction in the plugin directory and see if it works as a workaround or not? Here's an example:
{{{
automember/automember.c: if(slapi_back_transaction_begin(fixup_pb) != LDAP_SUCCESS){
automember/automember.c: slapi_back_transaction_abort(fixup_pb);
automember/automember.c: slapi_back_transaction_commit(fixup_pb);
}}}

Thanks!
--noriko
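The re-entrancy Noriko mentions is what makes this workaround safe: the same thread may "begin" the transaction again without blocking on itself. A toy Python model of that behavior (illustrative only; `Backend`, `transaction_begin`, and `transaction_commit` are invented names, not the slapi API):

```python
import threading

class Backend:
    """Toy model of a re-entrant backend transaction lock."""
    def __init__(self):
        # RLock: the thread that holds it may acquire it again without deadlock,
        # mirroring nested slapi_back_transaction_begin calls in one thread.
        self._txn_lock = threading.RLock()
        self.depth = 0

    def transaction_begin(self):
        self._txn_lock.acquire()
        self.depth += 1

    def transaction_commit(self):
        self.depth -= 1
        self._txn_lock.release()

be = Backend()
be.transaction_begin()   # outer transaction (the operation itself)
be.transaction_begin()   # nested begin from a plugin: does not self-deadlock
assert be.depth == 2
be.transaction_commit()
be.transaction_commit()
assert be.depth == 0
print("nested begin/commit ok")
```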

Replying to [comment:1 tbordaz]:

This occurs under the following conditions:
* A betxn_post plugin defines its own lock (memberof, schema compat, automember...) to protect its internal structure
* The betxn_post plugin performs internal operations (read or write) while holding its own lock
* The betxn_post plugin performs internal operations (read or write) on backends for which it does not hold the backend lock

What does "backends for which it does not hold the backend lock" mean?

The betxn_post plugin is called during a transaction, so it also holds write locks on some database pages.

The deadlock occurs if a thread T1 holds a database page lock L1 and then acquires the plugin's own lock L2.
A thread T2 has acquired L2 and is doing an internal op on the same database as T1; to do so it acquires the database page lock L1.

This scenario should not occur during pre_op, be_preop or betxn_preop because database pages have not yet been acquired.
This scenario should not occur during post_op or be_postop because the transaction was already committed and the database page locks released.

Ok. So this is a new issue caused by moving all of the plugins to be pre/post betxn plugins. The locking order should be:
1) backend lock - that is, a new transaction
2) plugin lock

In this case, plugins such as schema reload, fixup memberof, etc. should first do as Noriko suggested and call slapi_back_transaction_begin, e.g.
{{{
plugin()
{
...
slapi_back_transaction_begin
...
plugin lock
...
do internal operations
...
plugin unlock
...
slapi_back_transaction_abort/commit
}
}}}

Replying to [comment:4 rmeggins]:


Ok. So this is a new issue caused by moving all of the plugins to be pre/post betxn plugins. The locking order should be:
1) backend lock - that is, a new transaction
2) plugin lock

In this case, plugins such as schema reload, fixup memberof, etc. should first do as Noriko suggested and call slapi_back_transaction_begin, e.g.
{{{
plugin()
{
...
slapi_back_transaction_begin
...
plugin lock
...
do internal operations
...
plugin unlock
...
slapi_back_transaction_abort/commit
}
}}}

Noriko, Rich,

Thank you so much for looking into this ticket.
You are right: the schema compat plugin does not use slapi_back_transaction_begin/slapi_back_transaction_abort/commit around its plugin lock.
I will implement your suggestion.

A question about how slapi_back_transaction would prevent the deadlock: does it rely on the deadlock detection done by the DB?

thanks
thierry

Starting a transaction acquires a lock over the entire db; no other write operations can proceed. A search operation may still proceed, and a deadlock is still possible if the write operation attempts to acquire a write lock on a page that a search thread has a read lock on, and the search thread cannot proceed because the writer thread holds some other lock (e.g. a plugin lock) that the search thread is waiting on. However, if this happens frequently, changing the db deadlock policy (Ticket #47409) may help.

  • slapi_back_transaction_begin/slapi_back_transaction_commit is not able to prevent the deadlock.
    This is the same deadlock stack as in https://fedorahosted.org/freeipa/ticket/4635#comment:1
  • The automember task modifies a static group to add a member. It holds write locks on some member.db pages (under transaction), starts a transaction (slapi_back_transaction_begin), and tries to acquire the schema compat map lock
    {{{
    member: fqdn=web1.idm.lab.bos.redhat.com,cn=computers,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
    group : cn=hostgroup1,cn=hostgroups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
    }}}
  • A task entry is added (in cn=config) that acquires the schema compat map lock (postop add). Before acquiring the map lock it started a transaction (slapi_back_transaction_begin), but on cn=config (not on the bdb backend). Then it tries to acquire (in read) a member.db page to perform the following search:
    {{{
    Added entry: cn=cf58f2cd-8015-48e8-88f9-014a58fa830c,cn=automember rebuild membership,cn=tasks,cn=config
    Internal search
    base: "cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
    backend "userRoot"
    scope: one level
    filter: member=cn=cf58f2cd-8015-48e8-88f9-014a58fa830c,cn=automember rebuild membership,cn=tasks,cn=config
    }}}

slapi_back_transaction_begin/slapi_back_transaction_commit can only prevent the deadlock if all callers start transactions on bdb backends. As ADDing the task entry is done on a non-bdb backend, slapi_back_transaction_* cannot prevent the deadlock.

  • The internal search looks useless, as it makes no sense to look for a task entry in an accounts group.
    A possible fix in schema compat would be to prevent those searches.
    However, it could be difficult to filter them, and it is not certain we can prevent ALL internal searches (triggered by schema compat) on the database.

  • The remaining solution is to use a global backend lock (see https://fedorahosted.org/freeipa/ticket/4635#comment:13).
    Need to do some tests on the performance side.
    I will prepare a review of this 389-ds patch.

Check if this is still needed with Thierry.
11/18 - It is still needed, but it's not urgent. 1.3.4

Per ticket triage, setting the milestone to 1.3.4.

does the "cn=config" backend include "cn=schema"?

Replying to [comment:12 rmeggins]:

does the "cn=config" backend include "cn=schema"?

Thanks Rich for looking at it.
I need to verify this tomorrow but I think it should.
My understanding is that:

{{{
'backend-type: DSE' -> global lock applies to both "cn=config" and "cn=schema".
'backend-name: frontend-internal' -> global lock applies to cn=config
'backend-name: schema-internal' -> global lock applies to cn=schema
}}}

A minor issue :); it'd be better to put the new code (3970 - 3972) after the PR_ASSERT(NULL != be);
{{{
     void dblayer_lock_backend(backend *be)
3967 {
3968     ldbm_instance *inst;
3969
3970     if (global_backend_lock_requested(be)) {   /* new */
3971         global_backend_lock_lock();            /* new */
3972     }                                          /* new */
3973
3974     PR_ASSERT(NULL != be);
}}}

No need to check the return value from create_global_lock_entry? If it fails, the global lock is not acquired? Any other disadvantage?
{{{
291 global_backend_lock_init()
292 {
293     /* Create the global backend entry in dse.ldif (if it does not already exist) */
294     create_global_lock_entry();
}}}

The logging is available only if SLAPI_LOG_CONFIG is set == not too verbose; you don't need the "#if 1"?
{{{
246 #if 1
247     slapi_log_error(SLAPI_LOG_CONFIG, NULL, "Global backend lock applies on %s: %s\n",
248                     msg ? msg : "", value);
249 #endif
}}}

May not be an issue, but it'd be nice if ... (in dse_modify, dse_add, and dse_delete)
{{{
     dse_modify
1831     Slapi_Backend *be;
==>      Slapi_Backend *be = NULL;
1842     slapi_pblock_get( pb, SLAPI_BACKEND, &be );
1878     /* Possibly acquire the global backend lock */
1879     if (global_backend_lock_requested(be)) {
==>      if (be && global_backend_lock_requested(be)) {
1880         global_backend_lock_lock();
1881         global_lock_owned = PR_TRUE;
1882     }
}}}

Indentation? :)
{{{
     main( int argc, char **argv )
1057     /* initialize the normalized DN cache */
1058     ndn_cache_init();
1059
1060     global_backend_lock_init();
1061     /*
}}}

I'm a bit afraid that every time you have to do strcasecmp in 2 for loops in global_backend_lock_requested to acquire a lock... It'd be nice if we could do an integer compare instead (by assigning an ID or something), but that may introduce more complexity?
{{{
303 global_backend_lock_requested(Slapi_Backend *be)
}}}
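The reviewer's suggestion amounts to resolving the configured backend names/types once at init time and caching a boolean on each backend, so the hot path is a flag check instead of case-insensitive string scans. A Python sketch of that idea (all names invented for illustration; only `global_backend_lock_requested` is from the patch):

```python
class Backend:
    def __init__(self, name, be_type):
        self.name = name
        self.be_type = be_type
        self.global_lock_requested = False  # cached at init, checked on hot path

def init_global_lock_flags(backends, cfg_names, cfg_types):
    # One-time resolution: the case-insensitive matching happens here only,
    # not on every lock acquisition.
    names = {n.lower() for n in cfg_names}
    types = {t.lower() for t in cfg_types}
    for be in backends:
        be.global_lock_requested = (be.name.lower() in names
                                    or be.be_type.lower() in types)

backends = [Backend("userRoot", "ldbm database"),
            Backend("frontend-internal", "DSE")]
init_global_lock_flags(backends, cfg_names=["USERROOT"], cfg_types=["dse"])

# Hot path: a boolean check, no strcasecmp loops.
assert backends[0].global_lock_requested is True
assert backends[1].global_lock_requested is True
```

This only works because, as noted above, the configured list does not change after the init phase.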

Replying to [comment:12 rmeggins]:

does the "cn=config" backend include "cn=schema"?

Rich,

Yes both backends 'cn=config' and 'cn=schema' are covered by the global lock.

{{{
'backend-type: DSE' -> global lock applies to both "cn=config" and "cn=schema".
'backend-name: frontend-internal' -> global lock applies to cn=config
'backend-name: schema-internal' -> global lock applies to cn=schema
}}}

Performance measurement (throughput): When the global lock is not enabled, performance is identical compared to 1.3.1 (the reference).
When the global lock is enabled, it has no performance impact on reads.
When the global lock is enabled, the impact on writes is most significant when writes are done in parallel on backends protected by the same global lock. In that case it reduces the throughput by ~1/3rd. When writes are done on only one backend (but protected by a global lock), there is no impact on ADD/DEL; on MOD it reduces throughput by ~15%.

{{{
Tool: ldclt (Global average rate)

Per suffix:
  Number of entries   : 50000
  Number of op threads: 10
  Entry cache: 100Mb
  Dbcache: 100Mb

DS version          1.3.1.6      master + 47936          master + 47936
                                 (global lock disabled)  (global lock enabled)

ADD
  one suffix        714.30/s     714.30/s                714.30/s
  two suffixes      625.01/s     714.30/s                500.01/s

DEL
  one suffix        714.30/s     714.30/s                714.30/s
  two suffixes      555.57/s     625.01/s                384.62/s

MOD
  one suffix        1000.02/s    1000.02/s               833.35/s
  two suffixes      1000.02/s    1000.02/s               714.30/s

SRCH
  one suffix        2500.05/s    2500.05/s               2500.05/s
  two suffixes      2500.05/s    2500.05/s               2500.05/s
}}}
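The quoted reductions can be checked directly from the table, comparing the global-lock-enabled column against the disabled column:

```python
# Throughput figures from the ldclt table above (global lock disabled vs enabled).
disabled = {"ADD two suffixes": 714.30, "DEL two suffixes": 625.01,
            "MOD one suffix": 1000.02}
enabled  = {"ADD two suffixes": 500.01, "DEL two suffixes": 384.62,
            "MOD one suffix": 833.35}

for op in disabled:
    loss = (disabled[op] - enabled[op]) / disabled[op]
    print(f"{op}: {loss:.0%} reduction")
# ADD two suffixes: 30% reduction
# DEL two suffixes: 38% reduction   (roughly the "~1/3rd" quoted above)
# MOD one suffix: 17% reduction     (roughly the "~15%" quoted above)
```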

copying my comment from the mailing list to the ticket, and adding another concern

<quote>
does it make sense to allow subsets of the backends to be locked by the global lock? In our experience with deadlocks, different combinations of backends were involved - so to enable it one would have to know in advance which backends to list. I think if we introduce this global lock, we should make it truly global; that would also make the config check for whether it should be used faster.

If you want to keep the configuration of individual backends, I would like to have one "ALL" option
</quote>

if the global lock is not really global it can introduce a new deadlock which was not there before:
assume we have four backends A, B, C, D

scenario without global lock:
1] thread 1 mods backend A, takes backend lock for A
2] simultaneously thread 2 mods backend C, takes backend lock for C
3] postop plugin in thread 1 wants to modify an entry in C, waits for backend lock C
4] postop plugin in thread 2 wants to modify an entry in D,
4.1] takes backend lock D
4.2] modifies entry in D
4.3] releases backend lock D
5] thread 2 releases backend lock C
6] thread 1 gets backend lock C and continues

now configure the global lock, but only for A, B, D; the same scenario now becomes:
1] thread 1 mods backend A, takes backend lock for A, takes global lock
2] simultaneously thread 2 mods backend C, takes backend lock for C, no global lock
3] postop plugin in thread 1 wants to modify an entry in C, waits for backend lock C
4] postop plugin in thread 2 wants to modify an entry in D,
4.1] for D the global lock is required, waits for global lock

no thread can progress:
- thread 1 holds the global lock, waits for backend lock C
- thread 2 holds backend lock C, waits for the global lock
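The final state above is a cycle in the wait-for graph, which is the definition of deadlock. A small Python sketch (illustrative, not DS code) that models the two threads' held/wanted locks and detects the cycle:

```python
# Wait-for graph for the partial-global-lock scenario: an edge X -> Y means
# "the thread holding X is waiting for Y". A cycle means deadlock.
edges = {
    "global_lock": "backend_C",  # thread 1 holds the global lock, waits for C
    "backend_C": "global_lock",  # thread 2 holds backend lock C, waits for global
}

def has_cycle(graph):
    # Follow the single outgoing edge from each node; revisiting a node = cycle.
    for start in graph:
        seen, node = set(), start
        while node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False

assert has_cycle(edges)  # the A,B,D-only global lock deadlocks

# With a truly global lock, thread 2 would already hold the global lock when it
# took backend lock C, so only one wait edge remains and there is no cycle.
edges_all_global = {"backend_C": "global_lock"}
assert not has_cycle(edges_all_global)
print("cycle detected in partial-lock scenario")
```

This is the argument for the "ALL or NOTHING" configuration agreed on below the quote.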

Replying to [comment:17 lkrispen]:


Hello Ludwig,

Thanks for your remarks. I think your second remark (the deadlock case) makes the ability to protect all backends in a simple way even more important. I changed the fix to support an 'all' keyword (backend-type or backend-name). I am testing this new fix.

Regarding the deadlock scenario, the deadlock occurs because C was not covered by the global lock. What is frustrating is that this scenario did not exist before the global lock. I think the global lock leads to the same kind of issues as we have with plugin locks.

Replying to [comment:18 tbordaz]:

Regarding the deadlock scenario, the deadlock occurs because C was not covered by the global lock. What is frustrating is that this scenario did not exist before the global lock. I think the global lock leads to the same kind of issues as we have with plugin locks.

The issue occurs only if you allow the global lock to be configured on a subset of the backends.
I still do not see the need for these configuration options. In my opinion, "ALL" or "NOTHING" should be the only options for the global lock.

Replying to [comment:19 lkrispen]:


The issue occurs only if you allow the global lock to be configured on a subset of the backends.
I still do not see the need for these configuration options. In my opinion, "ALL" or "NOTHING" should be the only options for the global lock.

+1 Then we can see if we need to have more fine-grained locking.

Replying to [comment:20 rmeggins]:

The issue occurs only if you allow the global lock to be configured on a subset of the backends.
I still do not see the need for these configuration options. In my opinion, "ALL" or "NOTHING" should be the only options for the global lock.

+1 Then we can see if we need to have more fine-grained locking.

+1

And I still don't see the point of checking each backend against the configured list dynamically in global_backend_lock_requested. The configured list is determined in the init phase, and there is no case where a backend is in the list at one moment and removed from it at another (I guess?). And if that were possible, it would be even worse: if a backend acquires the global lock and could get removed from the global lock list, the server would hang there... I know that does not occur in the current implementation, as I mentioned. That said, the dynamic checking is not needed? I also prefer the simple approach to solve this issue...

{{{
PRMonitor *global_backend_mutex = NULL;
}}}
This should be static

otherwise, ack

'git merge ticket47936'
Updating 5c6329e..8583012
Fast-forward
ldap/servers/slapd/back-ldbm/dblayer.c | 7 +++++++
ldap/servers/slapd/backend.c | 29 +++++++++++++++++++++++++++++
ldap/servers/slapd/dse.c | 32 ++++++++++++++++++++++++++++++--
ldap/servers/slapd/libglobs.c | 33 ++++++++++++++++++++++++++++++++-
ldap/servers/slapd/main.c | 1 +
ldap/servers/slapd/proto-slap.h | 6 ++++++
ldap/servers/slapd/slap.h | 2 ++
7 files changed, 107 insertions(+), 3 deletions(-)

'git push origin master'
Counting objects: 25, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 2.37 KiB, done.
Total 13 (delta 11), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
5c6329e..8583012 master -> master

'git push origin 389-ds-base-1.3.3'
Counting objects: 25, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 2.38 KiB, done.
Total 13 (delta 11), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
8f6871b..748054f 389-ds-base-1.3.3 -> 389-ds-base-1.3.3

Metadata Update from @tbordaz:
- Issue assigned to tbordaz
- Issue set to the milestone: 1.3.3.9

