#48342 DNA: deadlock during DNA_EXTEND_EXOP_REQUEST_OID
Closed: Fixed. Opened 4 years ago by tbordaz.

Version: master

Two threads take locks (the DNA config lock and the backend lock) in opposite orders.
One thread, processing the extop, acquired the DNA config lock and is trying to update the shared config entry,
while another thread, processing a replicated update of a shared config entry, owns the backend lock and is trying to update the DNA config.

Testcase: not systematic; provided as a CI test.


Debug session:

{{{
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fbc34ff9700 (LWP 27716))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136  2: movl %edx, %eax
(gdb) where
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000003bee409c8c in _L_lock_1024 () from /lib64/libpthread.so.0
#2  0x0000003bee409c35 in __pthread_mutex_lock (mutex=mutex@entry=0x1ff0ce8) at pthread_mutex_lock.c:105
#3  0x0000003c530233b9 in PR_Lock (lock=lock@entry=0x1ff0ce8) at ../../../nspr/pr/src/pthreads/ptsynch.c:174
#4  0x0000003c530239ab in PR_EnterMonitor (mon=0x1ff0ce0) at ../../../nspr/pr/src/pthreads/ptsynch.c:529
#5  0x00007fbc563ee811 in dblayer_lock_backend (be=0x1ff2b80) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/dblayer.c:3904
#6  0x00007fbc563ede82 in dblayer_txn_begin (be=0x1ff2b80, parent_txn=0x0, txn=0x7fbc34ff62c0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/dblayer.c:3623
#7  0x00007fbc5643f4f2 in ldbm_back_modify (pb=0x7fbc080025d0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/ldbm_modify.c:532
#8  0x00007fbc59b5b80e in op_shared_modify (pb=0x7fbc080025d0, pw_change=0, old_pw=0x0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:1054
#9  0x00007fbc59b5a431 in modify_internal_pb (pb=0x7fbc080025d0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:599
#10 0x00007fbc59b59fb1 in slapi_modify_internal_pb (pb=0x7fbc080025d0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:454
#11 0x00007fbc568a1712 in dna_update_shared_config (config_entry=0x225a4b0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:2555
#12 0x00007fbc5689fa43 in dna_notice_allocation (config_entry=0x225a4b0, new=200, last=0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:1707
#13 0x00007fbc568a5b8f in dna_release_range (range_dn=0x7fbc080008e0 "ou=ranges,dc=example,dc=com", lower=0x7fbc34ff88a8, upper=0x7fbc34ff88b0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:4530
#14 0x00007fbc568a53f1 in dna_extend_exop (pb=0x7fbc34ff8ae0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:4311
#15 0x00007fbc59b6f646 in plugin_call_exop_plugins (pb=0x7fbc34ff8ae0, oid=0x7fbc08001660 "2.16.840.1.113730.3.5.10") at ../workspaces/389-master-branch/ds/ldap/servers/slapd/plugin.c:510
#16 0x00000000004220b7 in do_extended (pb=0x7fbc34ff8ae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/extendop.c:337
#17 0x00000000004167f9 in connection_dispatch_operation (conn=0x7fbc50180800, op=0x225cb50, pb=0x7fbc34ff8ae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/connection.c:680
#18 0x00000000004188b6 in connection_threadmain () at ../workspaces/389-master-branch/ds/ldap/servers/slapd/connection.c:1743
#19 0x0000003c53028b46 in _pt_root (arg=0x22808a0) at ../../../nspr/pr/src/pthreads/ptthread.c:204
#20 0x0000003bee407d14 in start_thread (arg=0x7fbc34ff9700) at pthread_create.c:309
#21 0x0000003bee0f168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

# It acquired the DNA config lock in read
(gdb) frame 13
#13 0x00007fbc568a5b8f in dna_release_range (range_dn=0x7fbc080008e0 "ou=ranges,dc=example,dc=com", lower=0x7fbc34ff88a8, upper=0x7fbc34ff88b0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:4530
(gdb) list dna_release_range,4399
4384 {
4385     int ret = 0;
4386     int match = 0;
...
4396     if (range_dn) {
4397         range_sdn = slapi_sdn_new_dn_byref(range_dn);
4398
4399         dna_read_lock();

# Then it updates the shared config entry on the main suffix
(gdb) frame 11
#11 0x00007fbc568a1712 in dna_update_shared_config (config_entry=0x225a4b0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:2555
2555         slapi_modify_internal_pb(pb);
(gdb) list 2550,2555
2550         ret = LDAP_OPERATIONS_ERROR;
2551     } else {
2552         slapi_modify_internal_set_pb(pb, config_entry->shared_cfg_dn,
2553                                      mods, NULL, NULL, getPluginID(), 0);
2554
2555         slapi_modify_internal_pb(pb);
(gdb) print config_entry->shared_cfg_dn
$6 = 0x22507b0 "dnaHostname=localhost.localdomain+dnaPortNum=42389,ou=ranges,dc=example,dc=com"

# But the backend lock is acquired by Thread 4 (LWP 27712)
(gdb) frame 5
#5  0x00007fbc563ee811 in dblayer_lock_backend (be=0x1ff2b80) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/dblayer.c:3904
3904     PR_EnterMonitor(inst->inst_db_mutex);
(gdb) print *inst->inst_db_mutex
$3 = {name = 0x0, lock = {mutex = {__data = {__lock = 2, __count = 0, __owner = 27712, __nusers = 1, __kind = 3, __spins = 5, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000@l\000\000\001\000\000\000\003\000\000\000\005", '\000' <repeats 18 times>, __align = 2}, notified = {length = 0, cv = {{cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}}, link = 0x0}, locked = 1, owner = 140446353315584}, owner = 140446353315584, cvar = 0x207f760, entryCount = 1}

(gdb) thread 4
[Switching to thread 4 (Thread 0x7fbc36ffd700 (LWP 27712))]
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:86
86   subq $WRITERS_WAKEUP, %rdi
(gdb) where
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:86
#1  0x00007fbc59b9fab1 in slapi_rwlock_wrlock (rwlock=0x207b8d0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/slapi2nspr.c:238
#2  0x00007fbc5689ce34 in dna_write_lock () at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:370
#3  0x00007fbc5689d7e4 in dna_load_shared_servers () at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:736
#4  0x00007fbc568a5168 in dna_config_check_post_op (pb=0x7fbc36ffcae0) at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:4231
#5  0x00007fbc59b72820 in plugin_call_func (list=0x1f922b0, operation=561, pb=0x7fbc36ffcae0, call_one=0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/plugin.c:1920
#6  0x00007fbc59b72681 in plugin_call_list (list=0x1f7b100, operation=561, pb=0x7fbc36ffcae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/plugin.c:1864
#7  0x00007fbc59b6f443 in plugin_call_plugins (pb=0x7fbc36ffcae0, whichfunction=561) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/plugin.c:438
#8  0x00007fbc5644028e in ldbm_back_modify (pb=0x7fbc36ffcae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/ldbm_modify.c:821
#9  0x00007fbc59b5b80e in op_shared_modify (pb=0x7fbc36ffcae0, pw_change=0, old_pw=0x0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:1054
#10 0x00007fbc59b59dcf in do_modify (pb=0x7fbc36ffcae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:387
#11 0x0000000000416667 in connection_dispatch_operation (conn=0x7fbc50180410, op=0x229b370, pb=0x7fbc36ffcae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/connection.c:627
#12 0x00000000004188b6 in connection_threadmain () at ../workspaces/389-master-branch/ds/ldap/servers/slapd/connection.c:1743
#13 0x0000003c53028b46 in _pt_root (arg=0x229e050) at ../../../nspr/pr/src/pthreads/ptthread.c:204
#14 0x0000003bee407d14 in start_thread (arg=0x7fbc36ffd700) at pthread_create.c:309
#15 0x0000003bee0f168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

# It is processing a replicated update of a shared config entry
(gdb) frame 10
#10 0x00007fbc59b59dcf in do_modify (pb=0x7fbc36ffcae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:387
warning: Source file is more recent than executable.
387      op_shared_modify ( pb, pw_change, old_pw );
(gdb) print pb->pb_op->o_params.target_address.udn
$14 = 0x7fbc18018640 "dnaHostname=localhost.localdomain+dnaPortNum=41389,ou=ranges,dc=example,dc=com"
(gdb) print pb->pb_op->o_flags
$9 = 968

# dna_config_check_post_op is a BETXN_POSTOP so the backend lock is acquired,
# but it tries to recreate the config, which needs the config lock in write
(gdb) frame 1
#1  0x00007fbc59b9fab1 in slapi_rwlock_wrlock (rwlock=0x207b8d0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/slapi2nspr.c:238
warning: Source file is more recent than executable.
238      ret = pthread_rwlock_wrlock((pthread_rwlock_t *)rwlock);
(gdb) print *rwlock
$4 = {__data = {__lock = 0, __nr_readers = 1, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 1, __writer = 0, __shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0}, __size = "\000\000\000\000\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 34 times>, __align = 4294967296}
}}}

Note: actually, with the attached testcase, the deadlock is systematic. But it depends on how fast the update is replicated, so I guess it may not be as systematic on other machines as it was on my box.

I think there are a few issues in dna.c

I notice this comment above dna_update_shared_config:

{{{
The lock for configEntry should be obtained
before calling this function.
}}}

Yet, in dna_update_config_event:

{{{
1511     /* Get the read lock so we can copy the config */
1512     dna_read_lock();
....
1521     dna_unlock();
...
1542     /* We need to start transaction to avoid the deadlock */
1543     rc = slapi_back_transaction_begin(dna_pb);
...
1560     dna_update_shared_config(config_entry);
}}}

So we have no lock held while we run dna_update_shared_config from dna_update_config_event. Could this be contributing?

In dna_release_range() we take the read lock, but don't we need the write lock when we later call dna_update_shared_config(), since we are altering the config?

In dna_update_shared_config, why do we call dna_server_unlock? We are still in the middle of dna_load_shared_servers, which is the only place where dna_server_lock() is taken. We should wait until we leave dna_update_shared_config and let dna_load_shared_servers do the unlock.

If anything, given we hold the config write lock at that point, do we even need the shared_servers lock? Only one thread can be in this context already, so the extra lock is superfluous.

I'm still not sure where the deadlock is happening, but I think there are a few locking issues in dna.c that we should address first anyway.

Hi William,

There are several levels of configuration.
dna_read_lock protects dna_global_config, that is, all entries under 'cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config'.

Then, for each of those entries, there is a 'dnaSharedCfgDN' that refers to the subtree (usually in the replicated suffix) that contains the hosts' remaining values.
My understanding is that this entry is protected by config_entry->lock.

In fact, acquiring dna_write_lock may help to prevent the deadlock, but I think that by design we may have such deadlock situations. I just wonder if we really need dna_config_check_post_op to be a backend post op rather than a simple post_op.

Replying to [comment:6 tbordaz]:

Hi William,

There are several levels of configuration.
dna_read_lock protects dna_global_config, that is, all entries under 'cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config'.

Then, for each of those entries, there is a 'dnaSharedCfgDN' that refers to the subtree (usually in the replicated suffix) that contains the hosts' remaining values.
My understanding is that this entry is protected by config_entry->lock.

From what I can see, there is:

  • dna_(read,write)_lock
  • dna_server_(read,write)_lock
  • entry->lock

Both the dna_ and dna_server locks are globals, so they are probably redundant. The other locking issues I found still stand too. I haven't investigated entry->lock yet, but it isn't showing up in the deadlock situation so far.

In fact, acquiring dna_write_lock may help to prevent the deadlock, but I think that by design we may have such deadlock situations. I just wonder if we really need dna_config_check_post_op to be a backend post op rather than a simple post_op.

I'm not sure that will help. Even in a backend post op we are going to need to start the txn, which is where the lock issue is "appearing". But I am more suspicious that we should first correct our locking behaviour in dna.c above that, because there are a number of weaknesses in the design above it. From there, we should consider our next move with the post op if the backend txn keeps giving us grief. It just appears the deadlock is due to mishandling of the dna_(read,write)_lock rather than the backend txn.

My understanding is that g_dna_cache_lock protects the variable dna_global_config.
The thread (dna_release_range) holds that lock while updating a config_entry (on the backend). It is required because the config_entry comes from dna_global_config. But it means we have the order g_dna_cache_lock -> backend lock.

The other thread holds the backend lock but acquired g_dna_cache_lock in write because it accesses dna_global_config.
I am not sure it is necessary to hold that lock in write; it appears to only read dna_global_config.

Well, if we look:

dna_read_lock,dna_write_lock == g_dna_cache_lock

dna_server_read_lock,dna_server_write_lock == g_dna_cache_server_lock

So with slapi_rwlock, the difference between read and write is who may hold the lock concurrently: if you hold the write lock, no one else can read or write; many threads may hold the read lock at once, but then no one can write. Beyond that, it is about who gets the lock next. (As far as the NSPR documentation goes, that is.)

So, in all the parts of the code where we take g_dna_cache_server_lock, we already have the g_dna_cache_lock.

There are parts of the code where we prematurely release g_dna_cache_server_lock, meaning we have a double unlock. We just never noticed it because the above statement is true.

There are parts of the code that state we must hold g_dna_cache_lock, yet we have code that calls these functions after releasing the g_dna_cache_lock.

The entry->lock doesn't really matter in our discussion of this deadlock.

My understanding is that g_dna_cache_lock protects the variable dna_global_config.
The thread (dna_release_range) holds that lock while updating a config_entry (on the backend). It is required because the config_entry comes from dna_global_config. But it means we have the order g_dna_cache_lock -> backend lock.

The other thread holds the backend lock but acquired g_dna_cache_lock in write because it accesses dna_global_config.

Given there is code here where we work in the backend without any lock at all when we require the lock, we prematurely give up locks, and we have a set of locks that provide no benefit to our concurrency, maybe we need to just fix the locking situation in dna.c, and then continue to analyse whether the problem occurs. If it still occurs, then we need to investigate again.

Until then, we could be trying to create a complex workaround for a problem which is as easily solved by correcting our locking behaviour to be simpler and correct.

Once we do that, why don't we remove this code in dna_update_config_event

{{{
1543 rc = slapi_back_transaction_begin(dna_pb);
...
1564 slapi_back_transaction_commit(dna_pb);
}}}

We are already a BE_TXN plugin, so isn't there a transaction being opened around our code anyway? Why do we need a nested transaction?

Additionally, if we corrected the g_dna_cache_lock locking behaviour around this code so we don't prematurely unlock before we call dna_update_shared_config(config_entry), aren't we protected from the other dna threads?

Hi William,

In terms of data structure (my understanding)
g_dna_cache_server_lock protects dna_global_servers
g_dna_cache_lock protects dna_global_config

I agree that if some part of code access those structures without holding the lock, it needs to be fixed.
Now my concern is that by design we may have operations that acquire:
  • the backend lock and then a dna plugin lock (g_dna_cache_lock and sometimes, in addition, g_dna_cache_server_lock)
  • a dna plugin lock (g_dna_cache_lock and sometimes, in addition, g_dna_cache_server_lock) and then the backend lock

If this is a design requirement, then we must enforce that in such a situation g_dna_cache_lock and g_dna_cache_server_lock are not taken in 'write'. Else we will have a deadlock.

But (in thread 4), dna_config_check_post_op (be_txnpostop) calls:
  • dna_load_plugin_config -> dna_delete_config, which updates dna_global_config, so it requires the write lock
  • dna_load_plugin_config -> dna_load_shared_servers, which only reads dna_global_config but holds the write lock, which is IMHO not necessary

We are in deadlock because of the second point (which could be an unrequired write lock), but we could have been because of the first point as well (where the write lock is required).

In conclusion, I have the feeling there is a design issue where either:
  • dna_config_check_post_op should not be a be_txnpostop, or
  • dna_config_check_post_op should not call dna_delete_config

If this is a design requirement, then we must enforce that in such a situation g_dna_cache_lock and g_dna_cache_server_lock are not taken in 'write'. Else we will have a deadlock.

According to NSPR, the difference between read and write locks is that you may hold many read locks and no writes, or one write and no reads. Beyond that, the documentation describes it as a way to prioritise who gets the lock next.

I can't believe I didn't see this before. The issue might be:

dna_update_shared_config is called by dna_extend_exop, which if we look at the init function:

{{{
slapi_pblock_set(pb, SLAPI_PLUGIN_EXT_OP_FN,
(void *) dna_extend_exop) != 0) {
}}}

Whereas dna_load_shared_servers is called in dna_config_check_post_op, which is be_txn. (I feel a bit silly for not noticing this sooner :( )

Because the extended exop isn't a betxn, it takes the locks in the reverse order when it needs to write to the backend.

Perhaps the solution here is to add SLAPI_PLUGIN_BE_TXN_EXT_OP_FN? That way all our function calls are wrapped correctly in the transaction?

At the same time I would like to suggest we have a close look at the locking in dna.c anyway, because of the issues I noticed. They could lead to future issues in and of themself.

Does this sound like a reasonable hypothesis and course of action?

I agree that there are potential issues if some data structures protected by a lock are accessed without holding the lock. Now it looks more like a bug than a design issue.

I think we need to clarify whether 'dna_config_check_post_op' really needs to be a be_postop. The transaction grants an all-or-nothing update but does not protect against a deadlock. The problem here, IMHO, is that we have the backend lock and a plugin lock, and being a be_postop, 'dna_config_check_post_op' holds the backend lock before acquiring a plugin lock.

If dna_config_check_post_op needs to be a be_postop, then it should acquire the read lock only.

If we are in a be_txn with a backend lock, could we not rely on that to provide the locking consistency with the config, instead of the plugin lock?

This still doesn't solve the issue that EXT_OP is not within a TXN, so we still may need to make SLAPI_PLUGIN_BE_TXN_EXT_OP_FN.

Remember, beyond the fact that many readers may hold the lock at once while a writer is exclusive, it behaves like a single mutex with a priority queue to decide who gets it next.

Hi,

This is a really nice and impressive fix.
A few comments:
* I don't see how SLAPI_PLUGIN_EXT_OP_BACKEND_FN ends up owning the backend lock, nor how SLAPI_PLUGIN_BETXNEXTENDEDOP extops are called while holding the backend lock
* dna_config_check_post_op is a betxn_postop. It basically reloads the config/shared_config. I do not see the benefit of it being a betxn_postop rather than a postop: even if two updates happen in parallel, what we want at the end is the config/shared_config resulting from those two updates.

In addition to Thierry's concerns there are also a ton of indentation issues in many of the files.

Here's my question...

The deadlock is between dna_read_lock held in dna_release_range and the dblayer backend transaction (dblayer_lock_backend).

I see you added a new callback dna_extend_exop_backend which is called in the dblayer backend transaction. But the problematic function dna_release_range is still in dna_extend_exop, and it's still called as SLAPI_PLUGIN_EXT_OP_FN, which is not in the dblayer backend transaction. The new callback dna_extend_exop_backend basically calls dna_parse_exop_ber, which does not look like the cause of the deadlock...

I don't understand how this change fixes the deadlock...

It looks to me that what we really need is just to put the section which holds dna_read_lock inside the transaction?
{{{
4438 static int
4439 dna_release_range(char *range_dn, PRUint64 *lower, PRUint64 *upper, Slapi_PBlock *pblock /* needs to be given */)
4440 {
[...]
4449 slapi_log_error(SLAPI_LOG_TRACE, DNA_PLUGIN_SUBSYSTEM,
4450 "--> dna_release_range\n");
4451
4452 if (range_dn) {
4453 range_sdn = slapi_sdn_new_dn_byref(range_dn);
4454
txn_rc = slapi_back_transaction_begin(pblock);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4455 dna_read_lock();
[...]
4612 slapi_sdn_free(&range_sdn);
4613
4614 dna_unlock();
txn_rc = slapi_back_transaction_commit(pblock);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4615 }
4616
4617 slapi_log_error(SLAPI_LOG_TRACE, DNA_PLUGIN_SUBSYSTEM,
4618 "<-- dna_release_range\n");
}}}

A few comments:
* I don't see how SLAPI_PLUGIN_EXT_OP_BACKEND_FN ends up owning the backend lock, nor how SLAPI_PLUGIN_BETXNEXTENDEDOP extops are called while holding the backend lock

Okay. If we look in the first trace:

{{{
...
#4 0x0000003c530239ab in PR_EnterMonitor (mon=0x1ff0ce0) at ../../../nspr/pr/src/pthreads/ptsynch.c:529
#5 0x00007fbc563ee811 in dblayer_lock_backend (be=0x1ff2b80)
at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/dblayer.c:3904
#6 0x00007fbc563ede82 in dblayer_txn_begin (be=0x1ff2b80, parent_txn=0x0, txn=0x7fbc34ff62c0)
at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/dblayer.c:3623
...
#10 0x00007fbc59b59fb1 in slapi_modify_internal_pb (pb=0x7fbc080025d0)
at ../workspaces/389-master-branch/ds/ldap/servers/slapd/modify.c:454
...
#14 0x00007fbc568a53f1 in dna_extend_exop (pb=0x7fbc34ff8ae0)
at ../workspaces/389-master-branch/ds/ldap/servers/plugins/dna/dna.c:4311
#15 0x00007fbc59b6f646 in plugin_call_exop_plugins (pb=0x7fbc34ff8ae0, oid=0x7fbc08001660 "2.16.840.1.113730.3.5.10")
at ../workspaces/389-master-branch/ds/ldap/servers/slapd/plugin.c:510
#16 0x00000000004220b7 in do_extended (pb=0x7fbc34ff8ae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/extendop.c:337

}}}

We have the extended operation which calls dna_extend_exop. Some other stuff happens, but eventually we call into slapi_modify_internal. This then starts a transaction, and takes the backend lock here.

So by adding the new plugin type, this will rearrange the calling order to be (inverted to match the gdb trace):

{{{
...
slapi_modify_internal_pb
...
dna_extend_exop
plugin_call_exop_plugins
dblayer_txn_begin
do_extended
}}}

  • dna_config_check_post_op is a betxn_postop. It basically reloads the config/shared_config. I do not see the benefit of it being a betxn_postop rather than a postop: even if two updates happen in parallel, what we want at the end is the config/shared_config resulting from those two updates.

So the very fact that dna_config_check_post_op is a betxn_postop is why I decided to add a new be_txn_exop plugin type.

Previously we had dna_config_check_post_op in a be_txn so it would take:

  • Backend lock
  • Dna lock(s)

But because exop was NOT in a be_txn, when it went to make a change, it would start a transaction internally, thus:

  • Dna lock(s)
  • Backend lock.

IE deadlock.

So now ALL the entry points to the DNA plugin are be_txn, including the exop, so all locks are taken in the correct order.

In addition to Thierry's concerns there are also a ton of indentation issues in many of the files.

Probably. I didn't want to re-tab everything until after the code review was done. It would make it harder to see what was part of the feature, and what wasn't.

The deadlock is between dna_read_lock held in dna_release_range and the dblayer backend transaction (dblayer_lock_backend).

I see you added a new callback dna_extend_exop_backend which is called in the dblayer backend transaction. But the problematic function dna_release_range is still in dna_extend_exop, and it's still called as SLAPI_PLUGIN_EXT_OP_FN, which is not in the dblayer backend transaction. The new callback dna_extend_exop_backend basically calls dna_parse_exop_ber, which does not look like the cause of the deadlock...

I don't understand how this change fixes the deadlock...

It looks to me that what we really need is just to put the section which holds dna_read_lock inside the transaction?

Okay. So besides my above comment to Thierry:

If we look at the way that this works in general terms:

{{{
Start do_extended
--> If the plugin is an exop (IE not a be_txn_exop) call it and return
--> If oid not handled, try out be_txn exop plugins.
--> Get the correct backend to put into a transaction:
----> This calls plugin_extended_op_getbackend( pb, extoid )
-------> Calls dna_extend_exop_backend to work out what backend the operation in pb and extoid will work upon.
--> start the transaction
----> Call plugin_call_exop_plugins(pb, extoid, SLAPI_PLUGIN_BETXNEXTENDEDOP)
-------> Now calls dna_extend_exop(pb, extoid) which actually does the work
--> Decide to commit or abort the txn.
}}}

So dna_extend_exop_backend is a helper to ask the plugin "given this operation and oid, what backend will you work upon?". We need to know the backend to start the txn in extendop.c:

{{{
// This will call dna_extend_exop_backend
Slapi_Backend *be = plugin_extended_op_getbackend( pb, extoid );
...
Slapi_PBlock *be_pb = NULL;
be_pb = slapi_pblock_new();
slapi_pblock_set(be_pb, SLAPI_BACKEND, be);

int txn_rc = slapi_back_transaction_begin(be_pb);
}}}

So dna_extend_exop_backend is part of the fix. It just means that for a plugin to be a betxnextendedop plugin, it must register these two functions in its init:

{{{
slapi_pblock_set(pb, SLAPI_PLUGIN_EXT_OP_FN,
                 (void *) dna_extend_exop) != 0 ||
    slapi_pblock_set(pb, SLAPI_PLUGIN_EXT_OP_BACKEND_FN,
                 (void *) dna_extend_exop_backend) != 0) {
}}}

If SLAPI_PLUGIN_EXT_OP_BACKEND_FN isn't there, we can't work out which backend we need, so we cannot start a transaction and the plugin call fails.

I hope this explains the solution.

I see. Thanks for the explanation. I understand now that the dna extop callbacks are all called in the dblayer backend transaction. Now, I'd like to know why the simple solution does not work in this case; the patch is quite large for doing (almost) the same thing.

Plus, if, for instance, dna_parse_exop_ber returns an error :), you don't have to start the transaction...
{{{
4340 static int dna_extend_exop(Slapi_PBlock *pb)
4341 {
[...]
4355 // Check the return code.
4356 dna_parse_exop_ber(pb, &shared_dn);
}}}

The primary benefit of the betxn is that all the plugin operations associated with a main op are committed or aborted in one transaction. That way, we avoid inconsistency between the results of the plugin ops and the main op. It seems there is no such benefit for an extended op, since there is no main operation in the transaction; the only purpose of introducing the betxn for extop is avoiding the deadlock here...

Okay. We have gone to a lot of effort to have be_txn for other plugins. Plugins should not be managing their own transactions (in my mind) as it can create exactly this scenario of deadlocks.

In the future all plugins should be BE_TXN of some kind.

To achieve this, means we need to support BE_TXN for EXOP plugins. Both for preventing deadlocks (such as this one), but also to make transaction handling something that shouldn't exist in a plugin. In fact, plugins should never know they are in a transaction, or even how to begin / commit / abort them.

Yes, I could have done the simple solution for this one case; instead I decided to implement this plugin type, which is the harder way, but it's also the right way. We move the transaction handling outside of the plugin. In the future this paves the way for us to convert the remaining exop plugins to BE_TXN_EXOP.

In this case there can be operations that happen, and yes we are avoiding the deadlock. But it also means other plugins can take advantage of this in the future because we implement the correct behaviour.

Additionally, you question why we even need to be in a transaction here: the original deadlock happens because the extend exop makes changes to the database via slapi_modify_internal. If there were no changes, we wouldn't need the transaction, and we wouldn't have this bug. So the transaction is needed not just to prevent the deadlock, but because the exop is actually making changes to our system.

The other reason for this design, rather than putting the transaction into release range, is because release range can be called from the POST_OP parts of the code (If I recall correctly), so then we would need to track who called us, so we don't open a nested transaction etc.

This may be more code now, but in the long run, it's the cleaner, simpler option.

I hope that this explains why I felt justified in designing and implementing this solution.

Summary: I decided to follow the hard, high road, rather than the quick easy path.

Plus, if, for instance, dna_parse_exop_ber returns an error :), you don't have to start the transaction...
{{{
4340 static int dna_extend_exop(Slapi_PBlock *pb)
4341 {
[...]
4355 // Check the return code.
4356 dna_parse_exop_ber(pb, &shared_dn);
}}}

At this point, dna_extend_exop_backend has already called dna_parse_exop_ber(pb, &shared_dn) once with the same parameters, so if it had failed we wouldn't make it here. If we got here, it must have previously passed.

However, I think that it's probably wise to check anyway. I'll fix this for a new patch.

Hi William,

Thanks for all these explanations.
I understand the benefit of this new framework. IMHO it should rather be an RFE; then this ticket would be 'blocked by' that RFE, and the fix for this ticket would be much smaller.
I am still not understanding why it is a BETXN_EXTOP. My understanding is that 'dna_extend_exop' (release_range) will be called under a transaction. But at that time, how is the backend lock itself acquired (do you mean that slapi_back_transaction_begin is equivalent to dblayer_txn_begin?)? I think it would work because db_deadlock_detection would recover from a deadlock, but the backend lock was not acquired.

Hi William,

Sorry for the noise, I finally get it. slapi_back_transaction does acquire the backend lock.

Replying to [comment:22 tbordaz]:

Hi William,

Thanks for all these explanations.
* I understand the benefit of this new framework. IMHO it should rather be an RFE; then this ticket would be 'blocked by' that RFE, and the fix for this ticket would be much smaller.

Yes, this has been suggested by email also. I believe that I should split this into:

  • Your testcase
  • An RFE for the BE_TXN_EXOP type
  • The implementation of BE_TXN_EXOP for dna.c

Sorry for the noise, I finally get it. slapi_back_transaction does acquire the backend lock.

I'm glad! It took me a while to "get it" myself and come up with this solution, but I believe it to be an elegant and correct one.

If we observe other plugins, we have these that are exop:

  • dna
  • replication
  • whoami

Additionally, in the main slapd code:

  • passwd_extop

In the future for https://fedorahosted.org/389/ticket/48707 this will require an exop component also.

From these we have two lists:

Plugins that make changes to the database:

  • dna
  • passwd_extop
  • sso token (future)

Plugins that do not (or are special in some way):

  • replication
  • whoami

If we observe the code in password_extop:

{{{
passwd_modify_extop( Slapi_PBlock *pb )
{
    ...
    {
        Slapi_Backend *be = NULL;
        be = slapi_mapping_tree_find_backend_for_sdn(slapi_entry_get_sdn(targetEntry));
        ...
        slapi_pblock_set(pb, SLAPI_BACKEND, be);
    }
    ...
    /* Check the user's password is correct, etc. */
    ...
    ret = passwd_modify_userpassword(pb, targetEntry, newPasswd, req_controls, &resp_controls);
    ...
}

static int passwd_modify_userpassword(Slapi_PBlock *pb_orig, Slapi_Entry *targetEntry,
    const char *newPasswd, LDAPControl **req_controls, LDAPControl ***resp_controls)
{
    ...
    slapi_mods_init(&smods, 0);
    slapi_mods_add_string(&smods, LDAP_MOD_REPLACE, SLAPI_USERPWD_ATTR, newPasswd);
    ret = passwd_apply_mods(pb_orig, slapi_entry_get_sdn_const(targetEntry),
                            &smods, req_controls, resp_controls);
    ...
}

static int
passwd_apply_mods(Slapi_PBlock *pb_orig, const Slapi_DN *sdn, Slapi_Mods *mods,
    LDAPControl **req_controls, LDAPControl ***resp_controls)
{
    ....
    ret = slapi_modify_internal_pb(&pb);
}
}}}

There could be a case for wrapping the actions of the password extop into a transaction in the future. Consider, for example, that there is a race condition between checking the password and modifying it. Thankfully there are no other locks around to cause a deadlock here, and the operation is pretty simple, so we are unlikely to see these issues.

The new ssotoken plugin we plan to make would certainly be able to take advantage of the be_txn_exop type, as it will also be making changes to objects in the db and will require correct transactional semantics.

Summary:

From this, two actions have come out:

  • Check the error handling on 4356 dna_parse_exop_ber(pb, &shared_dn); in dna.c
  • Split this change out into an RFE.

Hi William,

Rereading the complete thread I have a concern.
I think betxnextendedop brings more flexibility to extops, but I wonder if betxnextendedop is not overkill for this specific ticket.

The issue here is that 'dna_config_check_post_op', being called inside a betxn_postop, acquires the plugin lock after having acquired the backend lock.
What I do not understand is whether 'dna_config_check_post_op' really needs to be called in betxn_postop and could not be called only in postop.

dna_config_check_post_op does two things:
* If the write operation is related to config entries, it rebuilds the dna config structure (dna_global_config), which is a list of config entries and is protected by g_dna_cache_lock
* If the write operation is related to shared config entries, it rebuilds the shared entry structure (dna_global_servers), which is a list of shared config host entries and is protected by g_dna_cache_server_lock

So basically it rebuilds an up-to-date dna_global_config/dna_global_servers config after an update.
This rebuilding is protected by the lock associated with each of those structures. So we can be sure that, while the config is updated, no other thread can also modify it.

Do you know what the benefit of holding the backend lock at that time would be?

The issue isn't so much the dna_config_check_post_op() call, but the way the locking is working.

If we have a modify operation, the order of events will be:

be_txn -> dna_pre_op() -> _dna_pre_op_modify() -> dna_read_lock()

And of course, with the exop, we call dna_release_range() which takes the dna_read_lock() also.

This is just one example, but there are many paths through the dna code where we are called from the post/pre op plugins within a be_txn while also taking the read/write lock. The dna_server lock is only ever taken within the dna_read/write_lock() anyway.

I think the benefit here is that the dna code is pretty complex, and the locking within it also quite complex. Given that previously most code paths in the dna plugin take the be_txn path, we should keep the exop consistent with them. This way we avoid the deadlock, or any other "subtle race/edge/insane" case we missed. Even though we are all excellent engineers, I'm still sure we have missed something, even in this discussion. I'd rather avoid that and get it right by wrapping too much code in the be_txn rather than too little.

It also raises the issue that we should review the dna plugin and its locking situation, because of these issues that we have witnessed and the solutions we are having to come up with for it.

I fully agree that the locking scheme of DNA is complex, and using the betxnextop may be a way to address it.
I am not talking about the locking scheme in general but about this specific deadlock.
The deadlock here happens because dna_config_check_post_op needs the write lock (to rebuild config/servers).
This betxn_postop is the only one to require the write lock (dna_write_lock/dna_server_write_lock), except during DNA startup, but at that time there is no operation.

If we can (is it valid?) make dna_config_check_post_op a simple post_op, it would fix this deadlock.
If there are other scenarios that lead to deadlock, I am fine using betxnextop instead.

There is no difference between a read or write lock: Write locks just have priority in the queue as to who obtains them next. They are otherwise the same.

Unless we can remove dna_release_range() from the exop (Hint: we can't), then we will have this deadlock in some other way.

Because every single operation in dna from the pre/post operations is in a be_txn, and they all take a dna_read/write_lock, this deadlock will just happen again in a different way. The exop will always need to use release_range, and release_range will always take the dna_read/write_lock too, and it will then start a transaction internally.

The solution to make dna_config_check_post_op a simple post op would be valid, but only if you were willing to re-write every other function of this plugin to all be outside of the be_txn (dna_pre_op for example).

Summary: Every single plugin entry point must either be of the BE_TXN type, or NONE of them can be. Trying to mix and match is just a recipe for disaster.

Replying to [comment:28 firstyear]:

Summary: Every single plugin entry point must either be of the BE_TXN type, or NONE of them can be. Trying to mix and match is just a recipe for disaster.

It's too strict, I would say... we need to add some more conditions like...?
"unless it does not have to hold any local locks and it does not update entries in the backend."

Replying to [comment:29 nhosoi]:

Replying to [comment:28 firstyear]:

Summary: Every single plugin entry point must either be of the BE_TXN type, or NONE of them can be. Trying to mix and match is just a recipe for disaster.

It's too strict, I would say... we need to add some more conditions like...?
"unless it does not have to hold any local locks and it does not update entries in the backend."

Rewording of that:

If your code changes or reads entries in the backend AND has any local locks, in any part of the code, it MUST be a BE_TXN plugin to prevent deadlocks.

Saying this, I'd be willing to say:

If your code changes or reads entries in the backend, in any part of the code, it MUST be a BE_TXN plugin to prevent deadlocks and race conditions.

Consider:

{{{
entry_a = slapi_get_entry(...);
// Point X
entry_b = slapi_get_entry(...);
// Do some compare or operation ...
}}}

If this was NOT be_txn, there is now a race condition around point X as each slapi_get_entry will take a unique backend transaction, and a change could occur in the middle. This could lead to inconsistent results, or some other bug report etc ....

Replying to [comment:30 firstyear]:

Rewording of that:

If your code changes or reads entries in the backend AND has any local locks, in any part of the code, it MUST be a BE_TXN plugin to prevent deadlocks.

ack

Probably, your comments in https://fedorahosted.org/389/ticket/48342?replyto=30#comment:30 could be a good candidate to be added to the Plugin Guide? ;)

Replying to [comment:31 nhosoi]:

Replying to [comment:30 firstyear]:

Rewording of that:

If your code changes or reads entries in the backend AND has any local locks, in any part of the code, it MUST be a BE_TXN plugin to prevent deadlocks.

ack

Probably, your comments in https://fedorahosted.org/389/ticket/48342?replyto=30#comment:30 could be a good candidate to be added to the Plugin Guide? ;)

Perhaps, but what part of the guide would this go in?

Last night (Yesterday morning depending on perspective) I had a call with TBordaz and LKrispen.

After some discussion, we were contemplating whether removing dna_config_check_post_op() from the be_txn would solve this issue.

I have come to the conclusion this is not possible. (I do think I worked this out a long time ago, but I had forgotten due to all the back and forth :) )

In dna_config_check_post_op() we are within a BE_TXN, and we take the dna_write_lock during the call to dna_load_shared_servers(). In dna_load_shared_servers() we are not accessing the backend, so we do not require the BE_TXN: correctly, this would not need to be in a BE_TXN if dna_load_shared_servers() was all we did.

However, the code for dna_config_check_post_op() contains:

{{{
if ((dn = dna_get_dn(pb))) {
    if (dna_dn_is_config(dn)) {
        dna_load_plugin_config(pb, 0);
    }
    if (dna_dn_is_shared_config(pb, dn)) {
        dna_load_shared_servers();
    }
}
}}}

Within dna_load_plugin_config() we take the following actions:

{{{
dna_write_lock();

...
slapi_search_internal_set_pb(search_pb, getPluginDN(),
                             LDAP_SCOPE_SUBTREE, "objectclass=*",
                             NULL, 0, NULL, NULL, getPluginID(), 0);
slapi_search_internal_pb(search_pb);
...
dna_unlock();
}}}

If this is changed to be outside of the BE_TXN, the call to slapi_search_internal_pb() will start a txn, and we will have a new deadlock to potentially face.

{{{
slapi_search_internal_pb() -...-> op_shared_search() -...-> ldbm_back_search()

...
if (!txn.back_txn_txn) {
    dblayer_txn_init(li, &txn);
    slapi_pblock_set(pb, SLAPI_TXN, txn.back_txn_txn);
}
...
}}}

As a result, this is why we cannot move dna_config_check_post_op() outside of the BE_TXN.

Hi William,

dna_load_shared_servers does an internal search (dna_get_shared_servers) on the main backend (domain).
dna_load_plugin_config does an internal search on the cn=config backend.

dna_config_check_post_op being a betxn plugin, both internal searches will inherit the parent txn and will not start a new transaction (i.e. will not call dblayer_txn_begin).
Now, if dna_config_check_post_op is outside of BETXN_POSTOP, the internal searches will run without a txn.

If we move dna_config_check_post_op outside of BETXN_POSTOP, another update may impact the config/servers (that update will also call dna_config_check_post_op).

If the shared config host is updated multiple times, the final result will be reflected in config/servers list by the last POSTOP.

If the dna config is updated, dna_load_plugin_config->dna_update_config_event deletes/creates the shared config host. If it is a simple POSTOP, the order is not enforced and we may end up with a recreated shared config host that does not reflect the order of the updates.

So I think there is a possible issue to move dna_load_plugin_config outside of BETXN, because it can recreate a shared config host that does not reflect the order of the config updates.

well, dna_load_plugin_config probably doesn't need to be in the txn. I think this area of the code needs some rethinking.

However, dna_load_plugin_config is working on cn=config, and that backend lacks transactions from my reading of the code (I could be wrong here though).

So in theory we could be taking a transaction for the exop which isn't needed, because we delve into the dna_load_plugin_config code, which goes to cn=config.

But for the conditions of the exop, I think that we will only branch to dna_load_shared_servers(), and the branch for dna_load_plugin_config() is triggered elsewhere.

So I think it's not an issue, but it's a bit subtle.

What do you think?

Hi William,

an important step of dna_load_plugin_config is that it calls dna_update_config_event (an immediate call when coming from dna_config_check_post_op).
In dna_update_config_event, it recreates the shared config entry (for its localhost): i.e. "dnaHostname=localhost.localdomain+dnaPortNum=41389,ou=ranges,dc=example,dc=com"

The shared config entry (on the main backend) reflects the config params (port/host/bind method... and also the remaining range).
My concern, if dna_update_config_event is called in POSTOP, is that several config updates could apply in parallel, and the attributes of the resulting shared config entry would not follow the order of the config updates.

In this case it's updating the real backend, which means it's protected in the BE_TXN.

Which means that you can't have the post op occurring in parallel: exactly the issue that this fix is trying to solve.

Only if there were a different entry point to the plugin outside the be_txn would this be an issue.

The current deadlock can occur in dna_load_shared_servers or in dna_load_plugin_config (both try to acquire the config lock in write).

Moving dna_load_shared_servers to postop is not sufficient, as dna_load_plugin_config can also deadlock.
But moving both dna_load_shared_servers and dna_load_plugin_config to postop is not possible because of https://fedorahosted.org/389/ticket/48342#comment:37

So dna_config_check_post_op (the wrapper of dna_load_shared_servers and dna_load_plugin_config) needs to be a betxn_postop. The order of locks, backend_lock then dna_lock, cannot be changed.

The only option to prevent this deadlock is to prevent dna_release_range (extop) from acquiring locks in the order dna_lock then backend_lock, making sure backend_lock is acquired before dna_lock. Your proposal to wrap dna_release_range in a betxn_extop will fix this.

Excellent! I'm really glad that we had this (long) discussion. I feel we have come to the same conclusion, and are happy enough with the code. I've certainly learnt a lot from it!

I'll leave it up to you if you are happy to give the ack. I think that because this is a delicate area, we should be sure to make it a QE / testing priority if we still have concerns.

Thanks!

Last patch looks good.

I have a minor question regarding the change in dna_update_config_event.
The only condition for rc != 0 is that slapi_back_transaction_begin fails.
I am not sure what the impact of aborting a transaction that was never started will be. It will likely abort the parent txn, which is not what was done before.

Actually, I think there is more of a problem around that code now that I take a second look at it:

{{{
if (be) {
    dna_pb = slapi_pblock_new();
    slapi_pblock_set(dna_pb, SLAPI_BACKEND, be);
    /* We need to start a transaction to avoid the deadlock */
    rc = slapi_back_transaction_begin(dna_pb);
    if (rc) {
        slapi_log_error(SLAPI_LOG_FATAL, DNA_PLUGIN_SUBSYSTEM,
                        "dna_update_config_event: failed to start transaction\n");
    }
}
...
slapi_delete_internal_set_pb(pb, config_entry->shared_cfg_dn,
                             NULL, NULL, getPluginID(), 0);
}}}

So even if the backend transaction fails, we go ahead with the delete anyway!

I'll fix this area of code up into a new patch.

This is a supplementary patch to correct issues with transactions in the config update event.
0001-Ticket-48342-Prevent-transaction-abort-if-a-transact.patch

Both patches look good

0004-Ticket-48342-DNA-deadlock-during-DNA_EXTEND_EXOP_REQ.patch and 0001-Ticket-48342-Prevent-transaction-abort-if-a-transact.patch

ACK

(update dependency on https://fedorahosted.org/389/ticket/48769)

commit a970b176183852f38a3b804abee007b26dbb983c
commit 0a35a6d2c33aba052b9df48ab55b4d8e0c17a1c0
commit c00405f3d86ad11d27b4d20c0e0cfa4e38c13376

Total 23 (delta 14), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
d01e436..472a96b master -> master

Thank you all for your patience and diligence with this ticket!

Metadata Update from @tbordaz:
- Issue assigned to firstyear
- Issue set to the milestone: 1.3.5.0

2 years ago
