#568 using transaction batchval violates durability
Closed: wontfix. Opened 8 years ago by lkrispen.

Investigations show that setting nsslapd-db-transaction-batch-val does improve the performance of concurrent modify operations, but its use violates the durability of the transaction: the flush of the txn log is delayed by a few operations, or until the log_flush_thread runs.
To make use of this feature, the synchronization between the worker threads and the flush thread needs to be improved.
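
To make the durability gap concrete, here is a minimal single-threaded model of a batched txn log. It is only a sketch: the type `txn_log_model` and all function names are hypothetical and do not come from dblayer.c, and real code would need locking between the worker threads and the flush thread. The point it illustrates is that a committed txn is not durable until a flush has covered its id, so a worker should not report success before that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a batched transaction log; not the dblayer.c code. */
typedef struct {
    uint64_t last_committed; /* id of the newest committed txn */
    uint64_t last_flushed;   /* id of the newest txn whose log record is on disk */
    unsigned batch_val;      /* flush once this many commits are pending */
} txn_log_model;

void txn_log_init(txn_log_model *log, unsigned batch_val)
{
    log->last_committed = 0;
    log->last_flushed = 0;
    log->batch_val = batch_val;
}

/* Flush everything committed so far to "disk". */
void txn_log_flush(txn_log_model *log)
{
    log->last_flushed = log->last_committed;
}

/* Commit without syncing; flush only when the batch is full. Returns the txn id. */
uint64_t txn_log_commit(txn_log_model *log)
{
    uint64_t id = ++log->last_committed;
    if (log->last_committed - log->last_flushed >= log->batch_val) {
        txn_log_flush(log);
    }
    return id;
}

/* A worker may only report success once this returns true for its txn. */
bool txn_log_is_durable(const txn_log_model *log, uint64_t id)
{
    return id <= log->last_flushed;
}
```

With a batch value of 3, the first commit is not durable on its own; it becomes durable only when the third commit fills the batch (or, in a real server, when the flush thread runs).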


Just attached the current version I am using for tests.

Hi Ludwig,

The code is looking good to me. I just wonder whether, in log_flush_threadmain, with last_flush not being initialized, we may fail to enforce the interval_flush.

I have a question about why the transaction commit and the flush are not separated. Here, basically, the worker thread (1) commits the txn, (2) waits for the txn flush (or flushes immediately), (3) releases the backend, then (4) returns the operation result. Couldn't we switch steps 2 and 3? Once the txn is committed, release the backend. We still have durability because the operation result has not been returned yet. Then we flush (with or without batch), then return the result. The gain would be to release the backend before doing the IO of the txn log.
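
The two orderings discussed here can be compared with a small trace model. This is a sketch under stated assumptions: `write_op_trace` and `run_write_op` are hypothetical names, and the model only records which steps overlap. It does not call any real dblayer function. It shows that in both orders the result is sent after the flush, while only the reordered variant keeps the log IO outside the backend lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trace of one write operation; names are illustrative only. */
typedef struct {
    bool backend_locked;          /* true while the backend lock is held */
    bool log_flushed;             /* true once the txn log record is on disk */
    bool io_under_backend_lock;   /* did the log flush run with the lock held? */
    bool result_sent_after_flush; /* durability: result only after the flush */
} write_op_trace;

/* Walk the steps in either order and record what happened. */
write_op_trace run_write_op(bool release_backend_before_flush)
{
    write_op_trace t = { true, false, false, false };

    /* Step 1: txn commit (no log flush yet); nothing to record in this model. */

    if (release_backend_before_flush) {
        t.backend_locked = false;                   /* release the backend */
        t.io_under_backend_lock = t.backend_locked; /* flush without the lock */
        t.log_flushed = true;
    } else {
        t.io_under_backend_lock = t.backend_locked; /* flush with the lock held */
        t.log_flushed = true;
        t.backend_locked = false;                   /* release the backend */
    }

    /* Final step: send the result; durable only if the flush already happened. */
    t.result_sent_after_flush = t.log_flushed;
    return t;
}
```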

best regards
thierry

Yes, last_flush should be initialized, but I think leaving it uninitialized could only delay the first flush until one of the other conditions is met.
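
As an illustration of why the initialization matters, here is a sketch of a flush decision that combines a batch limit with a time interval. The struct `flush_timer`, its fields, and the two conditions are assumptions made for this example; the actual conditions in log_flush_threadmain may differ. Setting `last_flush_ms` when the thread starts makes the interval check well defined for the first flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flush-decision state; not the structure used in dblayer.c. */
typedef struct {
    uint64_t last_flush_ms; /* time of the previous flush */
    uint64_t interval_ms;   /* maximum time a commit may wait for a flush */
    unsigned pending;       /* commits not yet flushed */
    unsigned batch_val;     /* flush as soon as this many are pending */
} flush_timer;

/* Initialize last_flush_ms to "now" so the first interval check is meaningful. */
void flush_timer_init(flush_timer *ft, uint64_t now_ms,
                      uint64_t interval_ms, unsigned batch_val)
{
    ft->last_flush_ms = now_ms;
    ft->interval_ms = interval_ms;
    ft->pending = 0;
    ft->batch_val = batch_val;
}

/* Flush when the batch is full or the interval has elapsed with work pending. */
bool flush_timer_due(const flush_timer *ft, uint64_t now_ms)
{
    if (ft->pending == 0) {
        return false;
    }
    if (ft->pending >= ft->batch_val) {
        return true;
    }
    return now_ms >= ft->last_flush_ms &&
           now_ms - ft->last_flush_ms >= ft->interval_ms;
}

/* Record a completed flush. */
void flush_timer_flushed(flush_timer *ft, uint64_t now_ms)
{
    ft->pending = 0;
    ft->last_flush_ms = now_ms;
}
```

If `last_flush_ms` held an arbitrary value instead, the interval test could fail for an unbounded time, and the first flush would then depend on the batch-full condition alone.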

I have left the order of txn_commit, flushing, and dblayer_unlock_backend unchanged in this patch, but with ticket 47358 a switch to reverse the order will be provided. I did mention in the description of the patch that there could be a higher benefit if the flushing were moved to just before the send_result, but I have not tested this.

I did test this patch in conjunction with 47358 and did see a deadlock. I had run a previous version of this fix in a large number of performance tests; maybe I missed something when porting. I will have to investigate.

Looks good to me. Ack'ed.

So, you found an answer to this question? ;)

    /* LK this is only needed if online change of
     * of txn config is supported ???
     */

Replying to [comment:8 lkrispen]:

I did test this patch in conjunction with 47358 and did see a deadlock. I had run a previous version of this fix in a large number of performance tests; maybe I missed something when porting. I will have to investigate.

The reason was that in dblayer_txn_abort_ext the count of txns in progress was always decremented, independent of use lock. I didn't see this in my previous tests, as they had also begun the txn with DB_TXN_WAIT, so there were almost no retries.
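
The kind of mismatch described above can be reproduced in a tiny model. Everything here is hypothetical: `txn_counter` and the `model_*` functions are illustrative names, and the assumption that the begin path increments the counter only when the lock is in use is mine, not something stated in this ticket. Under that assumption, an abort that always decrements lets the counter drift below zero, which can confuse any thread that waits on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-progress counter; not the variable used in dblayer.c. */
typedef struct {
    int in_progress;
} txn_counter;

/* Assumed begin path: count the txn only when the lock is in use. */
void model_txn_begin(txn_counter *c, bool use_lock)
{
    if (use_lock) {
        c->in_progress++;
    }
}

/* Unbalanced abort: decrements even for txns that were never counted. */
void model_txn_abort_unbalanced(txn_counter *c, bool use_lock)
{
    (void)use_lock;
    c->in_progress--;
}

/* Balanced abort: mirrors the condition used in begin. */
void model_txn_abort_balanced(txn_counter *c, bool use_lock)
{
    if (use_lock) {
        c->in_progress--;
    }
}
```

A workload with few aborts and retries rarely exercises the unbalanced path, which is consistent with the problem not showing up in the earlier tests.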

Checking in the original fix:

Updating b4a4ef5..37c531d
Fast-forward
ldap/servers/slapd/back-ldbm/dblayer.c | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------
ldap/servers/slapd/back-ldbm/ldbm_config.c | 2 ++
ldap/servers/slapd/back-ldbm/ldbm_config.h | 2 ++
ldap/servers/slapd/back-ldbm/proto-back-ldbm.h | 4 ++++
4 files changed, 171 insertions(+), 27 deletions(-)
$ git push origin master
Counting objects: 19, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 3.83 KiB, done.
Total 10 (delta 8), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
b4a4ef5..37c531d master -> master

Added a patch for the hang, and committed:
Updating 37c531d..e2a5faf
Fast-forward
ldap/servers/slapd/back-ldbm/dblayer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
$ git push origin master
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 632 bytes, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
37c531d..e2a5faf master -> master

Metadata Update from @lkrispen:
- Issue assigned to lkrispen
- Issue set to the milestone: 1.3.2 - 05/13 (May)

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/568

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

a year ago
