#48118 At startup, changelog can be erroneously rebuilt after a normal shutdown
Closed: wontfix 5 years ago. Opened 9 years ago by tbordaz.

At startup, a supplier compares the RUVs that are in the database and in the changelog.

With ticket https://fedorahosted.org/389/ticket/564, the changelog_ruv is kept in sync with the database updates while the database_ruv is only resynced periodically (every 30 s), so the database_ruv can be less than the changelog_ruv.

After a disorderly shutdown, if database_ruv < changelog_ruv at startup, it is assumed that the resync thread did not have time to resync the database_ruv, and the database_ruv is rebuilt from the changelog_ruv.

After a normal shutdown, if database_ruv < changelog_ruv at startup, it is treated as an error condition and the changelog is rebuilt.
The problem is that there is no mechanism to force a database_ruv resync at normal shutdown.

If the normal shutdown occurs less than 30 seconds after the last update, there is a risk that the database_ruv has not been resynced and the changelog is therefore erroneously rebuilt.
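
To make the two startup branches concrete, here is a rough sketch of the decision described above; the helper names are illustrative placeholders, not the actual repl5/cl5 functions.

    #include <stdbool.h>

    /* Illustrative placeholders only, not the real repl5/cl5 API. */
    typedef struct ruv RUV;
    bool ruv_older_than(const RUV *a, const RUV *b);              /* true if a < b */
    void rebuild_database_ruv_from_changelog(RUV *db, const RUV *cl);
    void rebuild_changelog(void);

    /* Startup decision as described above. */
    void
    startup_ruv_check(RUV *database_ruv, const RUV *changelog_ruv, bool clean_shutdown)
    {
        if (!ruv_older_than(database_ruv, changelog_ruv)) {
            return; /* RUVs agree (or the DB is ahead): nothing to do */
        }
        if (!clean_shutdown) {
            /* Disorderly shutdown: assume the resync thread simply did not
             * run yet and catch the database RUV up from the changelog RUV. */
            rebuild_database_ruv_from_changelog(database_ruv, changelog_ruv);
        } else {
            /* Normal shutdown: the same lag is treated as an error condition
             * and the changelog is rebuilt; this is the behaviour the ticket
             * is about. */
            rebuild_changelog();
        }
    }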

This problem has been reproduced several times. Apparently it occurs more frequently on slow VMs.


This bug exists in 1.3.2 and later. Milestone = 1.3.2?

Going further in the investigation, I was wrong when saying that
''The problem is that there is no mechanism to force a database_ruv resync at normal shutdown.''

In fact, at normal shutdown, 'multimaster-stop' calls 'replica_destroy_name_hash', which updates the CSN generator and the RUV of each replica.

I think there is a possible corner case in 'replica_update_state': if 'replica.state_update_inprogress' is set, it returns without flushing the CSN generator/RUV.

Will investigate under what conditions 'replica.state_update_inprogress' is set.

Having a hard time reproducing this issue. I've set up multiple backends with replication, put the server under load on multiple backends, and repeatedly restarted the server. Is there anything else I should be trying? How did you reproduce this, Thierry?

Since it is hard to reproduce the issue, shall we push it to 1.3.5?

Replying to [comment:11 nhosoi]:

Since it is hard to reproduce the issue, shall we push it to 1.3.5?

I think it should be pushed back. It's hard to reproduce and I'm not aware of anyone hitting this issue. That being said, I think I heard Thierry recently say that a cleanallruv task might need to be running at the same time as the shutdown. So I'd like to quickly test that on Monday, and if I still can't reproduce it I'll move it to 1.3.5.

I have not been able to reproduce this problem. It was detected a couple of times by QE.

This current ticket requires a normal shutdown: the database_ruv should have been written back to the DB, but for some reason database_ruv < CL_ruv => the CL is recreated.

Before 48208, I was able to reproduce a disorderly shutdown if a cleanallruv task was hanging during a shutdown. Regarding 48118, startup after a disorderly shutdown worked fine, because if database_ruv < changelog_ruv the database_ruv was rebuilt from the CL_ruv.
48208 allows a normal shutdown with cleanallruv, so maybe it could help to reproduce this bug.

Per triage:
Couldn't reproduce, and it happens very rarely, only in QE's environment.
Push to 1.3.6 and raise priority if QE starts seeing it more.

  • QE is able to reproduce the changelog recreation quite easily;
    reopening the ticket

  • The root cause is identified:
    as a consequence of https://fedorahosted.org/389/ticket/564, the DB RUV is no longer in the
    same txn as the CL RUV.
    At regular shutdown there is a race condition between the shutdown thread that writes back the RUV
    and the last operation. The written-back RUV may not contain the operation that is still
    going on.

  • possible fixes discussed in BZ

Metadata Update from @nhosoi:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.6 backlog

7 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus reset (from needinfo)
- Issue close_status updated to: None
- Issue set to the milestone: 1.3.7 backlog (was: 1.3.6 backlog)

6 years ago

I just reproduced this issue. It happens if you have an active operation that triggers internal mods and you stop the server before the operation is completed.

In fact, the RUV in the database is the correct one, but the changelog maxcsn is ahead.

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to None

6 years ago

After some investigation I think I understand what happens, but we have more than one problem, and they are different in versions < 1.3.5 and versions >= 1.3.5.

There are three different problems:
1] the database RUV is not written at shutdown (affects all versions)
2] the database RUV and the database can be inconsistent: fixed since the patches for #49008 and #49287
3] the changelog max RUV can be ahead of the database RUV and the changelog is rebuilt; this can be caused by 1] or by the patches for 2], since they prevent writing an uncommitted database RUV while the CL max RUV is still written.

Problem 1] is caused by a race condition in setting and resetting the RUV dirty flag. The database RUV is written periodically by the housekeeping thread and at shutdown. To prevent unnecessary writes of the RUV there is a "dirty" flag which is set when the RUV is updated and reset when it has been written. I added debug logging and we can see the following scenario:

 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1965d000300010000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_state: called for 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_state:  writing ruv 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: {replicageneration} 59f1901c000000020000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: {replica 1 ldap://localhost:39001} 59f19025000100010000 59f1965d000300010000 59f1965d
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: {replica 2 ldap://localhost:39002} 59f19039000000020000 59f19039000200020000 00000000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1965e000000010000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: reset repl_ruv_dirty, success

While the RUV is being written, it is updated, and when replica_write_ruv completes it resets the dirty flag. So if there is no further change the RUV stays clean and will not be written next time, either by the housekeeping thread or at shutdown. The following log shows the scenario at shutdown:

 [26/Oct/2017:10:59:25 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:59:25 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1a3ee000300010000
 [26/Oct/2017:10:59:25 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - signaling operation threads - op stack size 0 max work q size 1 max work q stack size 1
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - waiting for 1 thread to terminate
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1a3ee000400010000
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_write_ruv: reset repl_ruv_dirty, success
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - closing down internal subsystems and plugins
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_update_state: called for 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_update_state:  writing ruv 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects
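
The race can be summarised with the following sketch; the struct and function names are simplified assumptions, not the actual repl5 code:

    #include <stdbool.h>

    /* Simplified model of the repl_ruv_dirty handling (names are assumptions). */
    typedef struct {
        bool ruv_dirty;           /* stands in for repl_ruv_dirty */
        /* RUV, CSN generator, locks ... omitted */
    } replica_sketch_t;

    /* Update path, per committed operation (ruv_update_ruv / replica_update_ruv). */
    void
    mark_ruv_dirty(replica_sketch_t *r)
    {
        r->ruv_dirty = true;      /* "setting repl_ruv_dirty" in the logs above */
    }

    /* Housekeeping / shutdown path (replica_update_state / replica_write_ruv). */
    void
    flush_ruv(replica_sketch_t *r)
    {
        if (!r->ruv_dirty) {
            return;               /* considered clean: nothing is written */
        }

        /* The RUV is written to the database here.  If mark_ruv_dirty() runs
         * at this point (as in the logs above, where a CSN is committed while
         * the write is in flight), that newer change is NOT part of the
         * written copy ... */

        r->ruv_dirty = false;     /* ... yet the flag is cleared anyway, so the
                                   * RUV now looks clean and is skipped by later
                                   * passes and by the shutdown flush: the last
                                   * update never reaches the database RUV. */
    }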

Possible solutions for 1]
1.1] get rid of the dirty flag and always write the RUV; a single DB update every 30 sec shouldn't hurt even if the data has not changed
1.2] protect the dirty flag with some locking; not nice
1.3] be smarter: keep a copy of the previously written RUV and write only if it has changed

Problem 2] is understood and solved: the pending-list handling now prevents writing RUVs containing uncommitted CSNs; it is just not in older versions.
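
For context, a generic sketch of the pending-list idea (not the actual repl5 code, which is more involved): CSNs are registered when an operation starts and marked committed when its txn commits, and the RUV is only advanced over the longest committed prefix, so an uncommitted CSN can never end up in a written RUV.

    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned long long csn_t;   /* stand-in for a real CSN */

    typedef struct {
        csn_t csn;
        bool committed;
    } pending_entry_t;

    typedef struct {
        pending_entry_t entries[64];    /* kept in CSN order */
        size_t count;
        csn_t ruv_max;                  /* what would be written as the RUV max */
    } pending_list_t;

    /* Advance ruv_max over the committed prefix only; an uncommitted CSN
     * blocks everything newer from reaching the RUV. */
    void
    pending_list_rollup(pending_list_t *pl)
    {
        size_t done = 0;
        while (done < pl->count && pl->entries[done].committed) {
            pl->ruv_max = pl->entries[done].csn;
            done++;
        }
        for (size_t i = done; i < pl->count; i++) {   /* drop the consumed prefix */
            pl->entries[i - done] = pl->entries[i];
        }
        pl->count -= done;
    }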

Problem 3] we prevent the update of the database RUV by maintaining pending lists and update the RUV only after the commit of the primary op. The changelog max RUV is always updated and written at shutdown; the changelog entries are removed if the txn is aborted, but the max RUV survives. So after a restart it is ahead of the database RUV and ahead of the real CSNs contained in the changelog.
So, in the current implementation, we see that there was an orderly shutdown, trust the changelog max RUV, compare it to the database RUV, and if they differ we throw away the changelog.

Possible solutions for 3]
3.1] do something similar to pending lists for the changelog max RUV
3.2] get rid of the changelog max RUV; it is used to check consistency of the DB and the changelog, and in case of a crash we rebuild it from the CL itself. This could be done always, but it would be time consuming
3.3] either check at shutdown that the max RUV is consistent with the CL CSNs, or at startup, if the CL max RUV and the database RUV differ, do an extra check by rebuilding the max RUV from the CL itself

0001-ticket-48118-part-1-ensure-ruv-is-written.patch

For problem 1] we already use the replica lock, but we release it before doing the actual write to the database and take it again to unset the dirty flag. The attached patch resets the flag immediately and sets it again only if the write fails.
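
A minimal sketch of that part-1 approach, with simplified names and the locking only hinted at in comments; write_ruv_to_db() is a hypothetical stand-in for the internal modify that persists the RUV:

    #include <stdbool.h>

    typedef struct {
        bool ruv_dirty;                         /* stands in for repl_ruv_dirty */
    } replica_sketch_t;

    int write_ruv_to_db(replica_sketch_t *r);   /* hypothetical helper */

    void
    flush_ruv_part1(replica_sketch_t *r)
    {
        /* replica lock taken here */
        if (!r->ruv_dirty) {
            /* replica lock released */
            return;
        }
        r->ruv_dirty = false;        /* reset immediately, while the RUV copy
                                      * to be written is still the one the
                                      * flag referred to */
        /* replica lock released; the database write happens outside it */

        if (write_ruv_to_db(r) != 0) {
            /* replica lock taken again */
            r->ruv_dirty = true;     /* only a failed write re-arms the flag,
                                      * so a later pass retries */
            /* replica lock released */
        }
    }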

@lkrispen absolutely great description. Thanks !

  • problem 1: I would keep it simple and choose option 1.1 (systematically write the DB RUV)

  • problem 3

  • option 1: looks too complex to me
  • option 2: the main drawback is the delay to rebuild it before allowing replication to start up. If the changelog is not trimmed, it will be a long delay
  • option 3: it looks nice but has the same drawback as option 2, just at shutdown. If it takes minutes to shut down, there is a risk that, for example, systemctl kills the process
  • option 4: keep the code as it is now, with a small change: at shutdown, check that each CSN in the changelog max RUV is actually present in the changelog. If one is missing, discard the changelog max RUV so that it will be rebuilt at startup (corner case of the last update txn being aborted while a plugin may have done some updates of its own). See the sketch below.
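
A sketch of what option 4 could look like, assuming the existing ruv_enumerate_elements() walk over the max RUV elements; the changelog_contains_csn() lookup, the local type declarations, and the assumption that a non-zero callback result is propagated are all illustrative, not the actual cl5 code:

    #include <stdbool.h>

    /* Forward declarations standing in for the real repl5/cl5 headers. */
    typedef struct csn CSN;
    typedef struct ruv RUV;
    typedef struct cl5dbfile CL5DBFile;            /* changelog file handle (assumed) */
    typedef struct { CSN *csn; } ruv_enum_data;    /* element handed to the callback  */
    int ruv_enumerate_elements(const RUV *ruv,
                               int (*fn)(const ruv_enum_data *, void *),
                               void *arg);

    bool changelog_contains_csn(CL5DBFile *file, const CSN *csn);   /* hypothetical */

    /* Callback: flag any max-RUV CSN that is not actually in the changelog. */
    static int
    check_csn_in_cl(const ruv_enum_data *data, void *arg)
    {
        CL5DBFile *file = (CL5DBFile *)arg;
        return changelog_contains_csn(file, data->csn) ? 0 : 1;
    }

    /* At shutdown: if any CSN of the max RUV is missing from the changelog,
     * skip writing the max RUV so that it gets rebuilt from the CL at startup. */
    static int
    check_max_ruv(const RUV *maxruv, CL5DBFile *file)
    {
        return ruv_enumerate_elements(maxruv, check_csn_in_cl, (void *)file);
    }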

I will review the patch

The patch looks good to me. As I said above, I would prefer to get rid of the repl_ruv_dirty flag and let replica_write_ruv systematically write the RUV.

Thanks for your feedback. Your option 3.4] looks good to me; will try it.

For the dirty flag, I was in favour of 1.1] as well, but then it looked easy to get it done with the existing locks; will think about it again.

With option 1.1, could this be part of the ldbm checkpoint thread? We do these operations on a schedule, so it could make sense to have this in a similar housekeeping location.

Otherwise I haven't looked into this too deeply, but I think your discussions look good :)

0001-Ticket-48118-part-1-aternative-always-write-ruv.patch

And here is the patch for variant 1.1; I tend to favour this now as well. The overhead in database writing is very small: bdb detects that 0 bytes changed and will just update the page LSN and write it to the txn log.
The real overhead is that we need to spawn an internal modify operation, with all its drawbacks, but that is a different story. One mod every 30 sec should not hurt.

Both patches (1.1 and 3) look good to me. Ack.

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: None)

6 years ago

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to None (was: ack)

6 years ago

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to review (was: None)

6 years ago

The merged patch is good (you may add a declaration of _cl5CheckCSNinCL like it is done for _cl5CheckMaxRUV). ACK

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

I'll nitpick the use of "int" in a new function signature (it should be int32_t or int64_t; the latter is better, it's faster), otherwise ack.

f49e4cf..5878619 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Looks like there was an issue with these patches:

[root@localhost BUILD]# make install > /dev/null
../389-ds-base/ldap/servers/plugins/replication/cl5_api.c: In function ‘_cl5CheckMaxRUV’:
../389-ds-base/ldap/servers/plugins/replication/cl5_api.c:2747:41: warning: passing argument 2 of ‘ruv_enumerate_elements’ from incompatible pointer type [-Wincompatible-pointer-types]
     rc = ruv_enumerate_elements(maxruv, _cl5CheckCSNinCL, (void *)file);
                                         ^~~~~~~~~~~~~~~~
In file included from ../389-ds-base/ldap/servers/plugins/replication/repl5.h:29:0,
                 from ../389-ds-base/ldap/servers/plugins/replication/cl5_api.h:19,
                 from ../389-ds-base/ldap/servers/plugins/replication/cl5.h:19,
                 from ../389-ds-base/ldap/servers/plugins/replication/cl5_api.c:30:
../389-ds-base/ldap/servers/plugins/replication/repl5_ruv.h:102:5: note: expected ‘FNEnumRUV {aka int (*)(const struct ruv_enum_data *, void *)}’ but argument is of type ‘int64_t (*)(const ruv_enum_data *, void *) {aka long int (*)(const struct ruv_enum_data *, void *)}’
 int ruv_enumerate_elements(const RUV *ruv, FNEnumRUV fn, void *arg);
     ^~~~~~~~~~~~~~~~~~~~~~
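
For reference, the mismatch is just the callback's return type; a declaration matching the FNEnumRUV typedef quoted above would presumably look like this (parameter names omitted):

    /* Assumed shape of the callback so that it matches FNEnumRUV,
     * i.e. returning plain int rather than int64_t. */
    typedef struct ruv_enum_data ruv_enum_data;    /* per the build output */
    static int _cl5CheckCSNinCL(const ruv_enum_data *, void *);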

Arrgh, before committing I followed William's advice to use int64_t instead of int,
but I think I built it before the actual commit.

4e8bf06..bdcbf5b master -> master

584264a..913bc29 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

4a8a896..693abeb 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

Metadata Update from @amsharma:
- Custom field reviewstatus adjusted to review (was: ack)

6 years ago

You don't need to add these large LDIF files; the full users with many attrs like certs are overkill.
In the test suite you should just generate 1000 users like:

cn=user1...
objectclass: person
sn:

and the group can also be generated.

@amsharma There is a dbgen tool in lib389 that can create this for you. :)

Sorry, but this is not the test we talked about.

You have to shut down the server while the add of the group is in progress; this will abandon the op internally. But the problem is in writing the incorrect max RUV at shutdown and recreating the changelog after restart.
So there are two key elements:
- shut down the server in the middle of the group add
- restart the server and check that everything is fine

Just a remark: I wonder if the following test case would work and avoid the dynamic part of shutting down in the middle of the ADD.

  • configure memberof to automatically add an objectclass that does NOT allow 'memberof': memberof.set_autoaddoc('printerServiceAuxClass')
  • provision user entries with an objectclass allowing 'memberof': U_correct[1..N]
  • create a user entry without the appropriate objectclass: U_incorrect
  • as a last operation (before restart), add a group that contains U_correct* and U_incorrect.
    The operation will fail because U_incorrect cannot be fixed up.

New patch with the fix ->

0001-Ticket-48118-Add-CI-test-case.patch

Note that with this test case I get the expected error:
[09/Jan/2018:05:43:53.187946191 -0500] - ERR - NSMMReplicationPlugin - changelog program - _cl5WriteRUV - changelog maxRUV not found in changelog for file /var/lib/dirsrv/slapd-master1/changelogdb/e4d94d04-f52911e7-94eda105-de40770f_5a549cc2000000010000.db
[09/Jan/2018:05:43:53.194032108 -0500] - INFO - dblayer_pre_close - Waiting for 4 database threads to stop
[09/Jan/2018:05:43:54.054498113 -0500] - INFO - dblayer_pre_close - All database threads now stopped
[09/Jan/2018:05:43:54.064796217 -0500] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
[09/Jan/2018:05:43:54.065549785 -0500] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 1 work q stack objects - freed 2 op stack objects

@lkrispen @spichugi Please review the attached patch. Thanks.

I've fixed the indentation of the docstrings (there were some tabs), added one blank line at the end after the test function, and moved the 'add_users' function to the beginning because we keep helper functions and test functions separate.

00d04b0
Author: Amita Sharma amsharma@redhat.com
Date: Tue Jan 9 16:13:42 2018 +0530

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1449

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
