#48118 At startup, changelog can be erroneously rebuilt after a normal shutdown
Closed: wontfix 5 years ago. Opened 9 years ago by tbordaz.

At startup, a supplier compares the RUVs that are in the database and in the changelog.

With ticket https://fedorahosted.org/389/ticket/564, the changelog_ruv is kept in sync with the database updates while the database_ruv is only resynced periodically (every 30 s), so the database_ruv can be less than the changelog_ruv.

After a disorderly shutdown, if database_ruv < changelog_ruv at startup, it is assumed that the resync thread did not have time to resync the database_ruv, and the database_ruv is rebuilt from the changelog_ruv.

After a normal shutdown, if database_ruv < changelog_ruv at startup, it is treated as an error condition and the changelog is rebuilt.
The problem is that there is no mechanism to force a database_ruv resync at normal shutdown.

If the normal shutdown occurs less than 30 seconds after the last update, there is a risk that the database_ruv has not been resynced and the changelog is therefore erroneously rebuilt.
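
To make the two startup branches concrete, here is a rough sketch of the decision described above; the helper names are illustrative placeholders, not the actual repl5/cl5 functions.

    #include <stdbool.h>

    /* Illustrative placeholders only, not the real repl5/cl5 API. */
    typedef struct ruv RUV;
    bool ruv_older_than(const RUV *a, const RUV *b);              /* true if a < b */
    void rebuild_database_ruv_from_changelog(RUV *db, const RUV *cl);
    void rebuild_changelog(void);

    /* Startup decision as described above. */
    void
    startup_ruv_check(RUV *database_ruv, const RUV *changelog_ruv, bool clean_shutdown)
    {
        if (!ruv_older_than(database_ruv, changelog_ruv)) {
            return; /* RUVs agree (or the DB is ahead): nothing to do */
        }
        if (!clean_shutdown) {
            /* Disorderly shutdown: assume the resync thread simply did not
             * run yet and catch the database RUV up from the changelog RUV. */
            rebuild_database_ruv_from_changelog(database_ruv, changelog_ruv);
        } else {
            /* Normal shutdown: the same lag is treated as an error condition
             * and the changelog is rebuilt; this is the behaviour the ticket
             * is about. */
            rebuild_changelog();
        }
    }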

This problem has been reproduced several times. Apparently it occurs more frequently on slow VMs.


This bug exists in 1.3.2 and later. Milestone = 1.3.2?

Going further in the investigation, I was wrong when saying that
''The problem is that there is no mechanism to force a database_ruv resync at normal shutdown.''

In fact, at normal shutdown, 'multimaster-stop' calls 'replica_destroy_name_hash', which updates the CSN generator and the RUV of each replica.

I think there is a possible corner case in 'replica_update_state': if 'replica.state_update_inprogress' is set, it returns without flushing the CSN generator/RUV.

Will investigate under what conditions 'replica.state_update_inprogress' is set.

Having a hard time reproducing this issue. I've set up multiple backends with replication, put the server under load on multiple backends, and repeatedly restarted the server. Is there anything else I should be trying? How did you reproduce this, Thierry?

Since it is hard to reproduce the issue, shall we push it to 1.3.5?

Replying to [comment:11 nhosoi]:

Since it is hard to reproduce the issue, shall we push it to 1.3.5?

I think it should be pushed back. It's hard to reproduce and I'm not aware of anyone hitting this issue. That being said, I think I heard Thierry recently say that a cleanallruv task might need to be running at the same time as the shutdown. So I'd like to quickly test that on Monday, and if I still can't reproduce it I'll move it to 1.3.5.

I have not been able to reproduce this problem. It was detected a couple of times by QE.

This current ticket requires a normal shutdown: the database_ruv should have been written back to the DB, but for some reason database_ruv < CL_ruv => the CL is recreated.

Before 48208, I was able to reproduce a disorderly shutdown if a cleanallruv task was hanging during a shutdown. Regarding 48118, startup after a disorderly shutdown worked fine, because if database_ruv < changelog_ruv the database_ruv was rebuilt from the CL_ruv.
48208 allows a normal shutdown with cleanallruv, so maybe it could help to reproduce this bug.

Per triage:
Couldn't reproduce, and it happens very rarely, only in QE's environment.
Push to 1.3.6 and raise priority if QE starts seeing it more.

  • QE is able to reproduce the changelog recreation quite easily;
    reopening the ticket

  • The root cause is identified:
    as a consequence of https://fedorahosted.org/389/ticket/564, the DB RUV is no longer in the
    same txn as the CL RUV.
    At regular shutdown there is a race condition between the shutdown thread that writes back the RUV
    and the last operation. The written-back RUV may not contain the operation that is still
    going on.

  • possible fixes discussed in BZ

Metadata Update from @nhosoi:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.6 backlog

7 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus reset (from needinfo)
- Issue close_status updated to: None
- Issue set to the milestone: 1.3.7 backlog (was: 1.3.6 backlog)

6 years ago

I just reproduced this issue. It happens if you have an active operation that triggers internal mods and you stop the server before the operation is completed.

In fact, the RUV in the database is the correct one, but the changelog maxcsn is ahead.

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to None

6 years ago

After some investigation I think I understand what happens, but we have more than one problem, and they are different in versions < 1.3.5 and versions >= 1.3.5.

There are three different problems:
1] the database RUV is not written at shutdown (affects all versions)
2] the database RUV and the database can be inconsistent: fixed since the patches for #49008 and #49287
3] the changelog max RUV can be ahead of the database RUV and the changelog is rebuilt; this can be caused by 1] or by the patches for 2], since they prevent writing an uncommitted database RUV while the CL max RUV is still written.

Problem 1] is caused by a race condition in setting and resetting the RUV dirty flag. The database RUV is written periodically by the housekeeping thread and at shutdown. To prevent unnecessary writes of the RUV there is a "dirty" flag which is set when the RUV is updated and reset when it has been written. I added debug logging and we can see the following scenario:

 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1965d000300010000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_state: called for 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_state:  writing ruv 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: {replicageneration} 59f1901c000000020000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: {replica 1 ldap://localhost:39001} 59f19025000100010000 59f1965d000300010000 59f1965d
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: {replica 2 ldap://localhost:39002} 59f19039000000020000 59f19039000200020000 00000000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1965e000000010000
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:01:33 +0200] NSMMReplicationPlugin - replica_write_ruv: reset repl_ruv_dirty, success

While the RUV is being written, it is updated, and when replica_write_ruv completes it resets the dirty flag. So if there is no further change the RUV stays clean and will not be written next time, either by the housekeeping thread or at shutdown. The following log shows the scenario at shutdown:

 [26/Oct/2017:10:59:25 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:59:25 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1a3ee000300010000
 [26/Oct/2017:10:59:25 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - signaling operation threads - op stack size 0 max work q size 1 max work q stack size 1
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - waiting for 1 thread to terminate
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - ruv_update_ruv: successfully committed csn 59f1a3ee000400010000
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_update_ruv: setting repl_ruv_dirty
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_write_ruv: reset repl_ruv_dirty, success
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - closing down internal subsystems and plugins
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_update_state: called for 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:59:26 +0200] NSMMReplicationPlugin - replica_update_state:  writing ruv 245ec609-ba2011e7-9660c851-119f29f6
 [26/Oct/2017:10:59:26 +0200] - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects
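
The race can be summarised with the following sketch; the struct and function names are simplified assumptions, not the actual repl5 code:

    #include <stdbool.h>

    /* Simplified model of the repl_ruv_dirty handling (names are assumptions). */
    typedef struct {
        bool ruv_dirty;           /* stands in for repl_ruv_dirty */
        /* RUV, CSN generator, locks ... omitted */
    } replica_sketch_t;

    /* Update path, per committed operation (ruv_update_ruv / replica_update_ruv). */
    void
    mark_ruv_dirty(replica_sketch_t *r)
    {
        r->ruv_dirty = true;      /* "setting repl_ruv_dirty" in the logs above */
    }

    /* Housekeeping / shutdown path (replica_update_state / replica_write_ruv). */
    void
    flush_ruv(replica_sketch_t *r)
    {
        if (!r->ruv_dirty) {
            return;               /* considered clean: nothing is written */
        }

        /* The RUV is written to the database here.  If mark_ruv_dirty() runs
         * at this point (as in the logs above, where a CSN is committed while
         * the write is in flight), that newer change is NOT part of the
         * written copy ... */

        r->ruv_dirty = false;     /* ... yet the flag is cleared anyway, so the
                                   * RUV now looks clean and is skipped by later
                                   * passes and by the shutdown flush: the last
                                   * update never reaches the database RUV. */
    }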

Possible solutions for 1]
1.1] get rid of the dirty flag and always write the RUV; a single DB update every 30 sec shouldn't hurt even if the data has not changed
1.2] protect the dirty flag with some locking; not nice
1.3] be smarter: keep a copy of the previously written RUV and write only if it has changed

Problem 2] is understood and solved: the pending-list handling now prevents writing RUVs containing uncommitted CSNs; it is just not in older versions.
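
For context, a generic sketch of the pending-list idea (not the actual repl5 code, which is more involved): CSNs are registered when an operation starts and marked committed when its txn commits, and the RUV is only advanced over the longest committed prefix, so an uncommitted CSN can never end up in a written RUV.

    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned long long csn_t;   /* stand-in for a real CSN */

    typedef struct {
        csn_t csn;
        bool committed;
    } pending_entry_t;

    typedef struct {
        pending_entry_t entries[64];    /* kept in CSN order */
        size_t count;
        csn_t ruv_max;                  /* what would be written as the RUV max */
    } pending_list_t;

    /* Advance ruv_max over the committed prefix only; an uncommitted CSN
     * blocks everything newer from reaching the RUV. */
    void
    pending_list_rollup(pending_list_t *pl)
    {
        size_t done = 0;
        while (done < pl->count && pl->entries[done].committed) {
            pl->ruv_max = pl->entries[done].csn;
            done++;
        }
        for (size_t i = done; i < pl->count; i++) {   /* drop the consumed prefix */
            pl->entries[i - done] = pl->entries[i];
        }
        pl->count -= done;
    }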

Problem 3] we prevent the update of the database RUV by maintaining pending lists and update the RUV only after the commit of the primary op. The changelog max RUV is always updated and written at shutdown; the changelog entries are removed if the txn is aborted, but the max RUV survives. So after a restart it is ahead of the database RUV and ahead of the real CSNs contained in the changelog.
So, in the current implementation, we see that there was an orderly shutdown, trust the changelog max RUV, compare it to the database RUV, and if they differ we throw away the changelog.

Possible solutions for 3]
3.1] do something similar to pending lists for the changelog max RUV
3.2] get rid of the changelog max RUV; it is used to check consistency of the DB and the changelog, and in case of a crash we rebuild it from the CL itself. This could be done always, but it would be time consuming
3.3] either check at shutdown that the max RUV is consistent with the CL CSNs, or at startup, if the CL max RUV and the database RUV differ, do an extra check by rebuilding the max RUV from the CL itself

0001-ticket-48118-part-1-ensure-ruv-is-written.patch

For problem 1] we already use the replica lock, but we release it before doing the actual write to the database and take it again to unset the dirty flag. The attached patch resets the flag immediately and sets it again only if the write fails.
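
A minimal sketch of that part-1 approach, with simplified names and the locking only hinted at in comments; write_ruv_to_db() is a hypothetical stand-in for the internal modify that persists the RUV:

    #include <stdbool.h>

    typedef struct {
        bool ruv_dirty;                         /* stands in for repl_ruv_dirty */
    } replica_sketch_t;

    int write_ruv_to_db(replica_sketch_t *r);   /* hypothetical helper */

    void
    flush_ruv_part1(replica_sketch_t *r)
    {
        /* replica lock taken here */
        if (!r->ruv_dirty) {
            /* replica lock released */
            return;
        }
        r->ruv_dirty = false;        /* reset immediately, while the RUV copy
                                      * to be written is still the one the
                                      * flag referred to */
        /* replica lock released; the database write happens outside it */

        if (write_ruv_to_db(r) != 0) {
            /* replica lock taken again */
            r->ruv_dirty = true;     /* only a failed write re-arms the flag,
                                      * so a later pass retries */
            /* replica lock released */
        }
    }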

@lkrispen absolutely great description. Thanks !

  • problem 1: I would keep it simple and choose option 1.1 (systematically write the DB RUV)

  • problem 3

  • option 1: looks too complex to me
  • option 2: the main drawback is the delay to rebuild it before allowing replication to start up. If the changelog is not trimmed, it will be a long delay
  • option 3: it looks nice but has the same drawback as option 2, just at shutdown. If it takes minutes to shut down, there is a risk that, for example, systemctl kills the process
  • option 4: keep the code as it is now, with a small change: at shutdown, check that each CSN in the changelog max RUV is actually present in the changelog. If one is missing, discard the changelog max RUV so that it will be rebuilt at startup (corner case of the last update txn being aborted while a plugin may have done some updates of its own). See the sketch below.
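
A sketch of what option 4 could look like, assuming the existing ruv_enumerate_elements() walk over the max RUV elements; the changelog_contains_csn() lookup, the local type declarations, and the assumption that a non-zero callback result is propagated are all illustrative, not the actual cl5 code:

    #include <stdbool.h>

    /* Forward declarations standing in for the real repl5/cl5 headers. */
    typedef struct csn CSN;
    typedef struct ruv RUV;
    typedef struct cl5dbfile CL5DBFile;            /* changelog file handle (assumed) */
    typedef struct { CSN *csn; } ruv_enum_data;    /* element handed to the callback  */
    int ruv_enumerate_elements(const RUV *ruv,
                               int (*fn)(const ruv_enum_data *, void *),
                               void *arg);

    bool changelog_contains_csn(CL5DBFile *file, const CSN *csn);   /* hypothetical */

    /* Callback: flag any max-RUV CSN that is not actually in the changelog. */
    static int
    check_csn_in_cl(const ruv_enum_data *data, void *arg)
    {
        CL5DBFile *file = (CL5DBFile *)arg;
        return changelog_contains_csn(file, data->csn) ? 0 : 1;
    }

    /* At shutdown: if any CSN of the max RUV is missing from the changelog,
     * skip writing the max RUV so that it gets rebuilt from the CL at startup. */
    static int
    check_max_ruv(const RUV *maxruv, CL5DBFile *file)
    {
        return ruv_enumerate_elements(maxruv, check_csn_in_cl, (void *)file);
    }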

I will review the patch

The patch looks good to me. As I said above, I would prefer to get rid of the repl_ruv_dirty flag and let replica_write_ruv systematically write the RUV.

Thanks for your feedback. Your option 3.4] looks good to me; will try it.

For the dirty flag, I was in favour of 1.1] as well, but then it looked easy to get it done with the existing locks; will think about it again.

With option 1.1, could this be part of the ldbm checkpoint thread? We do these operations on a schedule, so it could make sense to have this in a similar housekeeping location.

Otherwise I haven't looked into this too deeply, but I think your discussions look good :)

0001-Ticket-48118-part-1-aternative-always-write-ruv.patch

And here is the patch for variant 1.1; I tend to favour this now as well. The overhead in database writing is very small: bdb detects that 0 bytes changed and will just update the page LSN and write it to the txn log.
The real overhead is that we need to spawn an internal modify operation, with all its drawbacks, but that is a different story. One mod every 30 sec should not hurt.

Both patches (1.1 and 3) look good to me. Ack.

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: None)

6 years ago

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to None (was: ack)

6 years ago

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to review (was: None)

6 years ago

The merged patch is good (you may add a declaration of _cl5CheckCSNinCL like it is done for _cl5CheckMaxRUV). ACK

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

I'll nitpick the use of "int" in a new function signature (it should be int32_t or int64_t; the latter is better, it's faster), otherwise ack.

f49e4cf..5878619 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Looks like there was an issue with these patches:

[root@localhost BUILD]# make install > /dev/null
../389-ds-base/ldap/servers/plugins/replication/cl5_api.c: In function ‘_cl5CheckMaxRUV’:
../389-ds-base/ldap/servers/plugins/replication/cl5_api.c:2747:41: warning: passing argument 2 of ‘ruv_enumerate_elements’ from incompatible pointer type [-Wincompatible-pointer-types]
     rc = ruv_enumerate_elements(maxruv, _cl5CheckCSNinCL, (void *)file);
                                         ^~~~~~~~~~~~~~~~
In file included from ../389-ds-base/ldap/servers/plugins/replication/repl5.h:29:0,
                 from ../389-ds-base/ldap/servers/plugins/replication/cl5_api.h:19,
                 from ../389-ds-base/ldap/servers/plugins/replication/cl5.h:19,
                 from ../389-ds-base/ldap/servers/plugins/replication/cl5_api.c:30:
../389-ds-base/ldap/servers/plugins/replication/repl5_ruv.h:102:5: note: expected ‘FNEnumRUV {aka int (*)(const struct ruv_enum_data *, void *)}’ but argument is of type ‘int64_t (*)(const ruv_enum_data *, void *) {aka long int (*)(const struct ruv_enum_data *, void *)}’
 int ruv_enumerate_elements(const RUV *ruv, FNEnumRUV fn, void *arg);
     ^~~~~~~~~~~~~~~~~~~~~~
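
For reference, the mismatch is just the callback's return type; a declaration matching the FNEnumRUV typedef quoted above would presumably look like this (parameter names omitted):

    /* Assumed shape of the callback so that it matches FNEnumRUV,
     * i.e. returning plain int rather than int64_t. */
    typedef struct ruv_enum_data ruv_enum_data;    /* per the build output */
    static int _cl5CheckCSNinCL(const ruv_enum_data *, void *);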

Arrgh, before committing I followed William's advice to use int64_t instead of int,
but I think I built it before the actual commit.

4e8bf06..bdcbf5b master -> master

584264a..913bc29 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

4a8a896..693abeb 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

Metadata Update from @amsharma:
- Custom field reviewstatus adjusted to review (was: ack)

6 years ago

You don't need to add these large LDIF files; the full users with many attrs like certs are overkill.
In the test suite you should just generate 1000 users like:

cn=user1...
objectclass: person
sn:

and the group can also be generated.

@amsharma There is a dbgen tool in lib389 that can create this for you. :)

Sorry, but this is not the test we talked about.

You have to shut down the server while the add of the group is in progress; this will abandon the op internally. But the problem is in writing the incorrect max RUV at shutdown and recreating the changelog after restart.
So there are two key elements:
- shut down the server in the middle of the group add
- restart the server and check that everything is fine

Just a remark: I wonder if the following test case would work and avoid the dynamic part of shutting down in the middle of the ADD.

  • configure memberof to automatically add an objectclass that does NOT allow 'memberof': memberof.set_autoaddoc('printerServiceAuxClass')
  • provision user entries with an objectclass allowing 'memberof': U_correct[1..N]
  • create a user entry without the appropriate objectclass: U_incorrect
  • as a last operation (before restart), add a group that contains U_correct* and U_incorrect.
    The operation will fail because U_incorrect cannot be fixed up.

New patch with the fix ->

0001-Ticket-48118-Add-CI-test-case.patch

Note that with this test case I get the expected error:
[09/Jan/2018:05:43:53.187946191 -0500] - ERR - NSMMReplicationPlugin - changelog program - _cl5WriteRUV - changelog maxRUV not found in changelog for file /var/lib/dirsrv/slapd-master1/changelogdb/e4d94d04-f52911e7-94eda105-de40770f_5a549cc2000000010000.db
[09/Jan/2018:05:43:53.194032108 -0500] - INFO - dblayer_pre_close - Waiting for 4 database threads to stop
[09/Jan/2018:05:43:54.054498113 -0500] - INFO - dblayer_pre_close - All database threads now stopped
[09/Jan/2018:05:43:54.064796217 -0500] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
[09/Jan/2018:05:43:54.065549785 -0500] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 1 work q stack objects - freed 2 op stack objects

@lkrispen @spichugi Please review the attached patch. Thanks.

I've fixed the indentation of the docstrings (there were some tabs), added one blank line at the end after the test function, and moved the 'add_users' function to the beginning because we keep helper functions and test functions separate.

00d04b0
Author: Amita Sharma amsharma@redhat.com
Date: Tue Jan 9 16:13:42 2018 +0530

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1449

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
