#49463 After cleanALLruv, replication is looping on keep alive DEL
Closed: wontfix 5 years ago by tbordaz. Opened 6 years ago by tbordaz.

Issue Description

When cleanAllRuv is launched, it spawns cleanAllRuv on all replicas.
Each replica will clean its changelog and database RUV, but in addition it will DEL the keep alive entry of the target ReplicaID.
So for the same (keep alive) entry there will be as many DELs as there are replicas.

Those DELs are replicated. Each replicated DEL attempts to delete an entry that is already a tombstone.
Locally this update is treated as a NOOP.

The problem is that the NOOP does not update the RUV, so the supplier keeps looping, sending the DEL again and again.
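For illustration, a minimal diagnostic sketch using python-ldap (the suffix, hosts and credentials below are placeholders, not from the attached test case): it dumps the database RUV of each replica via the usual RUV-tombstone search, so the per-RID maxcsn values can be compared across replicas after a cleanAllRuv.

```python
# Hedged diagnostic sketch: assumes python-ldap, suffix dc=example,dc=com and
# Directory Manager credentials -- adjust to your deployment.
import ldap

SUFFIX = "dc=example,dc=com"
# The database RUV is exposed on the RUV tombstone entry of the suffix.
RUV_FILTER = "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))"

def dump_ruv(uri, binddn="cn=Directory Manager", password="password"):
    conn = ldap.initialize(uri)
    conn.simple_bind_s(binddn, password)
    entries = conn.search_s(SUFFIX, ldap.SCOPE_SUBTREE, RUV_FILTER, ["nsds50ruv"])
    for dn, attrs in entries:
        print(uri, dn)
        for val in attrs.get("nsds50ruv", []):
            # one element per replica id: "{replica <rid> <url>} <mincsn> <maxcsn>"
            print("   ", val.decode())
    conn.unbind_s()

for uri in ("ldap://replica1:389", "ldap://replica2:389"):
    dump_ruv(uri)
```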

Package Version and Platform

This problem is a side effect of https://fedorahosted.org/389/ticket/48278
It has existed since 1.3.6.

Steps to reproduce

Run the attached test case
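For reference, a rough manual sketch of the scenario (the attached test case remains the authoritative reproducer): after removing a master from the topology, start cleanAllRuv by adding a task entry on one of the remaining masters. The hostnames, suffix, credentials and replica id below are placeholders.

```python
# Hedged reproduction sketch with python-ldap; assumes the decommissioned
# master had replica id 3 and the suffix is dc=example,dc=com.
import ldap
import ldap.modlist

conn = ldap.initialize("ldap://master1:389")
conn.simple_bind_s("cn=Directory Manager", "password")

task_dn = "cn=clean 3,cn=cleanallruv,cn=tasks,cn=config"
task = {
    "objectClass": [b"top", b"extensibleObject"],
    "cn": [b"clean 3"],
    "replica-base-dn": [b"dc=example,dc=com"],
    "replica-id": [b"3"],
    "replica-force-cleaning": [b"no"],
}
conn.add_s(task_dn, ldap.modlist.addModlist(task))

# Once the task propagates, each replica DELs the keep alive entry of rid 3
# (e.g. "cn=repl keep alive 3,dc=example,dc=com"); the replicated DELs of the
# already-removed entry are the NOOPs that keep being replayed.
conn.unbind_s()
```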

Actual results

The DEL of the cleaned RID's keep alive entry can be replicated several times.

Expected results

Either the DEL should not be replicated, or it should update the RUV


Metadata Update from @tbordaz:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

Note that the problem is not fatal for replication, but it can create long replication delays.
It is not fatal because, as soon as a new direct update happens after the DEL, the new direct update will be replicated and the DEL will not be replayed.

Regarding the long replication delay: if one replica does not receive any direct updates, the DEL of the keep alive entry remains its maxcsn.
So it will systematically start its replication sessions from this maxcsn.

In conclusion, the problem may delay replication.

Some possibilities to fix it:

  • Flag the keep alive DEL as OP_FLAG_REPL_FIXUP, so that it does not go into the CL/RUV
  • Allow a replicated NOOP to update the RUV
  • Do not delete the keep alive entry in cleanAllRuv
  • others... ?

I think the second is perhaps the better idea, because in general if a non-keep-alive entry is deleted twice we will hit this too. Batch deletes could be worse.

So I think we should have the replicated DEL NOOP able to write to the RUV, because we really have processed the changelog up to that CSN and we should not keep replaying it.

To put it a different way: if this were another operation type and we needed to NOOP it, would we let it replay over and over? I think we would let it write the RUV.

I think that option one is a "duct tape" solution for a problem that could occur in other ways, which is why I suggest we fix it properly as above. However, if we did both point 1 and point 2 I would also be happy with that solution ....

Finally, I think we should delete the keep alive entry, because we are saying "the master is gone, let's remove it". So deleting the keep alive is correct.

Hope that helps!

@firstyear thanks for your comments.

Allowing a NOOP to update the RUV looks like the most generic solution. I think it will work, but its impact is more difficult to evaluate.

The other two solutions are more specific to cleanAllRuv.
I think we can also implement both.
I do not like the idea of spreading DELs. If we have 60 replicas, it ends up replicating 60 different DELs to all of the replicas. A possibility is to do the DEL immediately on the host receiving the cleanAllRuv and schedule (slapi-eq-once) the DEL on the others, giving the keep alive entry time to become a tombstone.
In addition to this we can use the fixup flag, so that we are sure the DEL is done by cleanAllRuv and not by regular replication.

The problem is not specific to DEL, nor to cleanallruv. The other scenario I know of is after a total init. There are cases where the keepalive entry is contained in the total update but is sent again after the init is complete. Also, entries can receive MODs before they are sent; these MODs will also be replayed after the init.
And maybe there are other cases as well. I think that updating the RUV for already-seen updates is the right thing to do, but we have to be sure to handle all scenarios without unexpected side effects. And we need to be aware that it can affect all types of operations and that it affects the flow of MOD operations considerably.

About the other solutions:

The keep alive DEL is triggered by cleanallruv and done inside the MMR plugin, so the FIXUP flag could be a solution to prevent it from being replicated, similar to how replicating the delete of tombstones is prevented.

I agree that it should be removed, since it is an artefact of a removed master. One option could be to delete it only on the server where the cleanallruv task is started. I think we can distinguish between a cleanallruv started directly and one received as an extop.

@tbordaz Just read your comment after sending mine.

I think you have a good point: we should avoid having many DELs, even if we could manage the repeated replay problem with a NOOP mechanism (we should investigate this independently).

So your idea is interesting. When receiving and starting a cleanallruv, delete the keepalive immediately; it will get a CSN with a RID which will not be removed, and it will be replicated.
As an extra check, we could avoid this in a cleanallruv triggered by an extop.

No conclusion yet on the final fix, but there are multiple possibilities.

Regarding a workaround:

A simple workaround is to do, after a cleanAllRuv (or a series of cleanAllRuvs), a dummy update on each replica. The dummy update (not filtered by fractional replication) will have a CSN larger than the CSN of the keep alive DEL, so the next replication session will start from this dummy update.
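A minimal sketch of this workaround with python-ldap (the entry, attribute, hosts and credentials are placeholders; any replicated MOD that is not excluded by fractional replication will do):

```python
# Hedged workaround sketch: perform a dummy replicated MOD on each replica so
# that its maxcsn moves past the keep alive DEL CSN.
import ldap
import time

DUMMY_DN = "dc=example,dc=com"  # placeholder: any replicated entry works

def dummy_update(uri, binddn="cn=Directory Manager", password="password"):
    conn = ldap.initialize(uri)
    conn.simple_bind_s(binddn, password)
    # The MOD gets a CSN newer than the keep alive DEL, so the next replication
    # session starts after the DEL instead of replaying it.
    conn.modify_s(DUMMY_DN, [(ldap.MOD_REPLACE, "description",
                              [("dummy update %d" % int(time.time())).encode()])])
    conn.unbind_s()

for uri in ("ldap://replica1:389", "ldap://replica2:389"):
    dummy_update(uri)
```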

> The problem is not specific to DEL, nor to cleanallruv. The other scenario I know of is after a total init. There are cases where the keepalive entry is contained in the total update but is sent again after the init is complete. Also, entries can receive MODs before they are sent; these MODs will also be replayed after the init.
> And maybe there are other cases as well. I think that updating the RUV for already-seen updates is the right thing to do, but we have to be sure to handle all scenarios without unexpected side effects. And we need to be aware that it can affect all types of operations and that it affects the flow of MOD operations considerably.

Yes, I completely agree with this.

> About the other solutions:
> The keep alive DEL is triggered by cleanallruv and done inside the MMR plugin, so the FIXUP flag could be a solution to prevent it from being replicated, similar to how replicating the delete of tombstones is prevented.
> I agree that it should be removed, since it is an artefact of a removed master. One option could be to delete it only on the server where the cleanallruv task is started. I think we can distinguish between a cleanallruv started directly and one received as an extop.

I think this is also a good suggestion.

Can I suggest that the fix be:

  • Allow a no-op (i.e. a double DEL) to update the RUV
  • cleanallruv only deletes the keep alive entry from the master the task started on; replicated tasks (extop receivers) do not alter the keep alive entry

I think this is a nice, clean and complete solution to the situation. :)

> Can I suggest that the fix be:
>
>   • Allow a no-op (i.e. a double DEL) to update the RUV
>   • cleanallruv only deletes the keep alive entry from the master the task started on; replicated tasks (extop receivers) do not alter the keep alive entry
>
> I think this is a nice, clean and complete solution to the situation. :)

I agree with this, but I would split it.
- Handle the RUV update for NOOPs in a separate ticket; it needs extra consideration and has to cover non-cleanallruv scenarios.
- Implement one of the other suggestions to get immediate relief in the cleanallruv case. I would like to avoid multiple deletes, so if it is possible to delete it only on the task origin, I would prefer that.

@lkrispen I agree with your proposal to split the issue.
Updating the RUV on a NOOP looks like an approved idea, but its impact goes further than cleanAllRuv.
It is a generic fix and needs more testing.

We can keep this ticket for limiting the number of DELs on cleanAllRuv.

Now I have two concerns:

  • Having a single DEL will not fix the current issue (we do need the RUV update on NOOP)
  • Detecting whether a cleanAllRuv is local or propagated may be difficult. For example, the cleanallruv task is recorded in the replica (cn=config) entry until its completion. If the server is restarted before it completes, the task is restarted, but then we need something (a flag) to know whether it was a local or propagated task.

Sounds great! Can you open the NOOP ticket so we don't forget it :) Thanks.

Metadata Update from @tbordaz:
- Custom field origin adjusted to IPA (was: None)
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1516309

6 years ago

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to review (was: None)

6 years ago

The patch looks good.

And I agree with setting the flag always to true if the task is created at startup in replica_check_for_tasks, but would it be so difficult to just append the flag to the task info stored in the dse.ldif and parse it like the other attributes?

@lkrispen thanks for the review.
It is indeed a good idea. I will rework the patch. Thanks !!!

But then we are exposing an implementation detail of the keep alive DEL in the configuration. That doesn't seem right to me.

I think it's better if we set the data->original_task flag based on the origin of the task, not from attributes in the config ....

I think you miss the point: if the server crashes or is shut down before the task is completed, it will read the stored task description in the replica object and recreate the task. But then it no longer knows whether it is the origin or not.

We already temporarily store the task attributes like RID, force, .. in an encoded attribute in the replica object, which is there as long as the task lives. It is nothing you can configure.

Hmmm okay. Maybe I am missing something then.

I will see the final patch and investigate more :)

Thank you,

New patch and test case implementing @lkrispen's proposal (https://pagure.io/389-ds-base/issue/49463#comment-482445) to store in the nsds5ReplicaCleanRUV attribute (of the replica) a flag indicating whether the cleanAllRuv was received directly (Original) or through an extop (Propagated).

The test case is in a separate patch because it will need a python3 rework.
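For reviewers and testers, a small hedged sketch (python-ldap, placeholder host and credentials) to inspect the nsds5ReplicaCleanRUV value persisted on the replica entry while a task is alive:

```python
# Hedged inspection sketch: while a cleanAllRuv task is running, its persisted
# state (including the Original/Propagated flag described above) lives in the
# nsds5ReplicaCleanRUV attribute of the replica entry under cn=config.
import ldap

conn = ldap.initialize("ldap://master1:389")
conn.simple_bind_s("cn=Directory Manager", "password")

entries = conn.search_s("cn=mapping tree,cn=config", ldap.SCOPE_SUBTREE,
                        "(objectclass=nsds5Replica)", ["nsds5ReplicaCleanRUV"])
for dn, attrs in entries:
    for val in attrs.get("nsds5ReplicaCleanRUV", []):
        print(dn, val.decode())
conn.unbind_s()
```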

As we agreed before (on IRC), we can push the C code first (after review by the devs); the test case code is a NOGO for now.

The test case code needs to be ported to Python 3 and cleaned of boilerplate code, the docstrings should be in the right format, and the test function and file names should be changed.

All points are discussed in https://pagure.io/389-ds-base/issue/49509#comment-486206

Once we go through BZ verification, we can refactor and push the patch.
Or maybe Thierry will fix it later. :)

I think that in the patch the comment starting at line 149 is no longer valid; you now know whether it is original or not.

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

To ssh://pagure.io/389-ds-base.git
7e27fac..8448369 master -> master

Metadata Update from @tbordaz:
- Issue set to the milestone: 1.3.6.0

6 years ago

Metadata Update from @tbordaz:
- Issue assigned to tbordaz

6 years ago

To ssh://pagure.io/389-ds-base.git
26f37b8..f1373f0 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

To ssh://pagure.io/389-ds-base.git
cf6d12c..2fa64a7 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Test case pushed on behalf of @aadhikari. Thanks
37f919a..28a5ddb master

Metadata Update from @tbordaz:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2522

If you want to receive further updates on the issue, please navigate to the github issue and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
