#49446 cleanallruv could break replication if startCsn originated from deleted replica
Closed: fixed 2 years ago Opened 2 years ago by tbordaz.

Issue Description

When cleanallruv does not complete, RUV comparison can pick the startCSN from a replica ID we want to delete.
If that startCSN's change was removed from the changelog, it can break replication.
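For illustration, here is a toy Python model of the failure mode (all names here — `pick_start_csn`, the integer CSNs, the dict-based RUVs — are invented for this sketch; the real logic lives in the C replication plugin):

```python
# Toy model: the supplier picks its replication start point (the anchor/start
# CSN) by comparing its RUV with the consumer's RUV. If the smallest
# "consumer is behind" CSN belongs to a replica ID being cleaned, and that
# change was already trimmed from the changelog, replication breaks.

def pick_start_csn(supplier_ruv, consumer_ruv):
    """Return the smallest CSN the consumer has not seen yet (toy logic)."""
    candidates = [csn for rid, csn in supplier_ruv.items()
                  if csn > consumer_ruv.get(rid, 0)]
    return min(candidates) if candidates else None

# CSNs modeled as plain integers; dict keys are replica IDs.
supplier_ruv = {1: 50, 2: 40, 3: 10}   # rid 3 is the replica being deleted
consumer_ruv = {1: 50, 2: 30, 3: 5}

changelog = {40, 50}                    # rid 3's change (csn 10) was trimmed

start = pick_start_csn(supplier_ruv, consumer_ruv)
print(start)                            # 10: picked from the deleted rid 3
print(start in changelog)               # False: missing csn -> replication breaks
```

The ticket's question is whether `pick_start_csn` should skip elements belonging to a rid that is being cleaned.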

Package Version and Platform

At least from 1.3.0

Steps to reproduce

see attached test case

Actual results

Replication breaks

Expected results

Replication could (should?) avoid selecting a startCSN from a deleted replica


Metadata Update from @tbordaz:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

2 years ago

I think it is ok to skip the pre-cleaned rid when calculating the anchor csn; do we really need an extra config attr?
And the patch looks good, but if I understand your test, this is not really a valid scenario: cleanallruv should only be run for a replica that was removed. Cleaning a live rid should be prevented if possible, but that would require another approach.
Is the incorrect start csn possible if only a removed rid is cleaned?

I don't think we need another config attr either. Let's make it "work correctly" rather than adding lots of flags for this. If you want an enable/disable, make it a #define switch perhaps, but even then, we should "just fix it".

I'm not sure about the test case for reproducing this error, so I'll trust @lkrispen here.

thinking about this again it looks like this new option would contradict force==no, where we want to wait until all replicas are in sync, but if we no longer send updates for a pre-cleaned rid this will never happen.

In my opinion we could handle it as another effect of force==yes; with force: yes we want to get rid of all traces of the rid and don't care, so if force==yes, we could set the cleaned rid earlier, e.g.

if (force) set cleaned rid
else set precleaned rid

the rest of the code would not need to be changed to achieve what you want.
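A toy Python sketch of this proposal (the state sets and helper names below are invented for illustration; the real change would live in the C cleanallruv code):

```python
# Toy sketch of the proposal: with force=yes, mark the rid "cleaned" right
# away (skip the wait-for-sync phase); with force=no, mark it "pre-cleaned"
# so its updates are still replicated until all replicas are in sync.

cleaned_rids = set()
pre_cleaned_rids = set()

def start_cleaning(rid, force):
    if force:
        cleaned_rids.add(rid)        # all traces dropped immediately
    else:
        pre_cleaned_rids.add(rid)    # updates still replicated until in sync

def still_sends_updates(rid):
    # In this model only a fully "cleaned" rid stops being replicated.
    return rid not in cleaned_rids

start_cleaning(3, force=True)
start_cleaning(4, force=False)
print(still_sends_updates(3))   # False: force=yes, rid dropped at once
print(still_sends_updates(4))   # True: force=no, still waiting for sync
```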

@mreynolds when looking at the cleanallruv code I have two questions:
- we reject another cleanallruv task if the rid is cleaned, but shouldn't a pre-cleaned rid already reject another task for this rid?
- we do not cleanruv on read-only replicas, but the RUV of a read-only replica can also contain references to the rid; or is it only originating a task that is rejected on read-only replicas?

I think it is ok to skip the pre-cleaned rid when calculating the anchor csn; do we really need an extra config attr?
And the patch looks good, but if I understand your test, this is not really a valid scenario: cleanallruv should only be run for a replica that was removed. Cleaning a live rid should be prevented if possible, but that would require another approach.
Is the incorrect start csn possible if only a removed rid is cleaned?

@lkrispen, I think there is consensus on dropping the config attribute. I need to revisit the patch.

Regarding the test, I did it in a rush and did not write appropriate comments. Master 3 is the master being removed. The test stops it, but the intention is to clean/remove it from the topology.
The scenario raises the possibility that a startCSN can be picked up from a RID being cleaned.
That is valid, but if it leads to replication breakage (missing csn), that is too bad.

If cleanAllRuv is forced, the local RUV/CL have no knowledge of the cleaned rid, so the startCSN will only come from rids that are not being cleaned. IMHO it cannot happen with forced cleanAllRuv.
During cleanAllRuv processing the RUV is updated before the changelog is trimmed, which would also prevent the startCSN from being picked up from a cleaned rid or a rid being cleaned.

now I understand even less the need for this fix.

if force==yes, you say it cannot happen
if force==no, we need to wait until the RUVs are in sync for the cleaned rid, and if you ignore pre-cleaned rids that will never happen.

so there is already an option to achieve what you want: force=yes (optionally plus my suggestion about pre-cleaned rids)

@mreynolds when looking at the cleanallruv code I have two questions:
- we reject another cleanallruv task if the rid is cleaned, but shouldn't a pre-cleaned rid already reject another task for this rid?

It does, and this is noted in the design doc:

http://www.port389.org/docs/389ds/design/cleanallruv-design.html

We check for this in the extended op code (See multimaster_extop_cleanruv() ):

    if (is_cleaned_rid(rid) || is_pre_cleaned_rid(rid) || is_task_aborted(rid)) {
        csn_free(&maxcsn);
        rc = LDAP_SUCCESS;
        goto free_and_return;
    }

If the extended op comes in and we are already cleaning the rid just return success and move on.

  • we do not cleanruv on readonly replicas,

Yes we do, this is in the extended op code again (See multimaster_extop_cleanruv() )

but the RUV of a read-only replica can also contain references to the rid, or is it only originating a task that is rejected on read-only replicas?

I think I answered this but maybe not. Please let me know if I missed something.

I was looking at replica_cleanall_ruv_task() not the extop, and we have:

    if (is_cleaned_rid(rid)) {
        /* we are already cleaning this rid */
        PR_snprintf(returntext, SLAPI_DSE_RETURNTEXT_SIZE, "Replica id (%d) is already being cleaned", rid);
        cleanruv_log(task, rid, CLEANALLRUV_ID, SLAPI_LOG_ERR, "%s", returntext);
        *returncode = LDAP_UNWILLING_TO_PERFORM;
        rc = SLAPI_DSE_CALLBACK_ERROR;
        goto out;
    }

so there could be a new task for the same rid if it is pre-cleaned but not cleaned


I see, yes you are right (simple fix though). I thought you meant something else. Perhaps it is time to improve cleanallruv a bit. I know I want to improve logging to say whether the force option was used, and I also want to add some minor logic improvements when force is used. Like still do one pass on the checks (instead of just skipping them), and if the first check fails, log a message, and just keep moving on to the next phase.
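The "simple fix" would presumably mirror the extended-op check in the task entry point, rejecting a rid that is either cleaned or pre-cleaned. A toy Python sketch (the helper name and state sets are invented for illustration; the real check is in C):

```python
# Toy sketch: replica_cleanall_ruv_task() today only rejects a rid that is
# already "cleaned"; also rejecting "pre-cleaned" rids (as the extended op
# already does) closes the window where a second task could start for the
# same rid.

cleaned_rids = {5}
pre_cleaned_rids = {7}

def task_rejected(rid):
    return rid in cleaned_rids or rid in pre_cleaned_rids

print(task_rejected(5))   # True: already cleaned
print(task_rejected(7))   # True: pre-cleaned is also rejected with the fix
print(task_rejected(9))   # False: a fresh rid may start a task
```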

  This is something I am missing.
  If we are in force=no, then when we evaluate the local RUV we will ignore the updates of the RID being deleted (even if we have those updates in our changelog). But we should be able to send the other updates we have and eventually reach consumerRUVelements >= supplierRUVelements.

yes, but if you ignore the updates of the RID being deleted you contradict the meaning of force==no, which means you WANT the changes for the deleted rid to be received everywhere before cleaning the RUV

@lkrispen do you mean it is missing something like

diff --git a/ldap/servers/plugins/replication/repl5_replica_config.c b/ldap/servers/plugins/replication/repl5_replica_config.c
index f28044c..cbf4c65 100644
--- a/ldap/servers/plugins/replication/repl5_replica_config.c
+++ b/ldap/servers/plugins/replication/repl5_replica_config.c
@@ -1691,7 +1691,7 @@ replica_cleanallruv_thread(void *arg)
              "Waiting to process all the updates from the deleted replica...");
     ruv_obj = replica_get_ruv(data->replica);
     ruv = object_get_data(ruv_obj);
-    while (data->maxcsn && !is_task_aborted(data->rid) && !is_cleaned_rid(data->rid) && !slapi_is_shutting_down()) {
+    while (data->maxcsn && !is_task_aborted(data->rid) && !is_cleaned_rid(data->rid) &&  !is_pre_cleaned_rid(data->rid) && !slapi_is_shutting_down()) {
         if (csn_get_replicaid(data->maxcsn) == 0 ||
            ruv_covers_csn_cleanallruv(ruv, data->maxcsn) ||
             strcasecmp(data->force, "yes") == 0) {

no, I mean we should NOT ignore pre-cleaned-only rids; if we want to enforce this, set the rid to cleaned in force mode. In non-force mode, send updates for pre-cleaned rids.
The problem there is not in sending the updates, but that they get trimmed before being sent.

That won't work, because a few lines up the rid is set to pre-cleaned, so !is_pre_cleaned_rid(data->rid) is always false.

@tbordaz sorry, instead of replying to your comment I had edited it

thinking about this again it looks like this new option would contradict force==no, where we want to wait until all replicas are in sync, but if we no longer send updates for a pre-cleaned rid this will never happen.
In my opinion we could handle it as another effect of force==yes; with force: yes we want to get rid of all traces of the rid and don't care, so if force==yes, we could set the cleaned rid earlier, e.g.
if (force) set cleaned rid
else set precleaned rid
the rest of the code would not need to be changed to achieve what you want.

This should work, and complies with the current design. The main purpose, but not the only one, of the pre_cleaned_rid list is to allow the changelog to flush out updates. If we don't care about that, then we can set the rid as cleaned.


I agree that it should work and it is a valid improvement of force=yes.
But if I understand correctly, it does not change the behavior with force=no, which is the subject of this ticket.

yes, but I think for the current ticket with force==no, you need to find another solution; we have to find out why and prevent the trimming of the changelog in that scenario.

@lkrispen, okay I think we are on the same page. We can improve force=yes with your proposal.
The current patch for force=no, needs deeper thinking and rework.
That is fine with me. Thanks

@lkrispen after deeper thought I agree with your point (force=no).
So just for the record, my understanding is:
With force=no, we would like to make sure all updates known by the server running cleanAllRuv are replicated. With the current patch it replicates everything but can skip some updates received from the RID being cleaned. Those updates are important, otherwise the user would not have used force=no, so we should not take the risk of skipping them.
The risk is that the startCSN is missing. It is normal for replication to break in such a situation.
The question is whether we should have special handling when the startCSN comes from a RID we want to clean.

This ticket looks like a side effect of https://pagure.io/389-ds-base/issue/49413, where changelog trimming can delete some updates that may be needed later.

If we abandon the idea of a specific action when replication breaks on a RID being cleaned, then this ticket could be closed as a duplicate of https://pagure.io/389-ds-base/issue/49413

Metadata Update from @tbordaz:
- Custom field origin adjusted to IPA (was: None)
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1507194

2 years ago

I think we should use this ticket to harden the force == yes case and set the cleaned rid earlier.

0001-Ticket-49446-cleanallruv-could-break-replication-if-.patch

this is an implementation for the simple "hardening"

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to review (was: None)

2 years ago

That looks good to me.
We may change the subject of the ticket to reflect the final fix. This is an improvement of the force mode to skip the sync phase.

ACK

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: review)

2 years ago

To ssh://pagure.io/389-ds-base.git
fa71a0a..339caec master -> master

To ssh://pagure.io/389-ds-base.git
11a2517..f20aa78 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

@tbordaz as discussed, PFA the patch for the test case. I have removed the nsds5ReplicaIgnorePreClean attribute and used forced CleanRUV in the code. But the expected values are not coming back and I am getting assertion errors.
Please check it and share what should be changed here.

0001-Ticket-49446-Add-CI-test-case.patch

You can remove

+CHANGELOG = 'cn=changelog5,cn=config'

As you don't use it.

And consider spreading the "replace_many" over a few lines for clarity.

I don't like the time.sleep in the test, can you consider using replicationmanager's wait for repl function instead?

0001-Ticket-49446-Add-CI-test-case.patch

Thanks @firstyear, I modified the code per the rest of the comments except time.sleep. The sleeps are special arrangements done by @tbordaz to trigger the changelog trimming and interval. I will wait for his review, as the test case is not working currently. Thanks for your inputs again.

Made a sloppy mistake, here is the new one

0001-Ticket-49446-Add-CI-test-case.patch

@amsharma, I am not a python expert, but I think the assertion failure is caused by comparing two different user objects rather than comparing their DNs. I made the following changes and it works for me

expected_m1_users = (user31, user11, user21, user32, user33, user12)
expected_m2_users = (user31, user11, user21, user12)
#
# remove: assert expected_m1_users in users_m1.list()
# replaced by
#
for expected_user in expected_m1_users:
    found = False
    for present_user in users_m1.list():
        if expected_user.dn == present_user.dn:
            found = True
            break
    assert found

Do you think it could be the reason of TC failure ?

One way you can improve it is to use list comprehensions:

expected_m1_users = [user31.dn, user11.dn, user21.dn, user32.dn, user33.dn, user12.dn]
expected_m2_users = [user31.dn, user11.dn, user21.dn, user12.dn]
current_m1_users = [user.dn for user in users_m1.list()]
current_m2_users = [user.dn for user in users_m2.list()]

assert set(expected_m1_users).issubset(current_m1_users)
assert set(expected_m2_users).issubset(current_m2_users)
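For reference, here is a self-contained example (with placeholder DN strings invented for this sketch) of how the `issubset` assertion behaves, independent of ordering and of extra entries on the server:

```python
# Placeholder DN strings; set.issubset() ignores ordering and tolerates
# extra entries in the current list, which is what the test needs.
expected = ["uid=user31,ou=people,dc=example,dc=com",
            "uid=user11,ou=people,dc=example,dc=com"]
current = ["uid=user11,ou=people,dc=example,dc=com",
           "uid=user99,ou=people,dc=example,dc=com",
           "uid=user31,ou=people,dc=example,dc=com"]

print(set(expected).issubset(current))   # True: all expected DNs present

missing = expected + ["uid=user12,ou=people,dc=example,dc=com"]
print(set(missing).issubset(current))    # False: user12 is not on the server
```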

0001-Ticket-49446-Add-CI-test-case.patch

Thanks @tbordaz and @spichugi for looking. PFA for new patch and it is passing now. Please review and merge if looks good.

commit 66ecdf9
Author: Amita Sharma amsharma@redhat.com
Date: Tue Jan 30 19:47:36 2018 +0530

Metadata Update from @lkrispen:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago
