One of the problems with CLEANALLRUV is that if a replica is down and has pending changes to send to other replicas, once that replica comes online, it will start sending updates to the other replicas, which includes the RUV containing any replicas deleted/cleaned while it was offline. Replicas that were online and were cleaned properly need a way to ignore or otherwise gracefully deal with changes from other replicas that have the deleted replica in the CSN and the RUV.
One of the suggestions was to change the replication protocol metadata in such a way that this information could be stored and replicated, but if the data were replicated to an older server that did not understand the metadata changes, it would not break replication.
Ideally we could somehow mark RUV elements as having been deleted - delete the CSNs from the RUV, or set the CSNs to all zeros, or use some "magic" CSN value that tells newer replicas that the RUV element has been deleted, but that doesn't break older replicas.
In order to allow us to extend the replication protocol RUV element in the future, we need to add a version field to the RUV element. Replicas will then "advertise" that they support RUV version N, and suppliers of those replicas will craft the RUV so that it conforms to what the consumer expects in terms of the RUV.
Another proposal...(without doing a complete rewrite)
There were actually two versions of CLEANALLRUV. The first version was two steps, and the second version did it all at once. When dealing with the possibility of a replica being down during the task, the first version(two steps) is the best option.
The one advantage to the first version is that you need to manually release the RID, which would "semi" resolve this issue. Until you released the rid, it would reject updates(csns) that contained the rid. This ensures the cleaned replica would not get polluted with the dirty rid again, but we also silently rejected the updates. Once we are sure every replica has been clean, then we would issue the "releaseRID" task. However, if a replica was down, and then comes back online, it doesn't know that it missed this cleaning. So you would have run CLEANALLRUV on that replica again.
I think we can improve the current process. Instead of issuing "CLEANALLRUV" extended ops to each agreement, we replicate it with some kind of "special" update. This way when the replica comes online it will eventually receive the "cleanallruv" update, and clean itself. So, no replica left behind.
There are some issues with the current model. Once we set the rid to be cleaned, we stopped applying updates that came from that replica(dirty rid), and we ignore these updates in the changelog as well. This means possibly lost changes!
We should still apply and replicate these updates, but not update the RUV with the dirty rid. This is where it gets tricky, and I'm not sure what the best approach might be. Maybe change the dirty rid in the csn to that replica's rid? But I'm not sure what side effects this might have.
This the current plan of attack...
monitoring thread - Loop over repl agmts that have NOT been notified, and send the extended op again. - Once all replicas have been notified, then move on to check their tombstones(RUV's) - Once all the remote RUV's have been cleaned, then release the RID. - Releasing the rid will allow that rid to be reused, and the cn=replica entry & repl agmts will have the special attributes removed.
On the remote replicas, that receive the extended op, this process is repeated.
The only things different I'm doing is using an actual slapi task, writing a new attribute to cn=replica & repl agmts, and modifying the existing monitoring thread to continue to attempt the extended op on those replicas that have not been notified. Since we are marking the repl config and repl agmts, we can handle servers restarts without interruption of the cleaning process.
Using extended ops still seems to be the best approach. Older servers are safe(they will just reject the extended op), and we don't have to change the replication protocol.
So basically the task will never end until all the the replicas have been notified. So if there are "older" replicas the process will continue forever until that agreement is removed or that server is upgraded. This makes monitoring the task more difficult, as it could run for months until a replica is upgraded.
So the main issue is that the task can run for a long time - until all the servers are up, notified, upgraded, and cleaned. I could "end" the task once we have done the initial round of notifications, but that doesn't mean all the replicas are cleaned yet. Now, I could simply log a message saying that some replicas have not been cleaned because they do not support the CLEANALLRUV task at that time, but they will be cleaned once they do support it. I'm open to suggestions here on how to best do the task logging/monitoring.
A good thing about this approach is that once a replica is upgraded, it will automatically be cleaned.
See "cleanruv-proceedure" attachment for design details. Sending fix out for review...
1.i < 64 at the line 435: why 64? Could it be CLEANRIDSIZ? {{{ … … agmt_new_from_entry(Slapi_Entry e) 425 428 ra->last_init_start_time = 0UL; 426 429 ra->last_init_status[0] = '\0'; 427 430 431 / cleanruv notification */ 432 clean_vals = slapi_entry_attr_get_charray(e, type_nsds5ReplicaCleanRUVnotified); 433 if(clean_vals){ 434 int i; 435 for (i = 0; i < 64 && clean_vals[i]; i++){ 436 ra->cleanruv_notified[i] = atoi(clean_vals[i]); 437 } 438 ra->cleanruv_notified[i + 1] = 0; 439 slapi_ch_array_free(clean_vals); 440 } else { 441 ra->cleanruv_notified[0] = 0; 442 } 443 }}}
2.I believe cleanruv_notified and abort_notified never run over the specified size since you reject the requests if too many of them come in, but it'll be nice if it's checked that less than (CLEANRIDSIZ + 1) at all the places where cleanruv_notified or abort_notified arrays are assigned (as above in agmt_new_from_entry), which would prevent the unexpected crash... {{{ a b typedef struct repl5agmt { 141 141 char attrs_to_strip; / for fractional replication, if a "mod" is empty, strip out these attributes: 142 142 * modifiersname, modifytimestamp, internalModifiersname, internalModifyTimestamp, etc / 143 143 int agreement_type; 144 int cleanruv_notified[CLEANRIDSIZ + 1]; / specifies if the replica has been notified of a CLEANALLRUV task / 145 int abort_notified[CLEANRIDSIZ + 1]; / specifies if the replica has been notified of a CLEANALLRUV ABORT task / 144 146 } repl5agmt; }}}
I noticed "atoi" is used at many places. As you see in "man atoi", atoi does not detect errors, but it returns 0 if some garbage value is passed. Do we want to check it? 2736 ra->cleanruv_notified[i] = atoi(attr_vals[i]);
The atoi() function converts the initial portion of the string pointed to by nptr to int. The behavior is the same as strtol(nptr, (char **) NULL, 10); except that atoi() does not detect errors.
I might be missing something, but these 3 functions look sharing the same algorithm (the mod_type and mod_op are different). Could they be merged into one function by controlling the behaviour with arg/flag? (I don't see agmt_set_abort_released. It's not needed?) {{{ 2619 int 2620 agmt_set_cleanruv_notified(Repl_Agmt ra, ReplicaId rid){ mod.mod_op = LDAP_MOD_ADD|LDAP_MOD_BVALUES; 2636 mod.mod_type = (char )type_nsds5ReplicaCleanRUVnotified; 2658 int 2659 agmt_set_abort_notified(Repl_Agmt ra, ReplicaId rid){ mod.mod_op = LDAP_MOD_ADD|LDAP_MOD_BVALUES; 2675 mod.mod_type = (char )type_nsds5ReplicaCleanRUVnotified; 2693 int 2694 agmt_set_cleanruv_released(Repl_Agmt ra, ReplicaId rid){ mod.mod_op = LDAP_MOD_DELETE|LDAP_MOD_BVALUES; 2709 mod.mod_type = (char )type_nsds5ReplicaCleanRUVnotified; }}}
No more "goto done". ;) {{{ … … write_changelog_and_ruv (Slapi_PBlock pb) 1083 1074 op_params->target_address.uniqueid = slapi_ch_strdup (uniqueid); 1084 1075 } 1085 1076 1077 if( is_cleaned_rid(csn_get_replicaid(op_params->csn))){ 1078 / this RID has been cleaned, just goto done */ <== 1079 object_release (repl_obj); 1080 return 0; 1081 } }}}
It's good cleanall_ruv task rejects more tasks than max number. Could there be any way how many tasks are already running? {{{ 1303 static int 1304 replica_execute_cleanall_ruv_task (Object r, ReplicaId rid, Slapi_Task task, char returntext) 1318 if(get_cleanruv_task_count() >= CLEANRIDSIZ){ 1319 / we are already running the maximum number of tasks */ 1320 cleanruv_log(pre_task, CLEANALLRUV_ID, 1321 "Exceeded maximum number of active CLEANALLRUV tasks(%d)",CLEANRIDSIZ); 1322 returntext = PR_smprintf("Exceeded maximum number of active CLEANALLRUV tasks(%d), " 1323 "you must wait for one to finish.", CLEANRIDSIZ); 1324 return LDAP_UNWILLING_TO_PERFORM; 1325 } }}}
Reading the doc "http://directory.fedoraproject.org/wiki/Howto:CLEANRUV", you put the steps for CLEANALLRUV/CLEANRUV. Would it be nice to provide a script to follow the steps?
Noriko,
[1] nice catch, fixed [2] this is already checked [3] If there is a garbage value, zero would be the correct value to put into the array. So we are safe here. [4] done [5] done [6] What do you want the script to do? Set the read only mode, delete agmts, etc? If so, I don't think that should be scripted. It's up to the admin when they can do each step, I don't think it should be forced. Now if you just want a script that fires off the task then that's fine.
Mark
in agmt_set_cleanruv_notified: {{{ data = PR_smprintf("%d", (int)rid); }}} you could just use a static buffer and avoid the malloc/free. same in agmt_set_abort_notified and agmt_set_cleanruv_released. If you use snprintf, you could use the return value of snprintf as the number of characters written to the static buffer and avoid the strlen call too. Similarly in other places in the code where smprintf is used when the input data is of fixed size (e.g. a csn string).
also in those functions, the calls to the internal modify - should at least log an error message if the result was not 0
To expand upon Noriko's comment - this use of atoi in replica_check_for_tasks is problematic: {{{ rid = atoi(ldap_utf8strtok_r(clean_vals[i], ":", &iter)); }}} If someone were to pass in bogus data in type_replicaCleanRUV, ldap_utf8strtok_r() could return NULL, and passing a NULL to atoi() will cause a crash. There should be some sort of input validation on the type_replicaCleanRUV data to ensure no crashes, and better, to inform the user that the format of the data is incorrect.
Does replica_add_cleanruv_data need to check to make sure the new value index does not exceed CLEANRIDSIZ?
What is replica_remove_cleanruv_data supposed to do? It looks like it will free more than just the given val. Maybe the slapi_ch_free_string should come before the for loop?
Replying to [comment:9 rmeggins]:
Done.
Probably, done.
The function is just to remove the in-memory cleanallruv data from that replica entry. Yes, the free was in the wrong spot.
The new patch is attached.
attachment 0001-Ticket-403-CLEANALLRUV-feature.2.patch
git merge ticket403 Updating e860135..657efc6 Fast-forward Makefile.am | 1 + Makefile.in | 1 + ldap/admin/src/scripts/template-cleanallruv.pl.in | 186 ++ ldap/schema/01core389.ldif | 17 +- ldap/servers/plugins/replication/cl5_api.c | 71 +- ldap/servers/plugins/replication/cl5_api.h | 9 +- ldap/servers/plugins/replication/repl5.h | 57 +- ldap/servers/plugins/replication/repl5_agmt.c | 110 + ldap/servers/plugins/replication/repl5_agmtlist.c | 4 + .../servers/plugins/replication/repl5_connection.c | 13 +- .../plugins/replication/repl5_inc_protocol.c | 21 +- ldap/servers/plugins/replication/repl5_init.c | 46 +- ldap/servers/plugins/replication/repl5_plugins.c | 18 +- ldap/servers/plugins/replication/repl5_replica.c | 477 ++++- .../plugins/replication/repl5_replica_config.c | 2252 +++++++++++++++----- ldap/servers/plugins/replication/repl5_ruv.c | 58 +- ldap/servers/plugins/replication/repl5_ruv.h | 1 + ldap/servers/plugins/replication/repl_extop.c | 446 ++-- ldap/servers/plugins/replication/repl_globals.c | 5 +- ldap/servers/slapd/log.c | 27 + ldap/servers/slapd/slapi-plugin.h | 3 + ldap/servers/slapd/task.c | 56 + 22 files changed, 2839 insertions(+), 1040 deletions(-) create mode 100644 ldap/admin/src/scripts/template-cleanallruv.pl.in
git push origin master Counting objects: 64, done. Delta compression using up to 4 threads. Compressing objects: 100% (32/32), done. Writing objects: 100% (33/33), 24.90 KiB, done. Total 33 (delta 28), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git e860135..657efc6 master -> master
coverity fixes 0001-Ticket-403-cleanallruv-coverity-fixes.patch
[mareynol@localhost servers]$ git merge coverity Updating 9229db2..75e5d3d Fast-forward ldap/servers/plugins/replication/repl5_agmt.c | 9 ++- ldap/servers/plugins/replication/repl5_replica.c | 33 +++++++++--- .../plugins/replication/repl5_replica_config.c | 53 +++++++++---------- ldap/servers/plugins/replication/repl_extop.c | 4 +- 4 files changed, 58 insertions(+), 41 deletions(-)
[mareynol@localhost servers]$ git push origin master Counting objects: 19, done. Delta compression using up to 4 threads. Compressing objects: 100% (10/10), done. Writing objects: 100% (10/10), 1.67 KiB, done. Total 10 (delta 8), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 9229db2..75e5d3d master -> master
two new issues were found by the IPA team:
[1] If you run the task on a replica that doesn't have any agmts its doesn't finish correctly. [2] You can get stuck in a loop, if the replica was already cleaned
Patch is being attached...
address two new bugs 0001-Ticket-403-CLEANALLRUV-amendment.patch
Attachment 0001-Ticket-403-CLEANALLRUV-amendment.patch added address two new bugs ack.
[mareynol@localhost ldap]$ git merge ticket403 Updating 3e02250..df3a0a2 Fast-forward .../plugins/replication/repl5_replica_config.c | 8 +++++--- ldap/servers/plugins/replication/repl_extop.c | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-)
[mareynol@localhost ldap]$ git push origin master Counting objects: 15, done. Delta compression using up to 4 threads. Compressing objects: 100% (8/8), done. Writing objects: 100% (8/8), 1.07 KiB, done. Total 8 (delta 6), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 3e02250..df3a0a2 master -> master
Mark, go ahead and cherry-pick to 389-ds-base-1.2.11 and push - best to keep the branches in sync as much as possible. We'll do the 1.2.11.10 release soon.
Rich, sounds good but I have one more change I need to commit. Patch coming soon...
needed to refine the finishing sequence using an abort flag 0001-Ticket-403-CLEANALLRUV-revision-to-last-amendment.patch
Replying to [comment:19 mreynolds]:
ack
git merge ticket403 Updating df3a0a2..82d2c51 Fast-forward .../plugins/replication/repl5_replica_config.c | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-)
[mareynol@localhost ldap]$ git push origin master Counting objects: 13, done. Delta compression using up to 4 threads. Compressing objects: 100% (7/7), done. Writing objects: 100% (7/7), 834 bytes, done. Total 7 (delta 5), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git df3a0a2..82d2c51 master -> master
revised cleanruv-proceedure
attachment 0001-Ticket-403-fix-CLEANALLRUV-regression-from-last-comm.patch
Attachment 0001-Ticket-403-fix-CLEANALLRUV-regression-from-last-comm.patch added
git merge ticket403 Updating 6237131..819910d Fast-forward .../plugins/replication/repl5_replica_config.c | 31 ++++++++++++++----- ldap/servers/plugins/replication/repl5_ruv.c | 29 ++++++++++++++++++ ldap/servers/plugins/replication/repl5_ruv.h | 1 + ldap/servers/plugins/replication/repl_extop.c | 4 +-- 4 files changed, 54 insertions(+), 11 deletions(-)
[mareynol@localhost ds]$ git push origin master Counting objects: 19, done. Delta compression using up to 4 threads. Compressing objects: 100% (10/10), done. Writing objects: 100% (10/10), 1.75 KiB, done. Total 10 (delta 8), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 6237131..819910d master -> master
some very minor coverity fixes 0001-CLEANALLRUV-coverity-fixes.patch
ack.
[mareynol@localhost servers]$ git merge coverity Updating 34eea5d..37e0121 Fast-forward ldap/servers/plugins/replication/repl5_agmt.c | 4 ++-- ldap/servers/plugins/replication/repl5_replica.c | 7 ++++--- 2 files changed, 6 insertions(+), 5 deletions(-)
[mareynol@localhost servers]$ git push origin master Counting objects: 15, done. Delta compression using up to 4 threads. Compressing objects: 100% (8/8), done. Writing objects: 100% (8/8), 915 bytes, done. Total 8 (delta 6), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 34eea5d..37e0121 master -> master
Added initial screened field value.
10/17 - some maxcsn compare fixes, and added a new option (replica-force-cleaning) 0001-Ticket-403-CLEANALLRUV-minor-fixes-and-add-support-f.patch
Metadata Update from @mreynolds: - Issue assigned to mreynolds - Issue set to the milestone: 1.2.11.8
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/403
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.