#48218 cleanAllRUV - modify the existing "force" option to bypass the "replica online" checks
Closed: Fixed None Opened 4 years ago by mreynolds.

Replication agreements that no longer point to valid replicas are becoming increasingly common. They block the cleanAllRUV task from finishing. Removing these agreements would allow the cleanAllRUV task to continue its work, but this is commonly not done.

The force option should be extended to bypass the "replica online" checks.
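
For reference, here is a minimal sketch of how a cleanAllRUV task entry with the force option could be generated. It assumes the documented task layout under `cn=cleanallruv,cn=tasks,cn=config` with the `replica-base-dn`, `replica-id` and `replica-force-cleaning` attributes. The helper name `build_cleanallruv_task` and the suffix `dc=example,dc=com` are made up for illustration and are not part of 389 Directory Server.

```python
def build_cleanallruv_task(replica_id: int, base_dn: str, force: bool = False) -> str:
    """Return LDIF text for a cleanAllRUV task entry (hypothetical helper).

    Assumption: the task is created by adding an entry under
    cn=cleanallruv,cn=tasks,cn=config, and replica-force-cleaning
    takes the values "yes" or "no".
    """
    name = f"clean {replica_id}"
    lines = [
        f"dn: cn={name},cn=cleanallruv,cn=tasks,cn=config",
        "objectclass: extensibleObject",
        f"cn: {name}",
        f"replica-base-dn: {base_dn}",
        f"replica-id: {replica_id}",
        f"replica-force-cleaning: {'yes' if force else 'no'}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example suffix only; substitute the real replicated suffix.
    print(build_cleanallruv_task(2, "dc=example,dc=com", force=True), end="")
```

The resulting LDIF would normally be fed to `ldapmodify -a` as Directory Manager. With this ticket implemented, `force=True` would also be expected to skip the "replica online" checks.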


Hi, I'd like to ask how likely it is that this ticket will be implemented in 1.3.5.

This is a common issue on the freeipa-users list. If it were implemented, we could combine it with https://fedorahosted.org/freeipa/ticket/5411 to make cleaning of RUVs more or less automatic. It would increase user satisfaction with 389 and FreeIPA and save developer time.

Another reason is that RUV-related errors often take attention from real replication issues.

It should be possible to move it from the 1.3.5 backlog to 1.3.5. Changing state to triage again.

Looks good to me.

I'm curious: if force is "yes" and some replicas are down, is there an easy way to tell which ones were down and not cleaned?

Replying to [comment:5 nhosoi]:

> Looks good to me.
>
> I'm curious: if force is "yes" and some replicas are down, is there an easy way to tell which ones were down and not cleaned?

Yes, messages are logged when a server is down. Here is the error log output when using the force option and a replica is down:

{{{
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Initiating CleanAllRUV Task...
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Retrieving maxcsn...
[20/Jan/2016:10:49:09 -0500] slapi_ldap_bind - Error: could not send bind request for id [uid=replica,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 107 (Transport endpoint is not connected, host "localhost.localdomain:7777")
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - agmt="cn=to replica" (localhost:7777): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Found maxcsn (569fabe6000000020000)
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Cleaning rid (2)...
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Waiting to process all the updates from the deleted replica...
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Waiting for all the replicas to be online...
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Waiting for all the replicas to receive all the deleted replica updates...
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Sending cleanAllRUV task to all the replicas...
[20/Jan/2016:10:49:09 -0500] slapi_ldap_bind - Error: could not send bind request for id [uid=replica,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 107 (Transport endpoint is not connected, host "localhost.localdomain:7777")
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - agmt="cn=to replica" (localhost:7777): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Failed to send task to replica (agmt="cn=to replica" (localhost:7777))
[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Cleaning local ruv's...
[20/Jan/2016:10:49:10 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Waiting for all the replicas to be cleaned...
[20/Jan/2016:10:49:10 -0500] slapi_ldap_bind - Error: could not send bind request for id [uid=replica,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 107 (Transport endpoint is not connected, host "localhost.localdomain:7777")
[20/Jan/2016:10:49:10 -0500] NSMMReplicationPlugin - agmt="cn=to replica" (localhost:7777): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[20/Jan/2016:10:49:10 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Waiting for all the replicas to finish cleaning...
[20/Jan/2016:10:49:10 -0500] slapi_ldap_bind - Error: could not send bind request for id [uid=replica,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 107 (Transport endpoint is not connected, host "localhost.localdomain:7777")
[20/Jan/2016:10:49:10 -0500] NSMMReplicationPlugin - agmt="cn=to replica" (localhost:7777): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[20/Jan/2016:10:49:10 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): Successfully cleaned rid(2).
}}}
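
To pull out which replicas were skipped, one option is to scan the error log for the "Failed to send task to replica" lines shown above. The following is a rough sketch that assumes the message format stays exactly as in this sample output. The function name `replicas_not_cleaned` is hypothetical and not something shipped with the server.

```python
import re

# Matches lines such as:
#   ... CleanAllRUV Task (rid 2): Failed to send task to replica (agmt="cn=to replica" (localhost:7777))
# Assumption: the wording is identical to the sample log in this ticket.
FAILED_RE = re.compile(
    r"CleanAllRUV Task \(rid (?P<rid>\d+)\): "
    r"Failed to send task to replica \((?P<agmt>agmt=.*)\)\s*$"
)


def replicas_not_cleaned(log_lines):
    """Map each rid to the agreements the task could not be sent to."""
    missed = {}
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            agmts = missed.setdefault(int(match.group("rid")), [])
            if match.group("agmt") not in agmts:
                agmts.append(match.group("agmt"))
    return missed


if __name__ == "__main__":
    sample = [
        '[20/Jan/2016:10:49:09 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): '
        'Failed to send task to replica (agmt="cn=to replica" (localhost:7777))',
        '[20/Jan/2016:10:49:10 -0500] NSMMReplicationPlugin - CleanAllRUV Task (rid 2): '
        'Successfully cleaned rid(2).',
    ]
    for rid, agmts in replicas_not_cleaned(sample).items():
        print(f"rid {rid}: not sent to {', '.join(agmts)}")
```

Any agreement listed this way points to a replica that would still need to be cleaned (or its agreement removed) once it is reachable again.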

097239c..ec3f8da master -> master
commit ec3f8da
Author: Mark Reynolds mreynolds@redhat.com
Date: Wed Jan 20 10:53:55 2016 -0500

> Yes, messages are logged when a server is down. Here is the error log output when using the force option and a replica is down:

The messages are very clear. Thanks, Mark!

Thanks for implementing it.

Metadata Update from @nhosoi:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.5.0

2 years ago
