The scenario looks like this:
1) Set up 3-way MMR.
2) Run a series of operations against the 3 masters at the same time to cause conflicts. The operations include adding an entry A and deleting the entry A.
3) After running the operations, the result shows:
Master 0: one conflict A (nsuniqueid=...+A), one tombstone A (nsuniqueid=...,A), one live A (A)
Master 1: two conflict A's, one tombstone A
Master 2: one conflict A, one tombstone A, one live A
The tombstones are all the same entry, sharing the same nsuniqueid. The conflict A's are the result of the conflicting adds.
Each server logs one or more urp_delete conflict messages (Masters 0 and 2 logged 2; Master 1 logged 1): [..] conn=8 op=57 csn=51e08d3c000100020000 - urp_delete: Entry "nsuniqueid=532a7401-eb4811e2-aeb2c876-68451d26,uid=1456,ou=People,dc=example,dc=com" is already a Tombstone. Note: the nsuniqueids that appear in the logs are the same on all 3 masters.
There are some facts. ADD:
DELETE:
Ludwig wrote: As a general rule I would say the content of a backend has to be identical on all replicas after conflict resolution, and if not, I would consider it a bug.
Can we improve conflict resolution to correctly handle all cases? From your description of the problem we can't see the exact order in which the deletes were received (before or after the conflicting add), so I think we need to investigate some specific, controlled scenarios, e.g.:
1] add the same entry concurrently on two masters
2] delete the entry concurrently
  2.1 before the adds are replicated
  2.2 after one add is replicated to one of the other masters
  2.3 after the adds are replicated to all masters
Since replicated deletes are applied to the entry with the matching nsuniqueid, it would be interesting to see what exactly happens if on one master
Just to be clear, we do not replicate conflicting entries; we replicate the ADD, and on each replica a conflicting entry will be created. But there are probably some scenarios where this is not triggered.
t0: add A to master1
t1: add A to master2
t2: del A on master2
t3: repl ADD A from master2 to master1; a conflicting entry will be created
t4: repl ADD A from master1 to master2; the entry no longer exists on M2, so no conflicting entry is created
t5: repl DEL A from M2 to M1; the entry with the same nsuniqueid as the one deleted on M2 is tombstoned on M1
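The timeline above can be replayed with a small model of the two behaviours involved: a replicated ADD whose DN is already live becomes a conflict entry, and a replicated DELETE targets the entry by nsuniqueid. This is an illustrative sketch, not the actual 389-ds-base urp code; the `uid-1`/`uid-2` identifiers and the simplified conflict-DN shape are stand-ins for real nsuniqueids.

```python
class Replica:
    """Toy replica: maps nsuniqueid -> {"dn": ..., "tombstone": bool}."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

    def dn_in_use(self, dn):
        # Only live (non-tombstone) entries occupy a DN.
        return any(e["dn"] == dn and not e["tombstone"]
                   for e in self.entries.values())

    def add(self, uid, dn):
        # A replicated ADD whose DN is already taken becomes a conflict
        # entry named nsuniqueid=<id>+<rdn> (simplified naming).
        if self.dn_in_use(dn):
            dn = "nsuniqueid=%s+%s" % (uid, dn)
        self.entries[uid] = {"dn": dn, "tombstone": False}

    def delete(self, uid):
        # A replicated DELETE targets the entry by nsuniqueid and
        # turns it into a tombstone, regardless of its current DN.
        if uid in self.entries:
            self.entries[uid]["tombstone"] = True

m1, m2 = Replica("M1"), Replica("M2")
m1.add("uid-1", "cn=A")   # t0: add A to master1
m2.add("uid-2", "cn=A")   # t1: add A to master2
m2.delete("uid-2")        # t2: del A on master2
m1.add("uid-2", "cn=A")   # t3: repl ADD to M1 -> conflict entry created
m2.add("uid-1", "cn=A")   # t4: repl ADD to M2 -> no live A, so no conflict
m1.delete("uid-2")        # t5: repl DEL to M1 -> the conflict is tombstoned

# M1 ends with a live A plus a tombstoned *conflict* entry; M2 ends with
# a live A plus a plainly named tombstone: the DNs of the tombstones differ.
assert m1.entries["uid-2"]["dn"] != m2.entries["uid-2"]["dn"]
```

The final assertion is the whole point: both replicas hold the same nsuniqueids, but the tombstoned entry carries a conflict DN on one side and the original DN on the other.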
Noriko ran more of the test cases Ludwig suggested:

Scenario 1)
t0: shut down master 2
t1: add A to master 1 (call it A-1)
t2: shut down master 1
t3: start master 2
t4: add A to master 2 (call it A-2)
t5: start master 1 (A-1 is live; A-2 becomes a conflict)
t6: stop master 1
t7: delete A on master 2 (A-1 is tombstoned)
t8: start master 1 (the delete is replicated to master 1)
Result:
Master 1: tombstoned A-1 and conflict A-2
Master 2: tombstoned A-1 and conflict A-2
Scenario 2)
t0: shut down master 2
t1: add A to master 1 (call it A-1)
t2: shut down master 1
t3: start master 2
t4: add A to master 2 (call it A-2)
t5: delete A on master 2 (A-2 is tombstoned)
t6: start master 1 (A-2 becomes a conflict, which is already tombstoned)
Result:
Master 1: live A-1 and a tombstoned conflict A-2 (<uniqueid>,<uniqueid>+A-2)
Master 2: live A-1 and tombstoned A-2 (<uniqueid>,A-2)
Solving all the conflict mismatches could be really tough... Instead, I thought of 1) logging conflicts in the error log, and 2) comparing the local conflicts with the remote ones and, if there are any mismatches, logging those as well.
But Nathan pointed out an issue there: the replica bind user may not have the right to search conflicts on the counterpart server, so this may not be realistic to implement.
Another point he made is that, regardless of mismatches, conflicts need to be resolved by the DS (human) administrator. It would be ideal to provide a GUI showing 1) the list of conflicts and 2) the live entry and the conflict entry/entries side by side, to let the administrator pick an entry or attributes.
I have not yet reproduced the reported scenario, but have an initial test that produces inconsistencies. The test is as follows:
on M1 delete child1
on M2 add child1
on M3 add child1
enable replication
the result is:
==================== M1 ==================== dn: cn=child1+nsuniqueid=612e2c01-717b11e5-994a9c21-3dbd4905,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 612e2c01-717b11e5-994a9c21-3dbd4905 objectclass: top objectclass: person
==================== M2 ==================== dn: cn=child1,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 612e2c01-717b11e5-994a9c21-3dbd4905 objectclass: top objectclass: person
==================== M3 ==================== dn: cn=child1+nsuniqueid=612e2c01-717b11e5-994a9c21-3dbd4905,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 612e2c01-717b11e5-994a9c21-3dbd4905 objectclass: top objectclass: person
So on M1 and M3 the conflict entry survives, while on M2 the "original" entry is present.
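The asymmetry above can be illustrated with a deliberately naive first-writer-wins model: each replica applies the replicated ADDs in the order it receives them, and whichever arrives first keeps the plain DN. The real resolution code is supposed to compare CSNs so that all replicas pick the same winner; this toy ignores CSNs precisely to show how order dependence alone produces different survivors. All names here are made up for illustration.

```python
def apply_adds(stream):
    """Apply a stream of replicated ADDs (uid, dn) on one replica.

    First arrival keeps the plain DN; later arrivals for the same DN
    become conflict entries named nsuniqueid=<uid>+<dn> (simplified).
    """
    owner = {}    # dn -> uid of the entry holding the plain DN
    entries = {}  # uid -> resulting DN on this replica
    for uid, dn in stream:
        if dn in owner:
            entries[uid] = "nsuniqueid=%s+%s" % (uid, dn)
        else:
            owner[dn] = uid
            entries[uid] = dn
    return entries

# One replica receives M2's add first, another receives M3's add first:
a = apply_adds([("u-m2", "cn=child1"), ("u-m3", "cn=child1")])
b = apply_adds([("u-m3", "cn=child1"), ("u-m2", "cn=child1")])

# Different apply orders keep the plain DN for different entries --
# the kind of asymmetry shown in the listings above.
assert a != b
```

With CSN-based resolution the outcome should be order-independent, which is why the divergent listings above are a bug rather than expected behaviour.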
Finally reproduced the reported scenario. It is the same sequence of operations as in the previous example, except that replication from M2 is started before replication on M1 and M3:
==================== M1 ==================== dn: nsuniqueid=699c6a01-724011e5-810fd684-7d474a20,cn=child1,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 699c6a01-724011e5-810fd684-7d474a20 objectclass: top objectclass: person objectclass: nsTombstone
dn: cn=child1,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 4a054201-724011e5-887ab70d-94ee8082 objectclass: top objectclass: person
dn: cn=child1+nsuniqueid=4d98c901-724011e5-8b56b53b-b649edeb,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 4d98c901-724011e5-8b56b53b-b649edeb objectclass: top objectclass: person
==================== M2 ==================== dn: cn=child1+nsuniqueid=4a054201-724011e5-887ab70d-94ee8082,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 4a054201-724011e5-887ab70d-94ee8082 objectclass: top objectclass: person
dn: nsuniqueid=699c6a01-724011e5-810fd684-7d474a20,cn=child1,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 699c6a01-724011e5-810fd684-7d474a20 objectclass: top objectclass: person objectclass: nsTombstone
dn: cn=child1,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 4d98c901-724011e5-8b56b53b-b649edeb objectclass: top objectclass: person
==================== M3 ==================== dn: cn=child1+nsuniqueid=4d98c901-724011e5-8b56b53b-b649edeb,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 4d98c901-724011e5-8b56b53b-b649edeb objectclass: top objectclass: person
dn: cn=child1+nsuniqueid=4a054201-724011e5-887ab70d-94ee8082,cn=new_account1,cn=staged user,dc=example,dc=com nsuniqueid: 4a054201-724011e5-887ab70d-94ee8082 objectclass: top objectclass: person
Per triage, push the target milestone to 1.3.6.
Metadata Update from @lkrispen: - Issue assigned to lkrispen - Issue set to the milestone: 1.3.6.0
Metadata Update from @mreynolds: - Custom field component reset (from Replication - General) - Issue close_status updated to: None - Issue set to the milestone: 1.3.7.0 (was: 1.3.6.0)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to None - Issue set to the milestone: 1.4.2 (was: 1.3.7.0)
Metadata Update from @vashirov: - Issue set to the milestone: 1.4.3 (was: 1.4.2)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/769
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix - Issue status updated to: Closed (was: Open)