Set up MMR between m1 and m2.
Add an entry on m1 with a single-valued attribute (svattr) and let it replicate to m2.

1) Pause replication on m1, then on m1:

   ldapmodify
   dn: entry
   changetype: modify
   delete: svattr
   svattr: origvalue

   sleep 1, then on m2:

   ldapmodify
   dn: entry
   changetype: modify
   delete: svattr
   svattr: origvalue
   -
   add: svattr
   svattr: newvalue

   Unpause replication m1 to m2, sleep 5; unpause replication m2 to m1, sleep 5.
   The m2 mod should "win", but it does not: the two servers end up with different results.

2) Same as 1), but do the m2 mod first, then the m1 mod, and unpause m2 to m1 first, then m1 to m2.
3) Delete svattr on m1 and let the delete replicate to m2. Pause replication on m1, then on m1:

   ldapmodify
   dn: entry
   changetype: modify
   replace: svattr
   svattr: origvalue
   -
   add: svattr
   svattr: newvalue
   -
   delete: svattr
   svattr: origvalue

   followed on m1 by a modrdn with newrdn "svattr=newvalue", deleteoldrdn: 1.
   On m2: a modrdn with newrdn "svattr=origvalue".
   Unpause m1 to m2, sleep 5; unpause m2 to m1, sleep 5.
   The entries will not be in sync.

4) Same as 3), but do the ops on m2 first, then on m1, then unpause m2 to m1 first, then m1 to m2.
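As a toy model (not the server code) of the convergence property the scenarios above expect: every value add and delete carries a CSN, and replaying the same set of mods in any order must yield the same final value of the single-valued attribute. CSNs are simplified here to plain integers; real 389-ds CSNs are (time, seqnum, replica-id, subseq) tuples.

```python
def resolve(ops):
    """ops: iterable of (csn, op, value), op in {'add', 'delete'}.
    Returns the surviving value of a single-valued attribute, or None."""
    value, value_csn = None, -1
    delete_csn = -1
    for csn, op, val in sorted(ops):          # resolution orders by CSN
        if op == 'add' and csn > value_csn:
            value, value_csn = val, csn
        elif op == 'delete':
            delete_csn = max(delete_csn, csn)
    # the value survives only if it was added after the latest delete
    return value if value_csn > delete_csn else None

# scenario 1): m1 deletes origvalue, m2 deletes it and adds newvalue
# (the add gets a later CSN than the delete within the same modify)
m1_ops = [(1, 'delete', 'origvalue')]
m2_ops = [(2, 'delete', 'origvalue'), (3, 'add', 'newvalue')]

# whichever direction replays first, the result must be the same
assert resolve(m1_ops + m2_ops) == resolve(m2_ops + m1_ops) == 'newvalue'
```

This is what "the m2 mod should win" means above: the add with the highest CSN survives, regardless of replay order.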
There is another scenario using M1, M2, M3. M1, M2, M3 are in sync, using employeenumber as the single-valued attribute.

Stop M2 and M3, start M1. On M1:

   delete: employeenumber
   employeenumber: oldnumber
   -
   add: employeenumber
   employeenumber: oldnumber+1

Stop M1, start M2. On M2: the same modify, but adding oldnumber+2.
Stop M2, start M3. On M3: the same modify, but adding oldnumber+3.
Start M1, start M2.

Result: M1 and M2 have oldnumber+3, M3 has oldnumber+2.

That was from a deployment where I found the issue; two masters will probably show the problem as well.
I wrote a doc for update resolution for single-valued attributes: http://port389.org/wiki/Update_resolution_for_single_valued_attributes
and implemented a test suite using lib389. There are 707 test cases based on the tables in chapter 6, and about 300 are failing. For an example based on comment 6, where there are three masters but updates were done only on two masters, the result after replication convergence is:
   M1: nscpentrywsi: employeeNumber;vucsn-536b5a98000000c80001: 21000
   M2: nscpentrywsi: employeeNumber;vucsn-536b5a20000000640001: 11000
   M3: nscpentrywsi: employeeNumber;adcsn-536b5a20000000640000;vucsn-536b5a98000000c80001: 21000
So not only do the values differ; even where the values are identical, the replication metadata differs.
I will concentrate on fixing this test case next.
The reason for the failure in the example in comment 8 is that state resolution handles every value found in the deleted values as a pending value and makes it the present value. Checking whether these values have an update CSN fixes these cases, but a value could probably just be removed when it is deleted, instead of being moved to the deleted values.
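A sketch of that check, with invented names (this is not the server's valueset code): a value found on the deleted-values list should only be promoted back to a present value if it carries an update CSN newer than the CSN of the delete.

```python
def promote_deleted_values(deleted, delete_csn):
    """deleted: list of (value, update_csn or None); returns the values
    that should move back to the present values."""
    present = []
    for value, update_csn in deleted:
        # before the fix, every deleted value was treated as pending and
        # made present; the check below restricts this to values that
        # were actually updated after the delete
        if update_csn is not None and update_csn > delete_csn:
            present.append(value)
    return present

# a value re-added after the delete survives, a stale one does not
assert promote_deleted_values([('21000', 7), ('11000', None)], 5) == ['21000']
assert promote_deleted_values([('11000', 3)], 5) == []
```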
The next type of failure is: on M1 delete the single-valued attribute (by an empty replace); on M2 make it distinguished. As a result the value is part of the RDN, but the attribute is gone.
Need to verify that this is not an effect of the previous fix.
Fixed the scenario in comment 9.
There were two problems; the CSN of the modrdn is greater than the CSN of the delete:
- If the delete was received after the modrdn, there are situations where the value only has an MDCSN but no VDCSN. Only the VDCSN was used in the comparison, so it was not detected that the modrdn was after the delete.
- If the modrdn was received after the delete, the attribute state was deleted. It was correctly detected that the modrdn is more recent, but the attribute was not moved to the present attributes, so the value did not show up in the entry.
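A toy illustration of the first fix (invented helper names, not the real code): since a distinguished value may carry only an MDCSN set by the modrdn and no VDCSN, comparing only the VDCSN against the delete CSN misses the case where the modrdn is newer than the delete; comparing against the most recent of the two CSNs detects it.

```python
def most_recent_csn(vdcsn, mdcsn):
    """Most recent CSN attached to a value; either argument may be None."""
    candidates = [c for c in (vdcsn, mdcsn) if c is not None]
    return max(candidates) if candidates else None

def modrdn_after_delete(vdcsn, mdcsn, delete_csn):
    """True if the value's latest state change is newer than the delete."""
    csn = most_recent_csn(vdcsn, mdcsn)
    return csn is not None and csn > delete_csn

# value has only an MDCSN: a VDCSN-only comparison saw nothing at all,
# the fixed comparison detects that the modrdn came after the delete
assert modrdn_after_delete(vdcsn=None, mdcsn=9, delete_csn=7)
assert not modrdn_after_delete(vdcsn=None, mdcsn=5, delete_csn=7)
```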
After fixing this, the number of failing tests was considerably reduced: 297 --> 204.
Now I'm looking into a scenario where it is a bit unclear what the real expected behaviour is:
Let the entry have the single-valued attribute employeenumber with value 1000.

On M1:

   changetype: modify
   replace: employeenumber
   employeenumber: 2000

On M2:

   changetype: modify
   delete: employeenumber
   employeenumber: 1000
After replication converges, the attribute has the value 2000 on all servers. What is correct depends on how the mod on M2 is viewed. It deletes the single value of the attribute, so it is equivalent to deleting the attribute. If it is done that way, by deleting the whole attribute or by an empty replace (replace: employeenumber), then the value is removed on all servers.
If one views it just as an attempt to delete a specific value, then, since the value was replaced before the delete, the delete fails with "no such value" and the replaced value remains.
I'm inclined to go with the current behaviour. It also fits the "single master" model: if a delete of a specific value is applied after the value was changed, the delete just fails. So it can be justified and does not change behaviour. I modified the test suite; now there are "only" 171 failures remaining.
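The single-master analogy can be sketched as a toy model (not server code): once the value was replaced, a later delete of the specific old value targets a value that no longer exists and simply fails, so the replaced value remains.

```python
def apply_mod(entry, op, attr, value=None):
    """Minimal single-valued modify semantics on a dict-backed entry."""
    if op == 'replace':
        entry[attr] = value                    # replace the single value
    elif op == 'delete':
        if entry.get(attr) != value:
            # LDAP would return noSuchAttribute (err=16) here
            raise ValueError('no such value')
        del entry[attr]
    return entry

entry = {'employeenumber': '1000'}
apply_mod(entry, 'replace', 'employeenumber', '2000')     # the M1 mod
try:
    apply_mod(entry, 'delete', 'employeenumber', '1000')  # the M2 mod
except ValueError:
    pass                                       # the delete just fails
assert entry['employeenumber'] == '2000'
```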
Most of the remaining failures involve modrdn operations. One specific scenario is:
On M1:

   changetype: modrdn
   newrdn: employeenumber=nnn
   deleteoldrdn: 0

On M2:

   changetype: modify
   replace: employeenumber
If the CSN of the change on M2 is later than the one on M1, the RDN is employeenumber=nnn,<suffix> but the attribute is not present. There are two different failures. On the server receiving the delete after the modrdn (with csn(del) > csn(modrdn)), the code doesn't detect that the attribute is distinguished and cannot be deleted. The other way round is more complicated: if the modrdn is received after the delete, urp doesn't even know that the value has to be distinguished. The current code calls entry_add_rdn_csn() after entry_apply_mods_wsi() was executed. Calling it before fails, because before entry_apply_mods_wsi() the attribute is not among the present attributes and so the CSN cannot be set.
Resolved the issues regarding the MDCSN: it needs to be cleared for the attributes in the old RDN and to be set for the attributes in the new RDN before calling entry_apply_mods_wsi(). Resolved a few more scenarios,
and now I am down to 15 failures.
These are all scenarios where the attribute is distinguished and there are concurrent state changes (to different states) on different masters. It is not really obvious what the correct value of the attribute should be (I will update the doc), but the value is also not consistent across the servers. Will try to see if we can get at least a consistent state.
There is a problem with doing state resolution for concurrent modrdn operations. If two modrdn operations are done concurrently, one has a higher CSN. When the modrdn with the lower CSN is replayed, the urp preop plugin detects this and the modrdn is ignored. The effect is that the DN is always updated to respect the latest modrdn, but attribute state resolution is only called when the later modrdn is applied, so inconsistent states can result.
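A much simplified toy model of the resulting divergence (invented, not the plugin code): the preop check drops a replayed modrdn whose CSN is lower than the one already applied, so the DN converges to the latest modrdn, but the attribute changes of the ignored modrdn (here: the RDN value kept by deleteoldrdn: 0) never happen on the server that saw the newer modrdn first.

```python
def apply_modrdn(server, csn, new_value):
    """Apply a modrdn unless a newer one was already seen."""
    if csn <= server['rdn_csn']:
        return                       # ignored: attr resolution never runs
    server['rdn_csn'] = csn
    server['values'].add(new_value)  # deleteoldrdn: 0 keeps the old value

s1 = {'rdn_csn': 0, 'values': {'a'}}
s2 = {'rdn_csn': 0, 'values': {'a'}}
apply_modrdn(s1, 1, 'b'); apply_modrdn(s1, 2, 'c')   # replay order on one server
apply_modrdn(s2, 2, 'c'); apply_modrdn(s2, 1, 'b')   # opposite order on the other

assert s1['values'] == {'a', 'b', 'c'}
assert s2['values'] == {'a', 'c'}    # 'b' was never applied: inconsistent state
```

Both servers agree on the final RDN (the csn=2 modrdn), yet their attribute states diverge, which is exactly the problem described above.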
It would take a lot of changes to modrdn to fully handle these situations.
I also noticed a side effect: in the case of the ignored modrdn, the maxcsn in the RUV is not updated, and the op is replayed until an effective mod is received.
Hi Ludwig,
Could you update this ticket with the current status?
Do we want to push this ticket to 1.3.5?
Thanks!
yes, should be 1.3.5, change is not simple, so will not get it ready earlier
Replying to [comment:18 lkrispen]:
yes, should be 1.3.5, change is not simple, so will not get it ready earlier Thank you, Ludwig!
Per triage, push the target milestone to 1.3.6.
Metadata Update from @lkrispen: - Issue assigned to lkrispen - Issue set to the milestone: 1.3.6.0
Metadata Update from @mreynolds: - Custom field component reset (from Replication - General) - Custom field reviewstatus reset (from needinfo) - Custom field rhbz reset (from 0) - Issue set to the milestone: 1.3.7.0 (was: 1.3.6.0)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to None - Issue set to the milestone: 1.4.2 (was: 1.3.7.0)
Metadata Update from @vashirov: - Issue set to the milestone: 1.4.3 (was: 1.4.2)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/779
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix - Issue status updated to: Closed (was: Open)