#47783 entry differs on M1 vs M2 after a MODRDN on a tombstone
Closed: wontfix 3 years ago by spichugi. Opened 10 years ago by tbordaz.

Here is the test case I used:

    Setup 2 masters M1 and M2
    pause RA M1->M2 and RA M2->M1
    On M1:
        delete an entry (e.g. cn=user1,cn=staged users,dc=example,dc=com)
        mod a test entry (e.g. cn=test1,dc=example,dc=com)

    sleep 1s so that delete.csn and modrdn.csn are different
    On M2:
        modrdn the entry on M2
        mod a test entry on M2 (e.g. cn=test2,dc=example,dc=com)
    resume RA M1->M2 and RA M2->M1
    Check replication is working
    Check the status of the DEL/MODRDN entry on both servers
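
The 1s sleep ensures the DEL and the MODRDN carry distinct CSNs. A 389-ds CSN is rendered as 20 hex digits — timestamp, sequence number, replica ID, sub-sequence number — and ordered field by field, which is what later decides the replay order. A minimal sketch parsing the two CSNs that appear in the access logs further down (the `parse_csn` helper is mine, not a lib389 API):

```python
from collections import namedtuple

CSN = namedtuple("CSN", "timestamp seqnum rid subseq")

def parse_csn(s):
    """Split a 20-hex-digit CSN string into its four fields."""
    assert len(s) == 20
    return CSN(int(s[0:8], 16), int(s[8:12], 16),
               int(s[12:16], 16), int(s[16:20], 16))

# The DEL on M1 and the MODRDN on M2, as seen in the access logs below:
del_csn    = parse_csn("55d3bc5d000000010000")
modrdn_csn = parse_csn("55d3bc5d000000020000")

# CSNs order by timestamp, then seqnum, then replica id, then subseq;
# plain string comparison of the hex form gives the same order.
assert del_csn < modrdn_csn
assert "55d3bc5d000000010000" < "55d3bc5d000000020000"
```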

A good point is that I was not able to reproduce a replication failure, so replication is not broken (the mods on the test entries are always successfully replicated).
I ran the following tests on the master branch to check the final state of the tombstone on both servers:

  • rename (same rdn + delold=0 + same superior)
    the entry is identical on both servers

  • rename (same rdn + delold=1 + same superior)
    the entry is identical on both servers

  • rename (change rdn (new_account1 -> new_account1_modrdn) + delold=0 + same superior)
    the entry differs
    M1[dn] = nsuniqueid=1708c18c-c56711e3-a07accf0-3a563faf,cn=new_account1,cn=staged user,dc=example,dc=com
    M2[dn] = nsuniqueid=1708c18c-c56711e3-a07accf0-3a563faf,cn=new_account1_modrdn,cn=staged user,dc=example,dc=com
    M2[cn] = new_account1_modrdn

  • rename (change rdn (new_account2 -> new_account2_modrdn) + delold=1 + same superior)
    M1[dn] = nsuniqueid=1708c18d-c56711e3-a07accf0-3a563faf,cn=new_account2,cn=staged user,dc=example,dc=com
    M2[dn] = nsuniqueid=1708c18d-c56711e3-a07accf0-3a563faf,cn=new_account2_modrdn,cn=staged user,dc=example,dc=com
    M1[cn] = new_account2
    M2[cn] = new_account2_modrdn

  • rename (same rdn + delold=0 + new superior)
    the entry differs
    M1[dn] = nsuniqueid=1708c190-c56711e3-a07accf0-3a563faf,cn=new_account5,cn=staged user,dc=example,dc=com
    M2[dn] = nsuniqueid=1708c190-c56711e3-a07accf0-3a563faf,cn=new_account5,cn=accounts,dc=example,dc=com
    M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
    M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf

  • rename (same rdn + delold=1 + new superior)
    M1[dn] = nsuniqueid=1708c191-c56711e3-a07accf0-3a563faf,cn=new_account6,cn=staged user,dc=example,dc=com
    M2[dn] = nsuniqueid=1708c191-c56711e3-a07accf0-3a563faf,cn=new_account6,cn=accounts,dc=example,dc=com
    M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
    M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf

  • rename (change rdn (new_account7 -> new_account7_modrdn) + delold=0 + new superior)
    M1[dn] = nsuniqueid=1708c192-c56711e3-a07accf0-3a563faf,cn=new_account7,cn=staged user,dc=example,dc=com
    M2[dn] = nsuniqueid=1708c192-c56711e3-a07accf0-3a563faf,cn=new_account7_modrdn,cn=accounts,dc=example,dc=com
    M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
    M2[cn] = new_account7_modrdn
    M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf

  • rename (change rdn (new_account8 -> new_account8_modrdn) + delold=1 + new superior)
    M1[dn] = nsuniqueid=1708c193-c56711e3-a07accf0-3a563faf,cn=new_account8,cn=staged user,dc=example,dc=com
    M2[dn] = nsuniqueid=1708c193-c56711e3-a07accf0-3a563faf,cn=new_account8_modrdn,cn=accounts,dc=example,dc=com
    M1[cn] = new_account8
    M1[nsParentUniqueId] = 1708c189-c56711e3-a07accf0-3a563faf
    M2[cn] = new_account8_modrdn
    M2[nsParentUniqueId] = 1708c18a-c56711e3-a07accf0-3a563faf
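
The per-attribute mismatches listed above are exactly what the test asserts on. As a self-contained illustration, here is a minimal diff of the two servers' views of the tombstone from the last (delold=1 + new superior) case; the `entry_diff` helper is mine, not part of the test or lib389:

```python
def entry_diff(e1, e2):
    """Return {attr: (m1_value, m2_value)} for every attribute that differs."""
    return {a: (e1.get(a), e2.get(a))
            for a in set(e1) | set(e2)
            if e1.get(a) != e2.get(a)}

uid = "nsuniqueid=1708c193-c56711e3-a07accf0-3a563faf"
m1 = {"dn": uid + ",cn=new_account8,cn=staged user,dc=example,dc=com",
      "cn": "new_account8",
      "nsParentUniqueId": "1708c189-c56711e3-a07accf0-3a563faf"}
m2 = {"dn": uid + ",cn=new_account8_modrdn,cn=accounts,dc=example,dc=com",
      "cn": "new_account8_modrdn",
      "nsParentUniqueId": "1708c18a-c56711e3-a07accf0-3a563faf"}

# All three attributes diverge after the concurrent DEL/MODRDN:
assert set(entry_diff(m1, m2)) == {"dn", "cn", "nsParentUniqueId"}
```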

no cloning, upstream tests already written.

Hi Thierry, I tried to run the reproducer ticket47783_test.py, but so far no luck.

First, it failed because this constant was not found, so I removed the line.
{{{
20d19
< from constants import
}}}
Then, it fails to create an instance at the line 180.
{{{
179 # Create the instances
180 master1.create()
}}}
This is the last part of the output from the test script:
{{{
/home/nhosoi/.dirsrv/dirsrv-

/home/nhosoi/install/etc/sysconfig/dirsrv-*
Adding group dirsrv
Traceback (most recent call last):
  File "/export/src/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1163, in <module>
    run_isolated()
  File "/export/src/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1143, in run_isolated
    topo = topology(True)
  File "/export/src/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 180, in topology
    master1.create()
  File "/export/src/389tests/lib389/lib389/__init__.py", line 793, in create
    self._createDirsrv(verbose=self.verbose)
  File "/export/src/389tests/lib389/lib389/__init__.py", line 747, in _createDirsrv
    DirSrvTools.lib389User(user=DEFAULT_USER)
  File "/export/src/389tests/lib389/lib389/tools.py", line 860, in lib389User
    DirSrvTools.makeGroup(group=user)
  File "/export/src/389tests/lib389/lib389/tools.py", line 841, in makeGroup
    subprocess.Popen(cmd)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
}}}
Could you please tell me what is wrong with my attempt to run the script? I guess I must be missing something in the procedure... Please note that I'm using the master branch of lib389, which is up to date.

Hi Noriko,

The test case is running fine on my laptop, even with lib389 master up to date.

When running as 'root', it starts instances belonging to the default user/group 'dirsrv'.
When running as a regular user (e.g. xyz), instances will belong to the user/group 'xyz'.. but it still checks/creates the user/group 'dirsrv' (although it is not useful in that case).

On my system I created that user/group -> dirsrv/dirsrv and I think it is the reason why it succeeded.
Would you check if those user/group exist on your machine. If not, could you create them and rerun the test ?
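
That check can be automated; a small sketch using the standard `pwd`/`grp` modules (the `account_exists` helper is my own, not part of lib389):

```python
import pwd
import grp

def account_exists(name):
    """True if both a local user and a local group with this name exist."""
    try:
        pwd.getpwnam(name)
        grp.getgrnam(name)
    except KeyError:
        return False
    return True

# lib389 expects the 'dirsrv' user/group when instances run as root:
if not account_exists("dirsrv"):
    print("missing; create with: groupadd dirsrv && useradd -g dirsrv -r dirsrv")
```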

Replying to [comment:4 tbordaz]:

> Hi Noriko,
>
> The test case is running fine on my laptop, even with lib389 master up to date.
>
> When running as 'root', it starts instances belonging to the default user/group 'dirsrv'.
> When running as a regular user (e.g. xyz), instances will belong to the user/group 'xyz'.. but it still checks/creates the user/group 'dirsrv' (although it is not useful in that case).
>
> On my system I created that user/group -> dirsrv/dirsrv and I think it is the reason why it succeeded.
> Would you check if those user/group exist on your machine. If not, could you create them and rerun the test?

I just made an update to lib389 to use 'dirsrv/dirsrv' when running lib389 as 'root'. You do not need to create this user/group; lib389 will do it for you if it does not already exist. Noriko, make sure you do a 'git pull' on your lib389 source.

Replying to [comment:6 mreynolds]:

> Replying to [comment:4 tbordaz]:
>
> > Hi Noriko,
> >
> > The test case is running fine on my laptop, even with lib389 master up to date.
> >
> > When running as 'root', it starts instances belonging to the default user/group 'dirsrv'.
> > When running as a regular user (e.g. xyz), instances will belong to the user/group 'xyz'.. but it still checks/creates the user/group 'dirsrv' (although it is not useful in that case).
> >
> > On my system I created that user/group -> dirsrv/dirsrv and I think it is the reason why it succeeded.
> > Would you check if those user/group exist on your machine. If not, could you create them and rerun the test?
>
> I just made an update to lib389 to use 'dirsrv/dirsrv' when running lib389 as 'root'. You do not need to create this user/group; lib389 will do it for you if it does not already exist. Noriko, make sure you do a 'git pull' on your lib389 source.

Hi Mark,

yes, lib389 will create the user/group... on the condition that it is run as root.
On a brand new machine, if a regular user runs a test case, it will not be allowed to create the dirsrv/dirsrv user/group.

Thank you, Thierry. That was it. I created a user dirsrv (I already had a group dirsrv), and the test started running!

Now I'm getting this assertion failure.
{{{
Update succeeded: status 0 Total update succeeded
Traceback (most recent call last):
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1163, in <module>
    run_isolated()
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 1148, in run_isolated
    test_ticket47783_2(topo)
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 609, in test_ticket47783_2
    _status_entry_both_server(topology, name=name, desc="chg rdn + delold=0 + same superior", debug=DEBUG_FLAG)
  File "/389tests/ds/dirsrvtests/tickets/ticket47783_test.py", line 297, in _status_entry_both_server
    assert ent_m1.dn == ent_m2.dn
AssertionError
}}}
Does this assertion failure mean I could reproduce the bug? Thanks!

It is a URP issue (or issues :).

Master2
{{{
[18/Aug/2015:16:14:36 -0700] conn=3 op=25 MODRDN dn="cn=new_account1,cn=staged user,dc=example,dc=com" newrdn="cn=new_account1_modrdn" newsuperior="(null)"
[18/Aug/2015:16:14:36 -0700] conn=3 op=25 RESULT err=0 tag=109 nentries=0 etime=0 csn=55d3bc5d000000020000
...
[18/Aug/2015:16:14:37 -0700] conn=6 op=3 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[18/Aug/2015:16:14:37 -0700] conn=6 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[18/Aug/2015:16:14:37 -0700] conn=6 op=4 DEL dn="cn=new_account1,cn=staged user,dc=example,dc=com" <=== REPLICATED OP
[18/Aug/2015:16:14:37 -0700] conn=6 op=4 RESULT err=0 tag=107 nentries=0 etime=0 csn=55d3bc5d000000010000
}}}
Master1
{{{
[18/Aug/2015:16:14:36 -0700] conn=3 op=42 DEL dn="cn=new_account1,cn=staged user,dc=example,dc=com"
[18/Aug/2015:16:14:36 -0700] conn=3 op=42 RESULT err=0 tag=107 nentries=0 etime=0 csn=55d3bc5d000000010000
...
[18/Aug/2015:16:14:39 -0700] conn=6 op=5 MODRDN dn="cn=new_account1,cn=staged user,dc=example,dc=com" newrdn="cn=new_account1_modrdn" newsuperior="(null)" <=== REPLICATED OP
[18/Aug/2015:16:14:39 -0700] conn=6 op=5 RESULT err=0 tag=109 nentries=0 etime=0 csn=55d3bc5d000000020000
}}}
In the URP code, conflict modrdn is skipped if the target entry is already deleted.
{{{
268 /*
269  * Return 0 for OK, -1 for Error, >0 for action code
270  * Action Code Bit 0: Fetch existing entry.
271  * Action Code Bit 1: Fetch parent entry.
272  */
273 int
274 urp_modrdn_operation( Slapi_PBlock *pb )
275 {
...
377     slapi_log_error(SLAPI_LOG_FATAL, sessionid,
378         "urp_modrdn (%s): target entry is a tombstone.\n",
379         slapi_entry_get_dn_const(target_entry));
380     rc = SLAPI_PLUGIN_NOOP; /* Ignore the modrdn */
}}}
But in this test case, the delete on M1 is done prior to the modrdn on M2. In that case, should the delete on M1 win? On M2, should URP undo the modrdn, then apply the delete? Probably the answer is yes. On Master2, conn=6 op=4 DEL dn="cn=new_account1..." deletes the entry even though the DN is different, since a replicated op can use nsuniqueid to identify the entry. Probably, we have to compare the DNs and adjust to the one which wins...
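
The nsuniqueid-based targeting mentioned above can be pictured with a toy lookup: replicated operations carry the entry's nsuniqueid, so they find the entry even after a local rename changed its DN (an illustration only, not the server's actual backend code):

```python
# Toy directory keyed by nsuniqueid (my own model, not the server's id2entry).
db = {
    "1708c18c-c56711e3-a07accf0-3a563faf": {
        # M2 already renamed the entry locally:
        "dn": "cn=new_account1_modrdn,cn=staged user,dc=example,dc=com",
    },
}

def resolve(nsuniqueid, dn):
    """Replicated ops target by nsuniqueid first, falling back to the DN."""
    if nsuniqueid in db:
        return db[nsuniqueid]
    return next((e for e in db.values() if e["dn"] == dn), None)

# The replicated DEL from M1 carries the pre-rename DN, yet still finds
# the entry through its nsuniqueid:
entry = resolve("1708c18c-c56711e3-a07accf0-3a563faf",
                "cn=new_account1,cn=staged user,dc=example,dc=com")
assert entry is not None
assert entry["dn"].startswith("cn=new_account1_modrdn")
```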

But the issue could be larger than that. If multiple modifications are done on each master separately and replication is then resumed, what should we do?

For instance,
{{{
M1                          M2
----------------------------+----------------------------
add entry -> replicated
             replication paused
add attr1: val1             add attr1: val1
mod attr1: val1'
                            del entry
mod attr1: val1"
add attr2: val2
rename to newentry
             replication resumed
}}}
In this case, what is the correct tombstone to be created on the masters?
{{{
dn: entry <== original, not newentry since it was renamed after deletion?
attr1: val1' <== last value before the deletion?
(no attr2 since it was added to M1 after the deletion on M2?)
}}}
If replication is always enabled, then mod attr1: val1" and the rest won't be accepted, since the entry is already deleted. But if replication is resumed only once all of the operations are done on each master, the order of the replay is not fixed. This would require the URP code to traverse the history (maybe using the timestamps in the CSNs?) and adjust the current operation accordingly?
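
The question can be made concrete with a toy replay: merge both masters' operations, sort by CSN, and skip anything that targets a tombstone, mirroring the SLAPI_PLUGIN_NOOP behaviour shown earlier. This is a sketch of the question with made-up CSNs (assigning the delete to M2 and the rest to M1, as in the scenario above), not the actual URP implementation:

```python
# Made-up integer CSNs; in the real system these are 20-hex-digit values.
ops = [
    (1, "mod",    ("attr1", "val1'")),   # M1
    (2, "del",    None),                 # M2
    (3, "mod",    ("attr1", 'val1"')),   # M1
    (4, "add",    ("attr2", "val2")),    # M1
    (5, "rename", "cn=newentry"),        # M1
]

entry = {"dn": "cn=entry", "attr1": "val1", "tombstone": False}
for csn, op, payload in sorted(ops):
    if entry["tombstone"] and op != "del":
        continue  # today's URP skips ops whose target is a tombstone
    if op == "del":
        entry["tombstone"] = True
    elif op in ("mod", "add"):
        entry[payload[0]] = payload[1]
    else:  # rename
        entry["dn"] = payload

# Matches the tombstone sketched above: original dn, attr1 at its
# pre-delete value, and no attr2.
assert entry == {"dn": "cn=entry", "attr1": "val1'", "tombstone": True}
```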

Hi Noriko,

IMHO, the order of replay is fixed and follows the CSN order (possibly down to the subsequence number).
In the described test case, on M2
'attr1: val1"' and 'attr2: val2' are applied on the tombstone,
then the modrdn should trigger a 'rename' of the tombstone. IMHO this looks a bit complex to implement, and I do not see much benefit, as the entry is now a tombstone.

On M1, 'attr1: val1' and 'attr1: val1'' are applied on the renamed entry, then the renamed entry is deleted.

That is right, the entries diverge on the two servers. But considering it is a corner case and there is no benefit (the entry is deleted), I think it is a minor issue and could be risky to fix.

Thank you for the input, Thierry. I'm pushing this to 1.3.6 backlog.

Metadata Update from @tbordaz:
- Issue set to the milestone: 1.3.6 backlog

7 years ago

Metadata Update from @mreynolds:
- Custom field rhbz reset (from 0)
- Issue set to the milestone: 1.3.7 backlog (was: 1.3.6 backlog)

6 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to None
- Issue set to the milestone: 1.4 backlog (was: 1.3.7 backlog)

4 years ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.4 (was: 1.4 backlog)

3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1115

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago
