#48944 On a read-only replica invalid state info can accumulate
Closed: wontfix 3 years ago. Opened 4 years ago by lkrispen.

In a master-master-consumer topology where the account policy plugin is enabled on all servers with "always record login time", it can happen in specific scenarios that invalid/unneeded state information accumulates on the consumer, like:

nscpentrywsi: lastLoginTime;adcsn-579b491a000100c80000;vdcsn-579b491a000100c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;adcsn-579b48ff000100c80000;vdcsn-579b48ff000100c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-579b48dd000100c80000;deletedattribute;deleted:
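To make these state lines easier to read: a CSN like 579b491a000100c80000 encodes four fixed-width hex fields - timestamp, sequence number, replica ID, and sub-sequence number. The small parser below is an illustrative sketch of that standard 389-ds layout, not code taken from the server:

```python
def parse_csn(csn):
    """Split a 389-ds CSN hex string into its four fields.

    Assumed layout (the standard 389-ds CSN encoding):
    8 hex digits timestamp, 4 sequence number, 4 replica ID, 4 sub-sequence.
    """
    if len(csn) != 20:
        raise ValueError("expected a 20-character hex CSN")
    return {
        "timestamp": int(csn[0:8], 16),  # seconds since the epoch
        "seqnum": int(csn[8:12], 16),    # ordering within the same second
        "rid": int(csn[12:16], 16),      # replica ID of the originating master
        "subseq": int(csn[16:20], 16),   # ordering within one operation
    }

# The adcsn in the first state line above originated on replica ID 0x00c8:
print(parse_csn("579b491a000100c80000")["rid"])  # 200
```

Because every field is fixed-width hex with the timestamp leading, two CSN strings can also be ordered with a plain string comparison.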

This can be reproduced by the following steps:

Stop M1, M2, and C1
Start M1
Bind user1 on M1; lastLoginTime will be updated with CSN time T1
Stop M1
Start M2
Bind user1 on M2; lastLoginTime will be updated with CSN time T2 > T1
Stop M2

Start C1
Start M2 to ensure the change with T2 is replicated earlier than the one with T1
We now have this change properly applied:

nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000;vucsn-5799e628000000c80000: 20160728110200Z
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:

Now bind user1 on C1. For a read-only replica no CSN is generated; the value is replaced and the adcsn is kept:

nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000: 20160728110449Z
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:

Now start M1 so the older CSN is received, and the weirdness starts:
nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000;vdcsn-5799e628000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:
There is no longer a proper lastLoginTime value!

Bind on the read-only replica again:
nscpentrywsi: lastLoginTime: 20160728110525Z
nscpentrywsi: lastLoginTime;adcsn-5799e628000000c80000;vdcsn-5799e628000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;adcsn-5799e4f0000000c80000;vdcsn-5799e4f0000000c80000;deletedattribute;deleted:
nscpentrywsi: lastLoginTime;vdcsn-5799d191002c00c80000;deletedattribute;deleted:

There is a new value without any CSN, and the deleted values are still there.

Bind to M1 again and there will be a normal value plus the deleted ones.

The core problem is that if the BIND is done on the read-only replica, lastLoginTime is updated on the replica without getting a CSN assigned. So the entry is updated without entry state resolution: an existing adcsn is preserved, but the newly added value does not have a vucsn.

When the next update for this attribute is received from a master, the outcome depends on the comparison of the update's CSN with the existing adcsn. If the update is older, entry state resolution fails.
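A toy model of that comparison (purely illustrative, not the actual slapd entry-state code) shows why the missing value CSN is fatal: the only CSN left to compare against is the stale adcsn, so an incoming change that is older than it turns the live value into a deleted one:

```python
def resolve(local_value, incoming_csn):
    """Toy sketch of the comparison described above.

    local_value: dict with 'adcsn' (kept from the replaced value) and
    'vucsn' (None for a bind handled locally on the read-only replica,
    since no CSN is assigned there). Fixed-width hex CSNs order
    correctly under plain string comparison.
    """
    if local_value["vucsn"] is None:
        # Only the stale adcsn is available: an incoming update older
        # than it loses the comparison, and the live value ends up
        # marked deleted without a proper replacement.
        return "deleted" if incoming_csn < local_value["adcsn"] else "replaced"
    # Normal case: the value's own update CSN decides.
    return "replaced" if incoming_csn > local_value["vucsn"] else "kept"

# The local bind on C1 kept adcsn 5799e628... but assigned no vucsn;
# M1 then replicates its older change 5799e4f0...:
print(resolve({"adcsn": "5799e628000000c80000", "vucsn": None},
              "5799e4f0000000c80000"))  # deleted -> no live lastLoginTime left
```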

How to fix this?

One option would be to assign a CSN with replica ID ffff to local mods; then entry state resolution should work properly. But it would have to be investigated whether the presence of CSNs with the read-only replica ID would cause problems.

The other option is to fix entry state resolution for this specific scenario. This would probably resolve the current problem, but would not address the general problem of updates on read-only replicas.

We have this same problem with password policy attributes. How do we handle that case?

Another option is to only support read-only replicas when using chain on bind or chain on update - http://www.port389.org/docs/389ds/howto/howto-chainonupdate.html

Target milestones: 1.3.5.13 and 1.2.11.33.

Pushed to 1.3.6.0 and 1.2.11.33.

I have another scenario to reproduce this, and it generated more accumulated values than the manual steps.

Create a two-master, one-consumer topology
Enable the account policy plugin, always write last login
Pick a set of users and create a binddn file for ldclt, e.g.:

{{{
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbate,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbie,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
cn=Abbott,ou=people,dc=example,dc=com password
}}}
List the same DN several times to increase the chance of concurrent binds with the same DN (not sure if this is really needed).
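A bindDN file in that shape can be generated with a few lines of script; the user list and password are placeholders matching the example above:

```python
# Build the ldclt binddn file sketched above: each DN is repeated several
# times so concurrent binds with the same DN become likely.
users = ["Abbate", "Abbie", "Abbott"]   # placeholder cn values
repeats = 4

with open("bindfile4s.txt", "w") as f:
    for cn in users:
        for _ in range(repeats):
            f.write(f"cn={cn},ou=people,dc=example,dc=com password\n")
```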

Run an ldclt load test against all three instances:

{{{
ldclt -h localhost -p 4011 -e "bindeach,bindonly" -e randombinddnfromfile=bindfile4s.txt -n 10 -W 1 -e esearch -f "objectclass=organizationalunit"
}}}

After some time, stop the load and check the last login time; it will produce output like:

{{{
================================== cn=Adeney ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27802000100030000: 20160909085120Z
nscpentrywsi: lastLoginTime;adcsn-57d2752f001500020000;vdcsn-57d2752f001500020
nscpentrywsi: lastLoginTime;adcsn-57d2706f001100030000;vdcsn-57d2706f001100030
================================== cn=Aderhold ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27800000800030000: 20160909085124Z
================================== cn=Adey ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27806001f00020000;vucsn-57d27806001f00020
nscpentrywsi: lastLoginTime;adcsn-57d2776f000400020000;vdcsn-57d2776f000400020
nscpentrywsi: lastLoginTime;adcsn-57d2766b001400020000;vdcsn-57d2766b001400020
nscpentrywsi: lastLoginTime;adcsn-57d27669001200020000;vdcsn-57d27669001200020
nscpentrywsi: lastLoginTime;adcsn-57d27542002600020000;vdcsn-57d27542002600020
nscpentrywsi: lastLoginTime;adcsn-57d274ec000000020000;vdcsn-57d274ec000000020
nscpentrywsi: lastLoginTime;adcsn-57d274bf000f00020000;vdcsn-57d274bf000f00020
nscpentrywsi: lastLoginTime;adcsn-57d26f17001c00030000;vdcsn-57d26f17001c00030
================================== cn=Adhem ==============================
nscpentrywsi: lastLoginTime;adcsn-57d27806001d00020000: 20160909085124Z
nscpentrywsi: lastLoginTime;adcsn-57d275bd001c00030000;vdcsn-57d275bd001c00030
}}}

The attached patch is for master; the code is different in 1.2.11.

It stops the accumulation of invalid state, but does not remove existing invalid data - that would have to be cleaned up by export/import.

Replying to [comment:8 lkrispen]:

It stops the accumulation of invalid state, but does not remove existing invalid data - that would have to be cleaned up by export/import.

After the purge delay has passed, the entry is cleaned up.
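For reference, the purge delay is the per-replica nsds5ReplicaPurgeDelay attribute (in seconds) on the replica entry under cn=mapping tree,cn=config; a sketch of setting it, assuming the dc=example,dc=com suffix used in the examples above:

```ldif
# Lower or confirm the state-information purge delay on the consumer's
# replica entry. The suffix in the DN is an assumption; the default
# delay is one week (604800 seconds).
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaPurgeDelay
nsds5ReplicaPurgeDelay: 604800
```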

Ludwig, your fix looks good to me.

backport to 1.2.11 attached

Thank you, Ludwig! You have my ack.

backport to 1.2.11 attached

Could you please push it to the 389-ds-base-1.2.11 branch?

Hi Ludwig,

Is it ok to close this ticket as fixed? (hint hint :)

Thanks!

Metadata Update from @nhosoi:
- Issue assigned to lkrispen
- Issue set to the milestone: 1.3.6.0

3 years ago

Metadata Update from @sramling:
- Custom field reviewstatus adjusted to review (was: ack)
- Issue close_status updated to: None

3 years ago

Adding a test script to test this with pytest.

Thanks for the test case, it looks good and it passes.
A few things we can improve, so it will be easier for others to understand what's going on.

  1. I think it would be better with 'as topo' here:

    from lib389.topologies import topology_m2c2 as topo
    
  2. Please, use 'logging' instead of writing comments:

    # Checking if entries are synced across masters and cosumers 
    log.info('Checking if entries are synced across masters and cosumers') # you have typo here, BTW
    
  3. Use one standard for your variables. You have plenty of formats - n1c1entry, m2nlastLogin, nm2entry, c2n3lastLogin. All these 'n1', 'n2', 'n' are really confusing.
    I think, when you do a search, it is okay to use just 'entry', there is no need to make it unique.
    And for 'lastLoginTime' you can use something like this:

    entry = topo.cs['consumer1'].search_s(userdn, ldap.SCOPE_BASE, 'objectClass=*', ['lastLoginTime'])
    lastLogin_c1_4 = entry[0].lastLoginTime
    

    Also, you can wrap these two lines into one function and remove some redundant code.

Thanks for the test case, it looks good and it passes.
A few things we can improve, so it will be easier for others to understand what's going on.

Sure

I think it would be better with 'as topo' here:
from lib389.topologies import topology_m2c2 as topo

This is how other replication tests are written, so I thought I would stick to that format. In fact, I initially wrote it as topo.

Please, use 'logging' instead of writing comments:

Checking if entries are synced across masters and cosumers

log.info('Checking if entries are synced across masters and cosumers') # you have typo here, BTW
Sure, will fix that

Use one standard for your variables. You have plenty of formats - n1c1entry, m2nlastLogin, nm2entry, c2n3lastLogin. All these 'n1', 'n2', 'n' are really confusing.
I think, when you do a search, it is okay to use just 'entry', there is no need to make it unique.
And for 'lastLoginTime' you can use something like this:
entry = topo.cs['consumer1'].search_s(userdn, ldap.SCOPE_BASE, 'objectClass=*', ['lastLoginTime'])
lastLogin_c1_4 = entry[0].lastLoginTime
Sure, I will make changes.

Also, you can wrap these two lines into one function and remove some redundant code.
Sure, will do.

Metadata Update from @spichugi:
- Custom field reviewstatus adjusted to ack (was: review)

3 years ago

3cf8563..e1dca08 master -> master
commit e1dca08
Author: Sankar Ramalingam sramling@redhat.com
Date: Thu Jun 8 20:36:59 2017 +0530

Metadata Update from @spichugi:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2003

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

2 months ago
