#242 Incorrect entryUSN index under high load in replicated environment
Closed: wontfix None Opened 10 years ago by rmeggins.

https://bugzilla.redhat.com/show_bug.cgi?id=745259

User-Agent:       Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101
Firefox/7.0.1

The entryusn.db4 index is not maintained correctly when a large number of
replicated operations arrive almost simultaneously. As a result, several
entryusn numbers in the index may correspond to one entryid and the searches on
entryusn return incorrect results.

Manual reindexing on entryusn resolves the problem.

Reproducible: Always

Steps to Reproduce:
I will describe our configuration where it is reproducible every time.

1. Configure three 389ds servers (in our case, it's 1.2.9.8 version with the
additional bugfix patch for the bug 735121).

2. Configure multimaster replication from each server to two others. While
configuring replicas, exclude entryusn and memberOf from replication.

Important: do not disable the replication of modifyTimestamp or modifiersName.
It's thanks to these attributes that a high load may be produced in this
configuration.

3. Create a sufficiently large groups with nested subgroups (~200 entries per
group is usually sufficient).

4. In a single (it is important!) operation make a replace or add/delete of all
or many uniqueMember attribute values of the group. It will generate the
following events :

* an avalanche of changes of the memberOf and entryusn attributes in a lot of
entries. These changes are not replicated, so it's ok.

* each changed entry has its modifyTimestamp and modifiersName attribute
changed at first directly and then by memberOf plugin.

* the changes are replicated to two other replicas. On these replicas memberOf
replaces the corresponding attributes and once again change modifyTimestamp and
modifiersName.

* As a result for each change we have two "returns to the sender" from two
replicas because of modifyTimestamp and modifiersName. Here is the typical ldap
audit log example :

time: 20111011210009
dn: cn=groupe dsi restreint,ou=par entite,ou=groupes
globaux,ou=groupes,dc=id,dc=polytechnique,dc=edu
changetype: modify
add: uniqueMember
uniqueMember: uid=andrey.ivanov,ou=Personnel,ou=Utilisateurs,dc=id,dc=polytech
 nique,dc=edu
-
replace: modifiersname
modifiersname: cn=X LDAP Root
-
replace: modifytimestamp
modifytimestamp: 20111011190009Z
-
replace: entryusn
entryusn: 69538
-

time: 20111011210011
dn:
uid=andrey.ivanov,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011190011Z
-
replace: entryusn
entryusn: 69542
-

time: 20111011210015
dn:
uid=andrey.ivanov,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011190016Z
-
replace: entryusn
entryusn: 69545
-



So, to resume, one change of the first replica generates two "returned"
replicated operations. With a sufficiently large initial group modification we
have a sufficiently large number of changes arriving at the same moment.
Actual Results:
After this operation the examining of entryusn index shows a large site of
duplicates. Here is an example :

dbscan -r -f entryusn.db4|less

=69294
        649
=69357
        649
=69387
        649
=69490
        649
=69511
        649

The entry with id 649 (in id2entry.db4) contains 69511 (on two other replicas
entryusn for this entry is 69891 and 69780, so the replication of entryusn is
out of game).

The number of duplicates can easily be found by comparing the number of all
entries and unique entries :

dbscan -r -f entryusn.db4| grep -v ^= | sort |wc
   5232   12074   71602

dbscan -r -f entryusn.db4| grep -v ^= | sort |uniq|wc
   5109   11951   70820

The searches of entryUSN=smth and entryUSN>=smth, naturally, do not return the
expected results since the index is corrupted.


Expected Results:
I am unable te reproduce the problem without replication or by changing just
one single independent attribute (like 'description').
I think this problem has to do something with heavy internal updates (memberOf
plugin) and a lot of replicated entries arriving simultaneously.

Reindexing recreates the index correctly:

db2index.pl -v -D "cn=Directory Manager" -w - -n userRoot -t entryusn
...

dbscan -r -f entryusn.db4| grep -v ^= | sort |wc
   5109   11951   70820

dbscan -r -f entryusn.db4| grep -v ^= | sort |uniq|wc
   5109   11951   70820

commit changeset:a360ab7/389-ds-base
Author: Noriko Hosoi nhosoi@redhat.com
Date: Tue Nov 29 13:31:39 2011 -0800
Fix Description: When replication conflicts occur in the replace
op and new entry still contains the old value, then the old value
won't be removed from the index file. Also, if the new value is
not added to the entry, the new value won't be added to the index
file.

The failure is informed to entryusn by by setting LDAP_MOD_IGNORE
in the mod.  Entryusn plugin bepostop_modify uses the info to
determine to increment the entryusn or not.

Added initial screened field value.

Metadata Update from @nkinder:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.10

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/242

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

2 years ago

Login to comment on this ticket.

Metadata