#47964 Incorrect search result after replacing an empty attribute in replicated environment
Closed: wontfix. Opened 9 years ago by pj101.

We are using 389ds v1.3.2.24 in a multi-master replicated triangle topology (3 servers, each with MMR agreements to the other 2).

The bug is easily reproduced: take an entry that does not have an attribute ("seeAlso", for example) and apply the following mod:

{{{
version: 1

dn: uid=andrey.ivanov,ou=Personnel,ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: seeAlso
-
}}}

Now run a search with the filter {{{'(&(uid=andrey.ivanov)(seeAlso=*))'}}}. The entry is returned although it should not be: the attribute "seeAlso" does not exist in it.

The search {{{'(seeAlso=*)'}}} does not return the entry, since it uses the seeAlso.db index, which is correctly maintained (the entry's id is not in seeAlso.db).

dbscan -f id2entry.db output for the entry in question:

{{{
id 7736
        rdn: uid=andrey.ivanov
        entryusn;adcsn-5472f31e000000010001;vucsn-5472f31e000000010001: 626828
...
        seeAlso;deleted:
        createTimestamp: 20100315161506Z
...
}}}

The problem does not exist without replication (the line "seeAlso;deleted:" does not appear in that case). If I put some non-empty value into the attribute "seeAlso" and then re-apply the mod above, the bug disappears, since the corresponding line in id2entry changes to something like

{{{
seeAlso;vucsn-5472ea37000000010000;vdcsn-5472ea40000000010000;deletedattribute;deleted: <previous value>
}}}

I've tested it with other attributes (businessCategory, physicalDeliveryOfficeName, etc.), both indexed and unindexed. The only common factors are:

  • replication is enabled
  • an attribute that has no values is replaced using the mod above

Exporting and re-importing a plain LDIF without replication state does, of course, resolve the problem.

It's a rather critical issue for us, since it gives us false positives for various group updates: entries that should not be returned are returned.


In replication we try to track any delete of an attribute, to get the most correct update resolution. So this is done even for a delete of a non-existing attribute, since the following scenario is possible:

  • seeAlso does not exist
  • on M1, a seeAlso value is added
  • concurrently (just before the add is replicated to M2), the non-existent seeAlso attribute is deleted on M2

If these operations are then replicated, the now-existing attribute would be removed on M1, while on M2 the attribute would exist.
So the attempt is to keep track of all operations; unfortunately, it looks like the deletion of the non-existing attribute is not tracked correctly: a deleted NULL value is stored instead of a deleted attribute.

As a consequence the attribute is not seen in a search, but if it is used in a presence filter it incorrectly matches.
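
A conceptual sketch of that symptom (hypothetical types and function names, not the slapd code): a presence test that only checks whether the attribute record exists will match an entry whose record holds nothing but a deleted placeholder, while a test over live values will not:

{{{
#include <stddef.h>

/* Illustrative structures only, not the slapd internals. */
struct sketch_value { const char *bv; int deleted; };
struct sketch_attr  { const char *type; struct sketch_value *vals; size_t nvals; };

static int presence_buggy(const struct sketch_attr *a)
{
    /* Bug pattern: the attribute record existing at all satisfies (attr=*). */
    return a != NULL;
}

static int presence_correct(const struct sketch_attr *a)
{
    /* Intended behaviour: only a live (non-deleted) value satisfies (attr=*). */
    if (a == NULL)
        return 0;
    for (size_t i = 0; i < a->nvals; i++)
        if (!a->vals[i].deleted)
            return 1;
    return 0;
}
}}}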

Although the fix seems correct, it also seems too simple. This is a very sensitive area of the code and a fix such as this can introduce subtle regressions. How can we thoroughly test this to make sure we don't introduce any regressions?

The code change affects only one scenario:
the attribute does not exist and no values are provided.
If the attribute exists or is already deleted, it will not get into this code path.
If values are provided, then the code was already being executed.

The result is a state which we already handled in migrations from SunDS (bug 1063990), where we had exactly: svattr;adcsn-XXXX;vdcsn-XXXX;deletedattribute;deleted:

We could not create this state ourselves, but we fixed replication to work in this situation.
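
To illustrate the scope being described, here is a minimal sketch (hypothetical names and structures, not the actual entrywsi.c code) of the single branch the change touches:

{{{
#include <stddef.h>

struct sketch_csn;          /* opaque change sequence number (illustrative) */

struct sketch_mod {
    int attr_exists;        /* does the attribute already exist on the entry? */
    int attr_deleted;       /* is the attribute already marked deleted?       */
    size_t nvalues;         /* number of values supplied with the mod         */
};

/* Stub standing in for "record a dummy deleted value carrying the delete CSN". */
static void sketch_add_dummy_deleted_value(struct sketch_mod *mod,
                                           const struct sketch_csn *opcsn)
{
    (void)mod;
    (void)opcsn;
}

static void sketch_apply_empty_replace(struct sketch_mod *mod,
                                       const struct sketch_csn *opcsn)
{
    if (!mod->attr_exists && !mod->attr_deleted && mod->nvalues == 0) {
        /* The only scenario the fix touches: replacing a non-existent
         * attribute with nothing. Stamping a dummy deleted value makes the
         * entry serialize as ";deletedattribute;deleted" instead of a bare
         * NULL value, so the delete CSN is preserved for update resolution. */
        sketch_add_dummy_deleted_value(mod, opcsn);
        return;
    }
    /* Existing or already-deleted attributes, and mods that carry values,
     * never reach the new code; they follow the pre-existing paths. */
}
}}}

The idea is that the dummy deleted value carries the delete CSN, which corresponds to the svattr;adcsn-XXXX;vdcsn-XXXX;deletedattribute;deleted: state mentioned above.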

Replying to [comment:6 lkrispen]:

The code change affects only one scenario:
the attribute does not exist and no values are provided.
If the attribute exists or is already deleted, it will not get into this code path.
If values are provided, then the code was already being executed.

The result is a state which we already handled in migrations from SunDS (bug 1063990), where we had exactly: svattr;adcsn-XXXX;vdcsn-XXXX;deletedattribute;deleted:

We could not create this state ourselves, but we fixed replication to work in this situation.

Ok.

I've patched the entrywsi.c file and the fix does work. After applying the offending modification, the id2entry of both replicated servers contains

{{{
seeAlso;adcsn-54784a85000200010000;vdcsn-54784a85000200010000;deletedattribute;deleted:
}}}

and the corresponding search does not return the entry.

Finally, this patch does fix the original problem, but there is a side effect: 389ds crashes under certain conditions. I've managed to reproduce the crash with this patch. It happens under the following conditions: replication must be enabled (without replication everything is fine) and a sufficiently large LDIF modify must be performed on an entry, with a lot of "replace by nothing" mods (deleting an already empty attribute).
I've checked out the latest git for version 1.3.2, applied this patch (a one-line diff) and ran the test. Without the patch the MOD is OK; with the patch applied, 389ds crashes.
If the MOD is small enough, everything is fine. At some critical size (or number of "replace" operations in the LDIF) the server starts to crash.
I've attached the LDIF MOD that generates the crash and the stack trace extracted from the core, following http://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes

The message in the error log:

{{{
[08/Jan/2015:10:12:28 +0100] - entry2str_internal_ext: array boundary wrote: bufsize=17165 wrote=17265
}}}

ldif with MODs to crash the patched 389ds
large-mod.ldif

Patched server postmortem core stacktrace
stacktrace-crash.txt

I will look into this. To make life easier, could you provide the schema files used to create/modify the entries and a sample entry to apply the mods to? Thanks, Ludwig

Entry to modify in order to reproduce crash
entry-to-modify.ldif

Ok. It's done. Schema and the entry are attached. I don't think it has anything to do with schema. It could be related to attribute options, which we use a lot. However, the number of mods in one operation matters: for other users with a smaller number of modifications everything is fine.

I agree, it has nothing to do with schema; it just lets me apply your test mod without problems.
The problem is probably that we now write more data ("deletedattribute;deleted") and the function calculating the buffer size before writing does not take this into account, so the allocation is too small.

Reproduced and fixed with the attached patch; could you verify it works for you?

The reason was that the space for "deletedattribute;deleted" was only allocated if the attribute had values, but the dummy value was added after this calculation.

It looks like the existing space calculation already allocates more than is actually required, so the problem only becomes visible when there are many of these empty attributes.
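
A minimal sketch of the size-accounting pattern being described (made-up names, not the actual entry2str_internal_size_attrlist code): the buffer estimate must reserve room for the ";deletedattribute;deleted" subtypes even when the attribute's only value is the dummy deleted one, otherwise the writer overruns the buffer once many such attributes accumulate:

{{{
#include <stddef.h>
#include <string.h>

/* Hypothetical per-attribute descriptor, for illustration only. */
struct sketch_attr {
    const char *type;        /* e.g. "seeAlso"                                */
    size_t value_len;        /* total length of the values to be written      */
    size_t csn_len;          /* length of the ";adcsn-...;vdcsn-..." subtypes */
    int only_dummy_deleted;  /* attribute holds only the dummy deleted value  */
};

/* Sketch of the pre-write size estimate. The bug pattern: the
 * ";deletedattribute;deleted" overhead was only counted for attributes with
 * real values, while the dummy deleted value was attached after the estimate,
 * so the buffer came out too small once many empty attributes accumulated. */
static size_t sketch_attr_size(const struct sketch_attr *a)
{
    /* "type: value\n" framing plus the CSN subtypes */
    size_t size = strlen(a->type) + a->csn_len + a->value_len + 4;
    if (a->only_dummy_deleted) {
        /* Fix direction: always reserve room for the extra subtypes that are
         * written out for a dummy deleted value. */
        size += strlen(";deletedattribute;deleted");
    }
    return size;
}
}}}

Reserving that per-attribute overhead up front is consistent with the "array boundary wrote" error above, where the writer exceeded the pre-computed bufsize only on entries carrying many empty replaces.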

Yes, it works for me, I've updated our production servers. Thank you, Ludwig!

committed:
master:
ldap/servers/slapd/entry.c | 2 ++
ldap/servers/slapd/entrywsi.c | 8 ++++++--
2 files changed, 8 insertions(+), 2 deletions(-)

New commits:
commit 3182d2e

1.3.3:
ldap/servers/slapd/entry.c | 2 ++
ldap/servers/slapd/entrywsi.c | 8 ++++++--
2 files changed, 8 insertions(+), 2 deletions(-)

New commits:
commit 391539a

1.3.2:
ldap/servers/slapd/entry.c | 2 ++
ldap/servers/slapd/entrywsi.c | 8 ++++++--
2 files changed, 8 insertions(+), 2 deletions(-)

New commits:
commit a379565

Git patch file (1.2.11), applying the master patch just to entry2str_internal_size_attrlist (entry.c).
0001-Ticket-47964-1.2.11-Incorrect-search-result-after-re.patch

Pushed to 389-ds-base-1.2.11:
96b3a5b..d1cf785 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit d1cf785

dumped raw entry which crashes the server (size = 1618188B)
huge_entry_snippet.txt

I think the data in comment #5 do not match this bug. For the lastlogintime accumulation we have ticket #48944.
We should close this one again as fixed.

Replying to [comment:24 lkrispen]:

I think the data in comment #5 do not match this bug. For the lastlogintime accumulation we have ticket #48944.
We should close this one again as fixed.

I meant comment #22 (not #5).

Ahhh, sorry, Ludwig! It seems I made a mistake (but I don't know why I got confused...) :(

Closing this bug again with the original bug reference.

Metadata Update from @nhosoi:
- Issue assigned to lkrispen
- Issue set to the milestone: 1.2.11.33

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1295

If you want to receive further updates on the issue, please navigate to the github issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
