When testing IPA with the head of 1.3.2, strange entries can be created.
There exist entries like: dn: nsuniqueid=31db6ead-fc5911e3-b087920e-24b83048,fqdn=testhost1.idm.lab.eng.brq.redhat.com,cn=computers,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
but there is no objectclass: tombstone, so the entries remain visible and cannot be deleted. In the error log there are messages like:
[25/Jun/2014:13:10:34 +0200] - tombstone entry nsuniqueid=31db6ead-fc5911e3-b087920e-24b83048,fqdn=testhost1.idm.lab.eng.brq.redhat.com,cn=computers,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com failed to add to the cache: -1
With version 1.3.2.16 this problem does not exist.
Replication is not enabled; the tombstone entries are created because the USN plugin is enabled.
The deletion fails because a tombstone entry should not be deleted by a client operation, but the detection of a tombstone in this case is based on the rdn (if it contains nsuniqueid)
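To make the mismatch concrete, here is a minimal Python sketch (not the server's C code; both function names and the objectclass values of the broken entry are assumptions for illustration). It contrasts an RDN-based tombstone check with an objectclass-based one: the broken entry satisfies the first but not the second, which would match the symptom of an entry that is visible yet refused for deletion.

```python
def rdn_looks_like_tombstone(dn):
    """Heuristic: the leading RDN uses the nsuniqueid attribute.

    Naive split on ',' and '='; escaped characters in DNs are not handled.
    """
    first_rdn = dn.split(",", 1)[0].strip()
    attr = first_rdn.split("=", 1)[0].strip().lower()
    return attr == "nsuniqueid"


def has_tombstone_objectclass(objectclasses):
    """True if the entry carries the nsTombstone objectclass."""
    return any(oc.lower() == "nstombstone" for oc in objectclasses)


# DN taken from the report above.
broken_dn = (
    "nsuniqueid=31db6ead-fc5911e3-b087920e-24b83048,"
    "fqdn=testhost1.idm.lab.eng.brq.redhat.com,cn=computers,cn=accounts,"
    "dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com"
)
# Hypothetical objectclass values; the report only says nsTombstone is missing.
broken_objectclasses = ["top", "nsHost"]

print(rdn_looks_like_tombstone(broken_dn))
print(has_tombstone_objectclass(broken_objectclasses))
```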
Thank you, Ludwig. Let me take a look...
I cannot reproduce the problem... Here's what I did:
Built a debug build from 389-ds-base-1.3.2 branch. Installed the standalone DS. Enabled USN. Added 3 entries:
{{{
adding new entry uid=tuser0, o=my.com
adding new entry uid=tuser1, o=my.com
adding new entry uid=tuser2, o=my.com
}}}
and deleted uid=tuser1, o=my.com, which looks like this. (please note that "objectClass: nsTombstone" exists)
{{{
dn: nsuniqueid=258b0a82-fd5a11e3-a1c8dc4c-747b35a8,uid=tuser1,o=my.com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: nsTombstone
cn: A
sn: B
givenName: X
uid: tuser1
userPassword:: e1NTSEF9M3Q1WDMra2hWYTZKci9QU0R1SmlueHRFSEl5ZTROZTYwUnVPdHc9PQ==
nsParentUniqueId: 13f53300-fd5911e3-932af19a-6c36a85b
entryusn: 21
}}}
And there is no problem to delete it.
{{{
$ ldapdelete [...] nsuniqueid=258b0a82-fd5a11e3-a1c8dc4c-747b35a8,uid=tuser1,o=my.com
$ ldapsearch [...] -b o=my.com '(objectclass=nstombstone)'
$
}}}
There should be some more condition(s) to cause this bug...
Replying to [comment:3 nhosoi]:
There should be some more condition(s) to cause this bug...
Yes, there is more to trigger this.
- The managed entry plugin has to be enabled.
- The entry to be deleted has a "managedBy" reference to itself.
- The delete txn has to be retried, so that the preop plugins are called a second time.

And this is not enough:

- The retro changelog has to be enabled.
- There has to be another backend, and changes have to be applied to both backends in parallel.

When I ran the attached scripts in a loop in parallel, I got the error in 2/100 deletes.
attachment ca-add-del.sh
attachment host-add-del.sh
Hi Ludwig, so far I have had no luck reproducing the problem. I used a build from master as well as from the 389-ds-base-1.3.2 branch, and I went to your test system vm-111 and borrowed its contents, but I could not create a broken USN tombstone.
BTW, I could not find the broken USN tombstone. Has it been wiped out? E.g., the count of the tombstone DN and the count of the "objectClass: nsTombstone" are identical... (I exported the db with db2ldif -r ...) {{{
391 782 30889
391 782 9775
}}}
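The exact commands behind the counts above are not shown. As a rough sketch of the same consistency check, the Python below counts tombstone-style DNs and `objectClass: nsTombstone` lines in LDIF text, such as a `db2ldif -r` export. The function name and the sample data are assumptions for illustration, and base64-encoded DNs (`dn::`) are not handled.

```python
def count_tombstone_markers(ldif_text):
    """Return (tombstone_dn_count, nstombstone_objectclass_count)."""
    dn_count = 0
    oc_count = 0
    for line in ldif_text.splitlines():
        lowered = line.lower()
        if lowered.startswith("dn: nsuniqueid="):
            dn_count += 1
        elif lowered.replace(" ", "") == "objectclass:nstombstone":
            oc_count += 1
    return dn_count, oc_count


# A healthy tombstone: the DN and the objectclass agree.
healthy_ldif = """\
dn: nsuniqueid=258b0a82-fd5a11e3-a1c8dc4c-747b35a8,uid=tuser1,o=my.com
objectClass: top
objectClass: person
objectClass: nsTombstone
uid: tuser1
"""

# A hypothetical broken entry: tombstone-style DN without the objectclass.
broken_ldif = """\
dn: nsuniqueid=258b0a82-fd5a11e3-a1c8dc4c-747b35a8,uid=tuser1,o=my.com
objectClass: top
objectClass: person
uid: tuser1
"""

for name, text in (("healthy", healthy_ldif), ("broken", broken_ldif)):
    dns, ocs = count_tombstone_markers(text)
    print(name, dns, ocs, "consistent" if dns == ocs else "MISMATCH")
```

If the two counts differ, at least one entry has a tombstone DN without the matching objectclass (or the reverse).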
Replying to [comment:6 nhosoi]:
BTW, I could not find the broken USN tombstone. Has it been wiped out? E.g., the count of the tombstone DN and the count of the "objectClass: nsTombstone" are identical... (I exported the db with db2ldif -r ...)
That is another piece of the puzzle: the corrupted entry is not in the database, only in the entry cache. That also explains why the tests started again after the upgrade/downgrade of 389; it was just due to the restart.
So we know we have an entry with dn=nsuniqueid and the original entry id in the entry cache and any further lookups will find this entry.
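A toy model may help show why a cache-only corruption survives until restart. This is a simplified Python sketch, not the server's entry cache: the class, the entry ID, and the assumption that the database still holds the original entry under the same ID are all hypothetical.

```python
class ToyServer:
    """Toy stand-in for a server with an on-disk DB and an in-memory entry cache."""

    def __init__(self, db):
        self.db = dict(db)   # entry id -> DN (stands in for the database)
        self.cache = {}      # entry id -> DN (stands in for the entry cache)

    def lookup(self, entry_id):
        # The cache is consulted first, so a bad cached entry shadows the DB.
        if entry_id in self.cache:
            return self.cache[entry_id]
        dn = self.db.get(entry_id)
        if dn is not None:
            self.cache[entry_id] = dn
        return dn

    def restart(self):
        # A restart drops the in-memory cache; the database is untouched.
        self.cache.clear()


original_dn = "fqdn=testhost1.idm.lab.eng.brq.redhat.com,cn=computers,cn=accounts"
server = ToyServer({7: original_dn})

# Simulate the failure: a tombstone-style DN is left in the cache under the
# original entry ID, while the database is unchanged (assumption).
server.cache[7] = "nsuniqueid=31db6ead-fc5911e3-b087920e-24b83048," + original_dn

print(server.lookup(7))   # served from the cache: the corrupted DN
server.restart()
print(server.lookup(7))   # served from the database again
```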
BTW I tested the error check relaxation fix, but the result is still the same
Thank you for testing the patch, Ludwig! And the fact you pointed out, that the problem is only in memory, could be a very good clue. I will continue working on it...
attachment 0001-add-entry-back-to-cache-if-it-was-replaced.patch
Noriko, the attached patch worked for me. Can you have a look and explain why this code was removed?
I removed them while I was debugging the crash in the entry cache, I think.
Yeah, it looks like the code you pointed out is needed... But let me run my test again with the code and check that the crash does not come back...
Thanks, Ludwig! --noriko
Replying to [comment:9 lkrispen]:
Thank you so much, Ludwig. I applied your patch and ran the conflict test, which did not cause the crash. I'm running some more acceptance tests and will then push it to the branches. Since the bug was introduced by 47750, I'd like to reopen that bug and close this one as a duplicate of 47750 to make the patch maintenance easier. Is it okay?
Mark as duplicate of #47750.
Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.3.2.18
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1161
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Duplicate)