This bug was discovered during an IPA 3.3 -> 4.1 upgrade (https://bugzilla.redhat.com/show_bug.cgi?id=1176995).
The problem is that when a supplier or consumer learns a superset definition, it uses a MOD-ADD (see modify_schema_prepare_mods). But if the definition already exists, the internal modify fails with LDAP_TYPE_OR_VALUE_EXISTS (err=20):
[16/Jan/2015:05:35:08 -0500] conn=Internal op=-1 MOD dn="cn=schema"
[16/Jan/2015:05:35:08 -0500] conn=Internal op=-1 RESULT err=20 tag=48 nentries=0 etime=0
Instead it should use a MOD-DEL (old definition) followed by a MOD-ADD (new definition).
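For illustration, a minimal sketch of the intended mod sequence, written against the generic libldap LDAPMod structure rather than the server's internal helpers (the function name and the use of static storage are mine, not the patch's):

```c
#include <stddef.h>
#include <ldap.h>

/*
 * Illustrative only: build the DEL+ADD pair for replacing one
 * objectclass value under cn=schema, so the modify cannot fail with
 * LDAP_TYPE_OR_VALUE_EXISTS when the name is already defined locally.
 * The real code lives in modify_schema_prepare_mods().
 */
static LDAPMod **schema_replace_mods(char *old_def, char *new_def)
{
    static char *del_vals[2];
    static char *add_vals[2];
    static LDAPMod mod_del, mod_add;
    static LDAPMod *mods[3];

    del_vals[0] = old_def;              /* the definition currently in cn=schema */
    del_vals[1] = NULL;
    add_vals[0] = new_def;              /* the learned superset definition */
    add_vals[1] = NULL;

    mod_del.mod_op = LDAP_MOD_DELETE;
    mod_del.mod_type = "objectclasses";
    mod_del.mod_values = del_vals;

    mod_add.mod_op = LDAP_MOD_ADD;
    mod_add.mod_type = "objectclasses";
    mod_add.mod_values = add_vals;

    mods[0] = &mod_del;                 /* order matters: delete first... */
    mods[1] = &mod_add;                 /* ...then add the new value */
    mods[2] = NULL;
    return mods;
}
```

With a bare MOD-ADD, the learned value collides with the existing definition of the same name; deleting the old value first avoids the err=20 above.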
For this bug to create a replication issue, the schema that is the superset of the other must have an 'nsschemaCSN' that is lower than the other schema's.
To reproduce, create two suppliers S1 and S2. Stop S2 and update S1's schema with:
( 2.16.840.1.113730.3.8xxxx NAME 'oc1' AUXILIARY MUST cn MAY ( description $ roomnumber ) X-ORIGIN ( 'Test' 'user defined' ) )
Stop S1 and update S2's schema with:
( 2.16.840.1.113730.3.8xxxx NAME 'oc1' AUXILIARY MUST cn MAY ( description ) X-ORIGIN ( 'Test' 'user defined' ) )
Start S1 and S2. Do an update on any entry on S2.
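As an aside, here is a hedged sketch of how one of the schema updates above could be applied from a C client over libldap (the URI, bind DN, and password are placeholders):

```c
#include <stdio.h>
#include <string.h>
#include <ldap.h>

/* Apply the S1 variant of the 'oc1' definition to a server's
 * cn=schema entry.  All connection details are placeholders. */
int main(void)
{
    LDAP *ld = NULL;
    int version = LDAP_VERSION3;
    struct berval cred = { 0, NULL };
    char *vals[] = {
        "( 2.16.840.1.113730.3.8xxxx NAME 'oc1' AUXILIARY MUST cn "
        "MAY ( description $ roomnumber ) "
        "X-ORIGIN ( 'Test' 'user defined' ) )",
        NULL
    };
    LDAPMod mod = { LDAP_MOD_ADD, "objectclasses", { vals } };
    LDAPMod *mods[] = { &mod, NULL };
    int rc;

    if (ldap_initialize(&ld, "ldap://s1.example.com:389") != LDAP_SUCCESS)
        return 1;
    ldap_set_option(ld, LDAP_OPT_PROTOCOL_VERSION, &version);

    cred.bv_val = "secret";             /* placeholder password */
    cred.bv_len = strlen(cred.bv_val);
    rc = ldap_sasl_bind_s(ld, "cn=Directory Manager", LDAP_SASL_SIMPLE,
                          &cred, NULL, NULL, NULL);
    if (rc == LDAP_SUCCESS)
        rc = ldap_modify_ext_s(ld, "cn=schema", mods, NULL, NULL);
    fprintf(stderr, "modify: %s\n", ldap_err2string(rc));
    ldap_unbind_ext_s(ld, NULL, NULL);
    return rc == LDAP_SUCCESS ? 0 : 1;
}
```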
So S1's schema is a superset of S2's (its 'oc1' also allows roomnumber), but the nsschemaCSN on S2 is greater than the one on S1.
S2 will therefore retrieve S1's schema, detect that 'oc1' needs to be learned, and try to update its own schema with this definition.
attachment 0001-Ticket-47988-Schema-learning-mechanism-in-replicatio.patch
attachment 0001-Ticket-47988-test-case.patch
The patch looks good to me. But I have 2 questions. :)
1) Did you run valgrind with your test? I'm especially curious about this +128 bytes, which is supposed to ensure the buffer is always large enough...
6341 schema_oc_to_string(struct objclass *oc)
6367     size += 128; /* for all keywords: NAME, DESC, SUP... */
Plus, in modify_schema_prepare_mods, you pass object->old_value and object->new_value to berval and free the bvps_del and bvps_add arrays. That looks correct; I just want confirmation that the function does not miss anything. :)
2) You switch the behaviour depending on whether the op is internal or not. Is "internal" meant to be a "replicated" operation? I wonder if it could be an OP_FLAG_REPLICATED flag (or does it behave totally differently?). My concern is that the notion of "internal" sounds broader than what we are trying to solve. In other words, if we can say there is no "internal" schema delete operation other than this schema replication, I have no objection.
Replying to [comment:4 nhosoi]:
Noriko, thank you so much for having a look at that fix. You are right; I remember wanting to improve this but in the end did not do it.
I did not run with valgrind; I will do it tomorrow. The required size depends on how complex the definition is. It can be computed as the sum of all the possible keywords supported by the function, but the extra room could prevent a heap corruption.
This does not happen during a replicated operation. The replication agreement reads the remote schema, compares it with the local schema, and if some definitions need to be updated (learned), it does an internal operation. I may create a new flag if 'internal operation' is not restrictive enough.
Regarding the internal flag: if it is exactly what you are looking for, there is no need to create a new flag.
Thank you for answering my questions, Thierry.
Running the test case with valgrind does not raise any heap corruption. I had to adapt the replication timeout because running with valgrind slows the servers down significantly. I will make some slight changes around the string size (compute the size of the fixed-size keywords) and commit the change.
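For illustration, a sketch of that "fixed-size keywords" computation (the token list below is an assumption based on common objectclass syntax, not the exact set handled by schema_oc_to_string):

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Upper bound on the fixed tokens an objectclass printer can emit,
 * computed at compile time instead of a hand-waved "size += 128".
 * The variable parts (OID, names, MUST/MAY attribute lists) must
 * still be measured with strlen() per value by the caller.
 */
#define FIXED_TOKENS_LEN                        \
    (sizeof("( ") - 1 + sizeof(" )") - 1        \
     + sizeof(" NAME ''") - 1                   \
     + sizeof(" DESC ''") - 1                   \
     + sizeof(" OBSOLETE") - 1                  \
     + sizeof(" SUP ") - 1                      \
     + sizeof(" STRUCTURAL") - 1 /* longest of the three kinds */ \
     + sizeof(" MUST ( ) ") - 1                 \
     + sizeof(" MAY ( ) ") - 1                  \
     + sizeof(" X-ORIGIN ( ) ") - 1)

size_t oc_fixed_overhead(void)
{
    return FIXED_TOKENS_LEN + 1;  /* +1 for the NUL terminator */
}

int main(void)
{
    printf("fixed overhead: %zu bytes\n", oc_fixed_overhead());
    return 0;
}
```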
Thank you again, Noriko, for this review :-)
Thank you for running more tests, Thierry! I'm so relieved to hear valgrind did not report any memory issues.
Please go ahead and push the patch to master and 1.3.3!
Pushed to master on behalf of Thierry. d1f75af..51e05df master -> master commit 51e05df
Pushed to 389-ds-base-1.3.3: 6431142..1073168 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 1073168
git patch file (master) -- fix for covscan defects 0001-Ticket-47988-Schema-learning-mechanism-in-replicatio.2.patch
Unfortunately, in schema_delete_objectclasses and schema_delete_attributes, the ((mod->mod_bvalues == NULL) && is_internal_operation) case does not do anything but return SUCCESS (instead of UNWILLING_TO_PERFORM). Is that expected?
Hello Noriko,
Sorry for this final rush; I was hoping to clear/push on Monday, but I understand it was not possible. You are right, this part of the fix is problematic: removing all attributetypes/objectclasses values should return UNWILLING_TO_PERFORM even in the case of an internal op.
Fortunately, this is something the schema learning mechanism does not do; it only adds and possibly deletes specific values.
At this time I do not know how to handle this. Should I remove this part of the fix and push again, or is it too late?
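A sketch of the corrected guard being discussed (the helper name is hypothetical; the real checks live in schema_delete_objectclasses and schema_delete_attributes):

```c
#include <ldap.h>

/*
 * A NULL mod_bvalues on a schema MOD-DEL means "delete every value of
 * objectclasses/attributetypes".  Per the discussion above, that must
 * be refused with UNWILLING_TO_PERFORM even for internal operations;
 * the schema-learning path only ever deletes specific values.
 */
static int
check_schema_value_delete(LDAPMod *mod, int is_internal_operation)
{
    (void)is_internal_operation;        /* no longer grants an exemption */
    if ((mod->mod_op & ~LDAP_MOD_BVALUES) == LDAP_MOD_DELETE
        && mod->mod_bvalues == NULL) {
        return LDAP_UNWILLING_TO_PERFORM;
    }
    return LDAP_SUCCESS;
}
```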
Push of the test case
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 186.08 KiB, done.
Total 9 (delta 2), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   cfa8e4d..868ac8b  master -> master
attachment 0001-Ticket-47988-follow-up.patch
Push of the test case to 1.3.3
git push origin 389-ds-base-1.3.3
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 186.87 KiB, done.
Total 9 (delta 3), reused 1 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   1bf51f8..50f3f5a  389-ds-base-1.3.3 -> 389-ds-base-1.3.3
Metadata Update from @tbordaz: - Issue assigned to tbordaz - Issue set to the milestone: 1.3.3 backlog
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1319
If you want to receive further updates on the issue, please navigate to the github issue and click on the subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)