Steps to reproduce:

1) Set up MMR.
2) DIT:
{{{
dc=example,dc=com
uid=user,dc=example,dc=com
}}}
3) Add an OU: ou=OU,dc=example,dc=com
4) Move the user on master1:
{{{
ldapmodify ... << EOF
dn: uid=user,dc=example,dc=com
changetype: modrdn
newrdn: uid=user
deleteoldrdn: 0
newsuperior: ou=OU,dc=example,dc=com
EOF
}}}
5) Do online init on master1. master2's error log shows:
{{{
[..] - import userRoot: WARNING: Skipping entry "uid=user,ou=OU,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[..] - import userRoot: WARNING: bad entry: ID 25
}}}
This is because the parent (ou=OU) is located after the child (uid=user) in entry ID order.
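For example (the user's ID 25 comes from the log above; the OU's ID is hypothetical): uid=user was created first and holds ID 25, while ou=OU was added later and got, say, ID 26. Bulk import streams entries in ID order, so ID 25 arrives before its parent and is skipped.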
Export (db2ldif) can handle this case correctly, but online init cannot.

Workaround:

1) Shut down the servers.
2) Run db2ldif on master1:
{{{
db2ldif -n <backend> -r
}}}
3) Import it to master2:
{{{
ldif2db -n <backend> -i /path/to/the/ldiffile
}}}
Why would we make this configurable? If this is a bug during total init, wouldn't we want the sorted init to always be run? I think there is no need for the configuration, because leaving the current behaviour in place can break things ....

Additionally, wouldn't this fix actually remove the need for the silly CSN behaviour where, after a total init, we have to set the consumer CSN low and replay the whole changelog even though most events have already been applied? I think that previously we had to set the CSN low to replay modrdns like this, but the sorted total init removes that need .....
Replying to [comment:6 firstyear]:
That is because the current implementation does not need to generate an IDlist. If your database is huge and all the IDs are in ascending order, it'd be overhead for no benefit.
I don't think so. This sorted "initialization" just guarantees all the entries are sent to a consumer and nothing else...
Replying to [comment:7 nhosoi]:
Replying to [comment:6 firstyear]: Why would we make this configurable? If this is a bug during total init, wouldn't we want the sorted init to always be run? I think there is no need for the configuration, because leaving the current behaviour in place can break things .... That is because the current implementation does not need to generate an IDlist. If your database is huge and all the IDs are in ascending order, it'd be overhead for no benefit.

This is during an online init though, so we already have a huge impact. I would rather see us value correctness of what we send over performance and edge cases.

Additionally, wouldn't this fix actually remove the need for the silly CSN behaviour where, after a total init, we have to set the consumer CSN low and replay the whole changelog even though most events have already been applied? I think that previously we had to set the CSN low to replay modrdns like this, but the sorted total init removes that need ..... I don't think so. This sorted "initialization" just guarantees all the entries are sent to a consumer and nothing else...
Well, at the moment we have the scenario where a modrdn on an entry causes it to be skipped.

You send the complete init from master A to consumer B. The entry that was moved with modrdn has to be skipped .... which means it's not on consumer B.

Then the CSN of master A -> consumer B is kept low, so in the next replication session the changelog is replayed. Most changes are skipped (which is wasteful in itself!), but the entry that was moved is finally added to consumer B.

Basically, what this means is that currently a total init is not actually total, but more "partial to a point". This could be really bad if the modrdn operation had also been trimmed out of the changelog, and could represent a loss of data on consumer B.

Whereas with your new change, in this scenario, because of the ordering, the modrdn doesn't matter. After the total init the contents of consumer B are "correct" as per master A. This means we could set the CSN of consumer B "high" (i.e. matching master A). This would minimise, or even eliminate, the need to replay the log to the consumer just to fix up what was missed.
However I may be missing something important in my mind about how this works. I would like to see Mark or Ludwig comment on this.
But if this works the way I think it does, and we made this the default, we could eliminate (and simplify) many parts of replication, because we would be removing some of these edge cases!
I agree with William that we should always do it right, with no requirement to configure. I agree with Noriko that the current proposed fix looks too costly to do always.
Why don't we do it like ldbm2ldif does, which I think has very low overhead:

Iterate through the id2entry file, without the need for an idlist. For each entry, check if parentid < entryid:
- if parentid < entryid, the parent was already sent, so send the entry;
- if not, there are two options:
  1] what ldbm2ldif does: get the parent entry and send it, make a note that it was sent, continue;
  2] my spontaneous idea: stack the entry, continue, and process the stack later (but this may be more complicated, since a parent could now be in the stack).

So, my suggestion: do it like 1]; it already works for db2ldif, and it doesn't seem to have much overhead.

There is also a third option: 3] handle the issue on the consumer side. If an entry is received without a parent, don't return an error; create a glue entry and replace it when the parent is received.

But I think 1] should have the least overhead.
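A minimal sketch of option 1] in C, the db2ldif-style parent-first walk. All helpers here (get_entry_by_id, entry_get_parentid, send_to_consumer, was_sent, mark_sent) are hypothetical stand-ins rather than real slapi APIs; this only illustrates the ordering logic:

{{{
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t ID;
#define NOID ((ID)0xffffffffU)      /* "no parent" marker */

typedef struct entry Entry;         /* opaque for this sketch */

extern Entry *get_entry_by_id(ID id);
extern ID     entry_get_parentid(const Entry *e);
extern void   send_to_consumer(Entry *e);
extern bool   was_sent(ID id);
extern void   mark_sent(ID id);

/* Send entry `id`, first recursing into any not-yet-sent parent whose ID is
 * larger than its child's -- the out-of-order case a modrdn can create. */
static void
send_parent_first(ID id)
{
    if (was_sent(id))
        return;                     /* already pushed ahead of a child */
    Entry *e = get_entry_by_id(id);
    ID pid = entry_get_parentid(e);
    if (pid != NOID && pid > id && !was_sent(pid))
        send_parent_first(pid);     /* chains of moved ancestors recurse */
    send_to_consumer(e);
    mark_sent(id);
}
}}}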
Replying to [comment:10 lkrispen]:
I agree with William that we should always do it right, with no requirement to configure. I agree with Noriko that the current proposed fix looks too costly to do always. Why don't we do it like ldbm2ldif does, which I think has very low overhead: [snip]
The implementation of the online init is cleverly done using the send_entry callback via slapi_search_internal_callback_pb. I did not want to change that framework...
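For readers unfamiliar with that framework, a hedged sketch of the callback-driven approach: an internal subtree search whose per-entry callback ships each entry to the consumer. send_entry_to_consumer() and repl_identity are hypothetical stand-ins; the real code lives in repl5_tot_protocol.c:

{{{
#include "slapi-plugin.h"

extern int send_entry_to_consumer(void *ctx, Slapi_Entry *e); /* hypothetical */
extern Slapi_ComponentId *repl_identity;                      /* hypothetical */

static int
send_entry_cb(Slapi_Entry *e, void *cb_data)
{
    return send_entry_to_consumer(cb_data, e);
}

static void
total_update_sketch(const char *suffix, void *ctx)
{
    Slapi_PBlock *pb = slapi_pblock_new();
    slapi_search_internal_set_pb(pb, suffix, LDAP_SCOPE_SUBTREE,
                                 "(objectclass=*)", NULL /* all attrs */, 0,
                                 NULL /* controls */, NULL /* uniqueid */,
                                 repl_identity, 0 /* op flags */);
    /* send_entry_cb fires once per candidate entry as the backend walks them */
    slapi_search_internal_callback_pb(pb, ctx, NULL /* result cb */,
                                      send_entry_cb, NULL /* referral cb */);
    slapi_pblock_destroy(pb);
}
}}}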
Iterate through the id2entry file, without the need for an idlist. For each entry, check if parentid < entryid: if parentid < entryid, the parent was already sent, so send the entry; if not, there are two options: 1] what ldbm2ldif does: get the parent entry and send it, make a note that it was sent, continue; 2] my spontaneous idea: stack the entry, continue, and process the stack later (but this may be more complicated, since a parent could now be in the stack); 3] handle the issue on the consumer side: if an entry is received without a parent, don't return an error; create a glue entry and replace it when the parent is received. So, my suggestion: do it like 1]; it already works for db2ldif, and it doesn't seem to have much overhead.
You suggest exposing the ldbm2ldif code to the plugin? It may not be so clean... :)
I think that even if there is an overhead to sorting, it seems like a simpler solution than keeping a second stack that we need to post-process. The risk with the stack is: what happens if we have unordered entries there too? Don't we need to iterate over it multiple times then? How do we track this?

I think pre-sorting is the simple and correct solution. Online init is already "costly"; paying a bit extra doesn't matter, IMO.
From Noriko's email:
{{{
Now, I'm wondering... the extra IDList is (4-byte ID * entry count) per DB instance. If you have 5 million entries, it'd add an IDlist whose size is about 20MB. Thinking of typical physical memory sizes these days, that may not be bad at all... (If you have plenty of entries, you'd have large memory anyway...)

That said, I could just remove the config parameter nsds5SortedTotalUpdate and run it in the sorted manner whenever entryrdn_get_switch() returns true, as follows. Do you think it's acceptable?

diff --git a/ldap/servers/plugins/replication/repl5_tot_protocol.c b/ldap/servers/plugins/replication/repl5_tot_protocol.c
index 9218a5a..3a0c71a 100644
--- a/ldap/servers/plugins/replication/repl5_tot_protocol.c
+++ b/ldap/servers/plugins/replication/repl5_tot_protocol.c
@@ -418,7 +418,7 @@ retry:
     pb = slapi_pblock_new ();
     replica = (Replica*) object_get_data(prp->replica_object);
-    if (replica && replica_get_sorted_total_update(replica)) {
+    if (entryrdn_get_switch()) {
         /*
          * Supporting entries out of order -- parent could have a larger id than its children.
          * Entires are retireved sorted by parentid without the allid threshold.

Also, I'm going to run the upgrade script to reindex parentid only if it is currently sorted in alphabetical order. I'm adding the check now.
}}}
If the overhead is 20MB, that is nothing on a modern system. Even sorting 20MB of data on modern CPUs is fast.
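A back-of-envelope sketch supporting the cost claim (purely illustrative; the actual fix retrieves entries ordered by parentid rather than sorting raw IDs like this): building and qsorting a 5-million-element ID list (~20MB) is quick on modern hardware.

{{{
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t ID;

static int
id_cmp(const void *a, const void *b)
{
    ID x = *(const ID *)a, y = *(const ID *)b;
    return (x > y) - (x < y);           /* avoids overflow of x - y */
}

static void
sort_idlist(ID *ids, size_t n)
{
    /* O(n log n); 5 million 4-byte IDs sort in well under a second */
    qsort(ids, n, sizeof(ID), id_cmp);
}
}}}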
With this part:
{{{
+    if (entryrdn_get_switch()) {
}}}
This is defined by nsslapd-subtree-rename-switch. I do not like depending on this.
Consider I have a database where I:

1) nsslapd-subtree-rename-switch: on
2) move a bunch of entries
3) nsslapd-subtree-rename-switch: off
4) online init -> some consumer

Now my list needs sorting, but it won't be sorted, because I toggled that switch. It seems contrived, but this scenario could play out over the course of months, if not years: someone changes subtree-rename, then a few months later can't work out why online init failed.

I think we shouldn't depend on this config either, sorry.
Replying to [comment:13 firstyear]:
Consider I have a database where I: 1) nsslapd-subtree-rename-switch: on 2) move a bunch of entries 3) nsslapd-subtree-rename-switch: off 4) online init -> some consumer
Agh... I wish it were that simple... :)

You cannot do this. I mean, if you want to, you need extra steps in between.
1) nsslapd-subtree-rename-switch: on 2) move a bunch of entries 3) nsslapd-subtree-rename-switch: off
At this moment, the DB is already broken for the server... You have to do db2ldif, then ldif2db, which straightens out the ancestor-descendant relationships.
Then, you can do this.
online init -> some consumer.
git patch file (master) -- eliminated a config param nsds5SortedTotalUpdate 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.2.patch
git patch file (master) -- CI test; adjusted to the .2 patch 0002-Ticket-48755-CI-test-test-case-for-ticket-48755.2.patch
1) nsslapd-subtree-rename-switch: on 2) move a bunch of entries 3) nsslapd-subtree-rename-switch: off At this moment, the DB is already broken for the server... You have to do db2ldif, then ldif2db, which straightens out the ancestor-descendant relationships.
Right, that makes sense.
I've built and tested this code. On a read-through, it looks good. You have my ack. Do you want to get Ludwig to check also?
Copy & paste the email discussion for the record. On 06/07/2016 02:00 AM, Ludwig Krispenz wrote:
On 06/06/2016 06:34 PM, Noriko Hosoi wrote: On 06/06/2016 02:41 AM, William Brown wrote: On Mon, 2016-06-06 at 11:07 +0200, Ludwig Krispenz wrote: Sorry if I start the discussion again, Noriko pointed out that my suggestion to do it like db2ldif doesn not work since the callback to send entries doesn't have enough context. But the current proposal seems to be a large overhead wich is not needed in most cases. So let me repeat my poroposal 3] from the ticket: Let the consumer handle this, for a replicted operation we always assume that the operation was valid on the supplier and accept it, so if we receive an entry without parent accept it and create a glue entry, if we later receive the real entry replace the glue entry. I feel like this is a more complex aproach to a simple problem. Noriko's solution works, and is simpler to reason about and consider. It is a correct behaviour to send the list sorted, so lets try to maintain stricter, and more correct behaviours. In most cases our list will generally be sorted, or near to it anyway. I think that given we are doing a full re-init, we already have a big overhead and hit. We shouldn't be arguing over overheads here, because this is the equivalent to taking a sledgehammer to the remote DS anyway. Additionally, on the consumer recieving the incoming update, we are going to end up doing more work in replication, and it will be much harder to debug. I also think that the simple solution is a good idea, because premature optimisation causes us more issues. Replication is complicated as hell already. Lets not make it more complex. I would like to stick with Noriko's patch to sort the list. Thank you, Ludwig and William, for your comments. The memory overhead would be sizeof(ID) * number of entries in a backend. If it has 10 million entries, it'd be ~38.15MB. I'd think it is acceptable now a days. The sorting cost is a bit reduced since I set parentid index to integerOrderingMatch and if there is no rename history which causes this out-of-order issue, it is already sorted. To implement your proposal, we need to change the tasks done on the consumer side. Currently, the consumer uses the import. Instead, it has to dump the received data into the id2entry.db, then call db2index all (actually, it is internally upgradedb)... In theory, it'd be doable, I think. Only an issue is time for now. :) ok, go ahead with your solution.
On 06/06/2016 06:34 PM, Noriko Hosoi wrote:
On 06/06/2016 02:41 AM, William Brown wrote: On Mon, 2016-06-06 at 11:07 +0200, Ludwig Krispenz wrote: Sorry if I start the discussion again, Noriko pointed out that my suggestion to do it like db2ldif doesn not work since the callback to send entries doesn't have enough context. But the current proposal seems to be a large overhead wich is not needed in most cases. So let me repeat my poroposal 3] from the ticket: Let the consumer handle this, for a replicted operation we always assume that the operation was valid on the supplier and accept it, so if we receive an entry without parent accept it and create a glue entry, if we later receive the real entry replace the glue entry. I feel like this is a more complex aproach to a simple problem. Noriko's solution works, and is simpler to reason about and consider. It is a correct behaviour to send the list sorted, so lets try to maintain stricter, and more correct behaviours. In most cases our list will generally be sorted, or near to it anyway. I think that given we are doing a full re-init, we already have a big overhead and hit. We shouldn't be arguing over overheads here, because this is the equivalent to taking a sledgehammer to the remote DS anyway. Additionally, on the consumer recieving the incoming update, we are going to end up doing more work in replication, and it will be much harder to debug. I also think that the simple solution is a good idea, because premature optimisation causes us more issues. Replication is complicated as hell already. Lets not make it more complex. I would like to stick with Noriko's patch to sort the list. Thank you, Ludwig and William, for your comments. The memory overhead would be sizeof(ID) * number of entries in a backend. If it has 10 million entries, it'd be ~38.15MB. I'd think it is acceptable now a days. The sorting cost is a bit reduced since I set parentid index to integerOrderingMatch and if there is no rename history which causes this out-of-order issue, it is already sorted. To implement your proposal, we need to change the tasks done on the consumer side. Currently, the consumer uses the import. Instead, it has to dump the received data into the id2entry.db, then call db2index all (actually, it is internally upgradedb)... In theory, it'd be doable, I think. Only an issue is time for now. :) ok, go ahead with your solution.
On 06/06/2016 02:41 AM, William Brown wrote:
On Mon, 2016-06-06 at 11:07 +0200, Ludwig Krispenz wrote: Sorry if I start the discussion again, Noriko pointed out that my suggestion to do it like db2ldif doesn not work since the callback to send entries doesn't have enough context. But the current proposal seems to be a large overhead wich is not needed in most cases. So let me repeat my poroposal 3] from the ticket: Let the consumer handle this, for a replicted operation we always assume that the operation was valid on the supplier and accept it, so if we receive an entry without parent accept it and create a glue entry, if we later receive the real entry replace the glue entry. I feel like this is a more complex aproach to a simple problem. Noriko's solution works, and is simpler to reason about and consider. It is a correct behaviour to send the list sorted, so lets try to maintain stricter, and more correct behaviours. In most cases our list will generally be sorted, or near to it anyway. I think that given we are doing a full re-init, we already have a big overhead and hit. We shouldn't be arguing over overheads here, because this is the equivalent to taking a sledgehammer to the remote DS anyway. Additionally, on the consumer recieving the incoming update, we are going to end up doing more work in replication, and it will be much harder to debug. I also think that the simple solution is a good idea, because premature optimisation causes us more issues. Replication is complicated as hell already. Lets not make it more complex. I would like to stick with Noriko's patch to sort the list. Thank you, Ludwig and William, for your comments.
On Mon, 2016-06-06 at 11:07 +0200, Ludwig Krispenz wrote:
Sorry if I start the discussion again, Noriko pointed out that my suggestion to do it like db2ldif doesn not work since the callback to send entries doesn't have enough context. But the current proposal seems to be a large overhead wich is not needed in most cases. So let me repeat my poroposal 3] from the ticket: Let the consumer handle this, for a replicted operation we always assume that the operation was valid on the supplier and accept it, so if we receive an entry without parent accept it and create a glue entry, if we later receive the real entry replace the glue entry. I feel like this is a more complex aproach to a simple problem. Noriko's solution works, and is simpler to reason about and consider. It is a correct behaviour to send the list sorted, so lets try to maintain stricter, and more correct behaviours.
Sorry if I start the discussion again, Noriko pointed out that my suggestion to do it like db2ldif doesn not work since the callback to send entries doesn't have enough context.
But the current proposal seems to be a large overhead wich is not needed in most cases.
So let me repeat my poroposal 3] from the ticket:
Let the consumer handle this, for a replicted operation we always assume that the operation was valid on the supplier and accept it, so if we receive an entry without parent accept it and create a glue entry, if we later receive the real entry replace the glue entry. I feel like this is a more complex aproach to a simple problem. Noriko's solution works, and is simpler to reason about and consider. It is a correct behaviour to send the list sorted, so lets try to maintain stricter, and more correct behaviours.
In most cases our list will generally be sorted, or near to it anyway. I think that given we are doing a full re-init, we already have a big overhead and hit. We shouldn't be arguing over overheads here, because this is the equivalent to taking a sledgehammer to the remote DS anyway.
Additionally, on the consumer recieving the incoming update, we are going to end up doing more work in replication, and it will be much harder to debug.
I also think that the simple solution is a good idea, because premature optimisation causes us more issues. Replication is complicated as hell already. Lets not make it more complex.
I would like to stick with Noriko's patch to sort the list.
Thank you, Ludwig and William, for your comments.
The memory overhead would be sizeof(ID) * number of entries in a backend. If it has 10 million entries, it'd be ~38.15MB. I'd think it is acceptable now a days.
The sorting cost is a bit reduced since I set parentid index to integerOrderingMatch and if there is no rename history which causes this out-of-order issue, it is already sorted.
To implement your proposal, we need to change the tasks done on the consumer side. Currently, the consumer uses the import. Instead, it has to dump the received data into the id2entry.db, then call db2index all (actually, it is internally upgradedb)... In theory, it'd be doable, I think. Only an issue is time for now. :) ok, go ahead with your solution.
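For context, the parentid index configuration Noriko mentions would look along these lines (userRoot is an example backend; the exact attributes in the shipped template may differ):

{{{
dn: cn=parentid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: nsIndex
cn: parentid
nsSystemIndex: true
nsIndexType: eq
nsMatchingRule: integerOrderingMatch
}}}

With integerOrderingMatch, the index keys compare as integers, so iterating the parentid index yields IDs in numeric parent order rather than the alphabetical ("1", "10", "2", ...) order.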
I tried the idea -- handling it on the consumer side -- but unfortunately I could not make it work with a simple change...
I'm going to push the patch 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.2.patch.
Reviewed by William and Ludwig (Thank you!!)
Pushed to master: 7f8c64f..e218a18 master -> master commit 3606b78 commit e218a18
Pushed to 389-ds-base-1.3.4: 85f0212..80893ff 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 4e70ab5 commit 80893ff
Found a serious regression; reverting the commit for now. b08df71..56ad9a1 master -> master Revert "Ticket #48755 - moving an entry could make the online init fail" This reverts commit 3606b78.
80893ff..2c95bee 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 Revert "Ticket #48755 - moving an entry could make the online init fail" This reverts commit 4e70ab5.
git patch file (master) -- additional fix to the server patch 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch
New patch looks okay to me; I haven't had a chance to run tests yet.
Which is the total set of patches this needs to operate?
{{{
0001-Ticket-48755-moving-an-entry-could-make-the-online-i.2.patch (37.2 KB) - added by nhosoi 4 weeks ago.
    git patch file (master) -- eliminated a config param nsds5SortedTotalUpdate
0002-Ticket-48755-CI-test-test-case-for-ticket-48755.2.patch (8.3 KB) - added by nhosoi 4 weeks ago.
    git patch file (master) -- CI test; adjusted to the .2 patch
0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch (8.6 KB) - added by nhosoi 6 days ago.
    git patch file (master) -- additional fix to the server patch
0002-Ticket-48755-CI-test-test-case-for-ticket-48755.3.patch (4.6 KB) - added by nhosoi 6 days ago.
    git patch file (master) -- additional test case to the CI test patch
}}}
These 4?
Replying to [comment:22 firstyear]:
New patch looks okay to me; I haven't had a chance to run tests yet. Which is the total set of patches this needs to operate? [snip] These 4?
Yes. I could merge them when I push. I thought providing deltas from the first pair might make the reviews easier.
https://fedorahosted.org/389/ticket/48755?replyto=22#comment:17
Thanks!
Hi Noriko,
Looking at https://fedorahosted.org/389/attachment/ticket/48755/0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch, I have just one question: is_bulk_import is set on the condition that 'entryrdn_get_switch' is set. Would it be possible to enable subtree rename, move an entry to trigger the bug, then disable subtree rename, and then do a bulk import?
Replying to [comment:24 tbordaz]:
Hi Noriko, Looking at https://fedorahosted.org/389/attachment/ticket/48755/0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch, I have just one question: is_bulk_import is set on the condition that 'entryrdn_get_switch' is set. Would it be possible to enable subtree rename, move an entry to trigger the bug, then disable subtree rename, and then do a bulk import?
Thierry, I think disabling subtree rename is not so easy and straightforward... :)
Here's the subtree rename "on" vs. "off" comparison:
{{{
"on"  -- id2entry contains RDNs; uses the entryrdn index
"off" -- id2entry contains DNs; uses the entrydn index
}}}
That is, switching from "on" to "off" requires not just reindexing/recreating entrydn; you have to recreate the primary id2entry db. There is no automated converter for that task (we have the other direction for the upgrade purpose, though). That means you have to do an export / import. In the export, the entries' order is straightened out: it is guaranteed that no entry in the output LDIF appears before its parent.
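To illustrate the two layouts (a hypothetical sketch, not actual db dump output):

{{{
"on":  id2entry: 25 -> { rdn: uid=user, ... }    (tree structure kept in the entryrdn index)
"off": id2entry: 25 -> { dn: uid=user,ou=OU,dc=example,dc=com, ... }
       entrydn:  "uid=user,ou=ou,dc=example,dc=com" -> 25
}}}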
So, I think we don't have to worry about the scenario.
git patch file (master) -- merged .2.patch and .3.patch 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.patch
git patch file (master) -- CI test; additional test cases. 0002-Ticket-48755-CI-test-test-case-for-ticket-48755.patch
Proposed patches:
{{{
Server  -- 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.patch
           git patch file (master) -- merged .2.patch and .3.patch
CI test -- 0002-Ticket-48755-CI-test-test-case-for-ticket-48755.2.patch
           git patch file (master) -- CI test; adjusted to the .2 patch
           0002-Ticket-48755-CI-test-test-case-for-ticket-48755.patch (added)
           git patch file (master) -- CI test; additional test cases
}}}
Looks good to me, ack
It builds, and passes all the tests on my laptop too. I'm happy to ack as well.
Thanks to all of you who gave me your comments. They are all soooo precious!
Pushed to master: 0157eb1..a20111b master -> master commit f50e8ec commit a20111b
Pushed to 389-ds-base-1.3.4: 0fe8f37..3c789a5 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit cac1b0a commit 3c789a5
Fixing a syntax error in the upgrade script template 91reindex.pl.in. See also Bug 1353592 - Setup-ds.pl --update fails.
Pushed to master: 6b61e05..aa64641 master -> master commit aa64641
Pushed to 389-ds-base-1.3.4: 01f1546..d432113 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit d432113
Metadata Update from @nhosoi: - Issue assigned to nhosoi - Issue set to the milestone: 1.3.4.11
Commit b8a922e relates to this ticket
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1815
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)