#48755 moving an entry could make the online init fail
Closed: wontfix None Opened 8 years ago by nhosoi.

Steps to reproduce.

1) set up MMR
2) DIT:
   dc=example,dc=com
   uid=user,dc=example,dc=com
3) Add an OU
   ou=OU,dc=example,dc=com
4) Move the user on master1
   ldapmodify ... << EOF
   changetype: modrdn
   newrdn: uid=user
   deleteoldrdn: 0
   newsuperior: ou=OU,dc=example,dc=com
   EOF
5) Do online init on master1
   master2/error log shows:
   [..] - import userRoot: WARNING: Skipping entry "uid=user,ou=OU,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
   [..] - import userRoot: WARNING: bad entry: ID 25

This is because the parent (ou=OU) is located after the child (uid=user).

Export (db2ldif) can handle the case correctly, but online init is not.

Workaroud:

1) shutdown the servers
2) run db2ldif on master1
   db2ldif -n <backend> -r
3) import it to master2
   ldif2db -n <backend> -i /path/to/the/ldiffile

Why would we make this configurable? If this is a bug during total init, wouldn't we wanted the sorted init to always be run? I think that there is no need for the configuration because leaving the current behaviour on can break things ....

Additionally, wouldn't this fix actually remove the need for the silly CSN behaviour where we have to after a total init set the consumer CSN low, and replay all the changelog even though most events have been applied? I think that previously we would have to set the CSN low to replay modrdn's like this, but with the sorted total init that removes that behaviour .....

Replying to [comment:6 firstyear]:

Why would we make this configurable? If this is a bug during total init, wouldn't we wanted the sorted init to always be run? I think that there is no need for the configuration because leaving the current behaviour on can break things ....

That is because the current implementation does not need to generate a IDlist. If your database is huge and all the IDs are in the ascendant order, it'd be an overhead for no benefit.

Additionally, wouldn't this fix actually remove the need for the silly CSN behaviour where we have to after a total init set the consumer CSN low, and replay all the changelog even though most events have been applied? I think that previously we would have to set the CSN low to replay modrdn's like this, but with the sorted total init that removes that behaviour .....

I don't think so. This sorted "initialization" just guarantees all the entries are sent to a consumer and nothing else...

Replying to [comment:7 nhosoi]:

Replying to [comment:6 firstyear]:

Why would we make this configurable? If this is a bug during total init, wouldn't we wanted the sorted init to always be run? I think that there is no need for the configuration because leaving the current behaviour on can break things ....

That is because the current implementation does not need to generate a IDlist. If your database is huge and all the IDs are in the ascendant order, it'd be an overhead for no benefit.

This is during an online init though. So already we have a huge impact. I would rather see us value correctness of what we send over performance and edge cases.

Additionally, wouldn't this fix actually remove the need for the silly CSN behaviour where we have to after a total init set the consumer CSN low, and replay all the changelog even though most events have been applied? I think that previously we would have to set the CSN low to replay modrdn's like this, but with the sorted total init that removes that behaviour .....

I don't think so. This sorted "initialization" just guarantees all the entries are sent to a consumer and nothing else...

Well at the moment if we have the scenario where you have modrdn on a entry that would be skipped.

You send the complete init from master A to consumer B. The entry that was modrdn would have to be skipped .... which means it's not on consumer B.

Then the CSN of master A -> consumer B is kept low. So then the next replication session, the changelog is replayed. Most changes are skipped (which is wasteful in itself!) but then the entry that was moved is now added to consumer B.

Basically, what this means is that currently a total init, is not actually total, but more "partial to a point". This could be really bad if the modrdn operation had been trimmed out of the changelog also and could represent a loss of data on consumer B.

Where as in your new change, in this scenario, because of the ordering, the modrdn doesn't matter. After the total init the contents of consumer B are "correct" as per master A. This means we could set the CSN of consumer B to be "high" (ie matching master A). This would minimise, or eliminate the need to replay the log to the consumer to basically fix up what was missed.

However I may be missing something important in my mind about how this works. I would like to see Mark or Ludwig comment on this.

But if this works the way I think it does, if we made this the default, we could eliminate (and simplify) many parts of replication because we would be eliminating some of these edge cases!

I agree with William that we should do it always right, no requirement to configure.
I agree with Noriko that the current proposed fix looks too costly to do it always.

Why don't we do it like ldbm2ldif does it, which I think has very low overhead:

  • Iterate thru the id2entry file, without need of an idlist
  • for each entry check if parentid < entryid
  • if parentid < entryid the parent was already sent, send the entry
  • if not, there are two options:
    1] - what ldbm2ldif does: get the parent entry and send it, make a not that this entry was sent, continue
    2] - my spontaneous idea: stack the entry, continue, later process the stack (but this may be more complicated since a parent now could be in the stack)

So, my suggestion: do it like 1], it works already for db2ldif, it doesn't seem to have much overhead

There is also a third option:
3] handle the issue on the consumer side. If an entry is received without parent don't return an error, create a glue entry and replace when parent is received.

But I think 1] should be the least overhead

Replying to [comment:10 lkrispen]:

I agree with William that we should do it always right, no requirement to configure.
I agree with Noriko that the current proposed fix looks too costly to do it always.

Why don't we do it like ldbm2ldif does it, which I think has very low overhead:

The implementation of the online init is cleverly done using the callback send_entry via slapi_search_internal_callback_pb. I did not want to change the framework...

  • Iterate thru the id2entry file, without need of an idlist
  • for each entry check if parentid < entryid
  • if parentid < entryid the parent was already sent, send the entry
  • if not, there are two options:
    1] - what ldbm2ldif does: get the parent entry and send it, make a not that this entry was sent, continue
    2] - my spontaneous idea: stack the entry, continue, later process the stack (but this may be more complicated since a parent now could be in the stack)
    3] handle the issue on the consumer side. If an entry is received without parent don't return an error, create a glue entry and replace when parent is received.

So, my suggestion: do it like 1], it works already for db2ldif, it doesn't seem to have much overhead

You suggest to expose the ldbm2ldif code to the plugin? It may not be so clean... :)

I think that even if there is an overhead to sorting, it seems like the simpler solution than having a second stack that we need to post process out. The risk with the stack is what happens if we have un-ordered entries there too? Don't we need to iterate over it multiple times then? How do we track this?

I think pre-sorting is the simple and correct solution. Online init is already "costly". Paying extra is no matter IMO.

From Noriko's email:

{{{
Now, I'm wondering... the extra IDList is (4 byte ID * entry counts)
per DB instance. If you have 5 million entries, it'd add the IDlist
which size is about 20MB. Thinking of the typical physical memory size
these days, it may not be that bad at all... (If you have plenty of
entries, you'd have large memory, any way...) That said, I could just
remove the config parameter nsds5SortedTotalUpdate and run it in the
sorted manner if entryrdn_get_switch() returns true as follows. Do you
think it's acceptable?

diff --git a/ldap/servers/plugins/replication/repl5_tot_protocol.c
b/ldap/servers/plugins/replication/repl5_tot_protocol.c

index 9218a5a..3a0c71a 100644
--- a/ldap/servers/plugins/replication/repl5_tot_protocol.c
+++ b/ldap/servers/plugins/replication/repl5_tot_protocol.c
@@ -418,7 +418,7 @@ retry:
      pb = slapi_pblock_new ();

      replica = (Replica*) object_get_data(prp->replica_object);
-    if (replica && replica_get_sorted_total_update(replica)) {
+    if (entryrdn_get_switch()) {
          /*
           * Supporting entries out of order -- parent could have a
larger id than its children.
           * Entires are retireved sorted by parentid without the
allid threshold.

Also, I'm going to run the upgrade script to reindex parentid only if it
is sorted with the alphabetical order. I'm adding the check now.
}}}

If the over head is 20MB, that is nothing on a modern system. Even sorting 20MB of data on modern cpu's is fast.

With this part:

{{{
+ if (entryrdn_get_switch()) {
}}}

This is defined by nsslapd-subtree-rename-switch. I do not like depending on this.

Consider I have a database where I:

  • nsslapd-subtree-rename-switch: on
  • move a bunch of entries
  • nsslapd-subtree-rename-switch: off
  • online init -> some consumer.

Now my list needs sorting, but it won't be because I toggled that switch. It seems contrived, but this scenario would happen over the course of say months, if not years. Someone changes the subtree-rename, then a few months later can't solve why online init failed.

I think that we shouldn't depend on this config either sorry.

Replying to [comment:13 firstyear]:

Consider I have a database where I:

  • nsslapd-subtree-rename-switch: on
  • move a bunch of entries
  • nsslapd-subtree-rename-switch: off
  • online init -> some consumer.

Agh... I wish it is that simple... :)

You cannot do this. I mean if you want to, you need the extra steps in between.

  • nsslapd-subtree-rename-switch: on
  • move a bunch of entries
  • nsslapd-subtree-rename-switch: off

At this moment, the DB is already broken for the server... You have to do db2ldif, then ldif2db, which straights out the ancestor - descendant relationship.

Then, you can do this.

  • online init -> some consumer.

git patch file (master) -- eliminated a config param nsds5SortedTotalUpdate
0001-Ticket-48755-moving-an-entry-could-make-the-online-i.2.patch

git patch file (master) -- CI test; adjusted to the .2 patch
0002-Ticket-48755-CI-test-test-case-for-ticket-48755.2.patch

  • nsslapd-subtree-rename-switch: on
  • move a bunch of entries
  • nsslapd-subtree-rename-switch: off

At this moment, the DB is already broken for the server... You have to do db2ldif, then ldif2db, which straights out the ancestor - descendant relationship.

Right, that makes sense.

I've built and tested this code. At a read, it looks good. You have my ack. Do you want to get Ludwig to check also?

Copy & paste the email discussion for the record.
On 06/07/2016 02:00 AM, Ludwig Krispenz wrote:

On 06/06/2016 06:34 PM, Noriko Hosoi wrote:

On 06/06/2016 02:41 AM, William Brown wrote:

On Mon, 2016-06-06 at 11:07 +0200, Ludwig Krispenz wrote:

Sorry if I start the discussion again,
Noriko pointed out that my suggestion to do it like db2ldif doesn not
work since the callback to send entries doesn't have enough context.

But the current proposal seems to be a large overhead wich is not needed
in most cases.

So let me repeat my poroposal 3] from the ticket:

Let the consumer handle this,
for a replicted operation we always assume that the operation was valid
on the supplier and accept it, so if we receive an entry without parent
accept it and create a glue entry, if we later receive the real entry
replace the glue entry.
I feel like this is a more complex aproach to a simple problem. Noriko's solution works, and is simpler to reason about and
consider. It is a correct behaviour to send the list sorted, so lets try to maintain stricter, and more correct behaviours.

In most cases our list will generally be sorted, or near to it anyway. I think that given we are doing a full re-init, we
already have a big overhead and hit. We shouldn't be arguing over overheads here, because this is the equivalent to taking
a sledgehammer to the remote DS anyway.

Additionally, on the consumer recieving the incoming update, we are going to end up doing more work in replication, and it
will be much harder to debug.

I also think that the simple solution is a good idea, because premature optimisation causes us more issues. Replication is
complicated as hell already. Lets not make it more complex.

I would like to stick with Noriko's patch to sort the list.

Thank you, Ludwig and William, for your comments.

The memory overhead would be sizeof(ID) * number of entries in a backend. If it has 10 million entries, it'd be ~38.15MB. I'd think it is acceptable now a days.

The sorting cost is a bit reduced since I set parentid index to integerOrderingMatch and if there is no rename history which causes this out-of-order issue, it is already sorted.

To implement your proposal, we need to change the tasks done on the consumer side. Currently, the consumer uses the import. Instead, it has to dump the received data into the id2entry.db, then call db2index all (actually, it is internally upgradedb)... In theory, it'd be doable, I think. Only an issue is time for now. :)
ok, go ahead with your solution.

I tried the idea -- taking care in the consumer side -- but unfortunately I could not make it work with the simple change...

I'm going to push the patch 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.2.patch​.

Reviewed by William and Ludwig (Thank you!!)

Pushed to master:
7f8c64f..e218a18 master -> master
commit 3606b78
commit e218a18

Pushed to 389-ds-base-1.3.4:
85f0212..80893ff 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 4e70ab5
commit 80893ff

Found a serious regression. Revert the commit once.
b08df71..56ad9a1 master -> master
Revert "Ticket #48755 - moving an entry could make the online init fail"
This reverts commit 3606b78.

80893ff..2c95bee 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
Revert "Ticket #48755 - moving an entry could make the online init fail"
This reverts commit 4e70ab5.

New patch looks okay to me, haven't had chance to run tests yet.

Which is the total set of patches this needs to operate?

{{{
0001-Ticket-48755-moving-an-entry-could-make-the-online-i.2.patch​ (37.2 KB) - added by nhosoi 4 weeks ago.
git patch file (master) -- eliminated a config param nsds5SortedTotalUpdate
0002-Ticket-48755-CI-test-test-case-for-ticket-48755.2.patch​ (8.3 KB) - added by nhosoi 4 weeks ago.
git patch file (master) -- CI test; adjusted to the .2 patch
0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch​ (8.6 KB) - added by nhosoi 6 days ago.
git patch file (master) -- additional fix to the server patch
0002-Ticket-48755-CI-test-test-case-for-ticket-48755.3.patch​ (4.6 KB) - added by nhosoi 6 days ago.
git patch file (master) -- additional test case to the CI test patch
}}}

These 4?

Replying to [comment:22 firstyear]:

New patch looks okay to me, haven't had chance to run tests yet.

Which is the total set of patches this needs to operate?
[snip]
These 4?

Yes. I could merge them when I push. I thought providing delta from the first pair might be easier for the reviews.

https://fedorahosted.org/389/ticket/48755?replyto=22#comment:17

Thanks!

Hi Noriko,

Looking at https://fedorahosted.org/389/attachment/ticket/48755/0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch, I have just one question.
is_bulk_import is set at the condition 'entryrdn_get_switch' is set.
Would it be possible to, enable moving subtree, then moving an entry to trigger that bug, then disable moving subtree, then to a bulk_import ?

Replying to [comment:24 tbordaz]:

Hi Noriko,

Looking at https://fedorahosted.org/389/attachment/ticket/48755/0001-Ticket-48755-moving-an-entry-could-make-the-online-i.3.patch, I have just one question.
is_bulk_import is set at the condition 'entryrdn_get_switch' is set.
Would it be possible to, enable moving subtree, then moving an entry to trigger that bug, then disable moving subtree, then to a bulk_import ?

Thierry, I think disabling subtree rename is not so easy and straightforward... :)

Here's the subtree rename "on" vs. "off" comparison:
{{{
"on" -- id2entry contains RDNs
use entryrdn index
"off" -- id2entry contains DNs
use entrydn index
}}}
That is, switching from "on" to "off" requires not just reindexing/recreating entrydn, but you have to recreate the primary id2entry db. There is no automated converter to do the task (we have the other direction for the upgrade purpose, though). That means, you have to do export / import. In the export, the entries' order is straightened out. It is guaranteed that the output ldif has no entry which has no parent entry prior to itself in the ldif.

So, I think we don't have to worry about the scenario.

Proposed patches:
{{{
Server -- 0001-Ticket-48755-moving-an-entry-could-make-the-online-i.patch​
git patch file (master) -- merged .2.patch​ and .3.patch
CItest -- 0002-Ticket-48755-CI-test-test-case-for-ticket-48755.2.patch​
git patch file (master) -- CI test; adjusted to the .2 patch
0002-Ticket-48755-CI-test-test-case-for-ticket-48755.patch​ added
git patch file (master) -- CI test; additional test cases.
}}}

It builds, and passes all the tests on my laptop too. I'm happy to ack as well.

Thanks to all of you who gave me your comments. They are all soooo precious!

Pushed to master:
0157eb1..a20111b master -> master
commit f50e8ec
commit a20111b

Pushed to 389-ds-base-1.3.4:
0fe8f37..3c789a5 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit cac1b0a
commit 3c789a5

Fixing a syntax error in the upgrade script template 91reindex.pl.in
See also Bug 1353592 - Setup-ds.pl --update fails

Pushed to master:
6b61e05..aa64641 master -> master
commit aa64641

Pushed to 389-ds-base-1.3.4:
01f1546..d432113 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit d432113

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.3.4.11

7 years ago

Commit b8a922e relates to this ticket

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1815

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

4 years ago

Log in to comment on this ticket.

Metadata
Related Pull Requests