#49796 Online replica initialization broken with large, deep databases
Closed: wontfix 3 years ago by spichugi. Opened 6 years ago by telackey.

Issue Description

Online replica initialization no longer works if you have a very large, deep database.

This appears to be related to a changes made to fix this bug: https://pagure.io/389-ds-base/issue/48755

As a result, a new scheme of gathering the entries for a total update was introduced, keyed off the filter parentid >= 1 and a new operator, SLAPI_OP_RANGE_NO_IDL_SORT.

Despite the name, it seems more like a "tree depth order" sort, ensuring that all parent IDs are ahead of their children in the ID list. With that "tree depth" ordering, as the total update sends entries to the consumer, it can be confident that the parent has been sent and exists before any child entries are sent.

The sorting is done by adding the current ID to the array only if its parent ID is already present in the array (cf. idl_id_is_in_idlist()), else it is added to a special leftovers array, which is processed at the end.

The function that does the check, idl_id_is_in_idlist(), sequentially scans the ID list, looking for the parent ID. It is scanned for every entry.

If the data is arranged in a shallow tree, this is probably OK, as the parent entry will always be present early on in the array. However, if you have a tree that is deep or leafy, or even where the entries are split (like a forked tree, with half the entries under ou=R and half under ou=B) then idl_id_is_in_idlist() may need to scan millions of IDs, and it will scan them over, and over, and over...

At a certain point, replication can become impossible, because the connection that was opened when the total update was initiated is closed before the IDList is finished sorting and any entries are sent.

That is not only theoretical, but observed with a multi-GB, deep and leafy database after it had spent about 19 hours (!) gathering and sorting the IDList in idl_new_range_fetch() before being ready to send its first entry.

Package Version and Platform

Tested on CentOS 7 with 1.3.6.14.

Steps to reproduce

  1. Fill a database with a large, deep tree.
  2. Attempt to initialize a consumer online.

Actual results

The total update operation is begun, but a very long time is spent gathering the IDList before any entries are sent to the consumer. One indication of this is an import status message on the consumer indicating 0 or 1 entries, and other bogus figures:

[19/Jun/2018:11:50:55.340613439 -0400] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 214748364.8/sec, recent rate 0.0/sec, hit ratio 0%
[19/Jun/2018:11:51:16.487648096 -0400] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 104755299.9/sec, recent rate 0.0/sec, hit ratio 0%
[19/Jun/2018:11:51:36.779065427 -0400] - INFO - import_monitor_threads - import userRoot: Processed 1 entries (pass 2) -- average rate 70409299.9/sec, recent rate 0.0/sec, hit ratio 0%

If the database is big enough, it may even take long enough for the connection to close.

Other indications are using 100% of a core on the supplier after initiating a total update, and of excessive time spent in idl_id_is_in_idlist().

Expected results

Initialization should begin sending entries almost immediately.


If you do an offline init (ldif2db) is it faster?

For example:

Master: db2ldif -r -n userroot -a /tmp/ldif
Replica: ldif2db -n userroot -i /tmp/ldif

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

@mreynolds

Yes, in fact, it doesn't really hit this problem all, because in this server, the parentid is always less than the child (ie, no moddn). The db2ldif code uses a separate scheme for ensuring the order of the tree (cf. _export_or_index_parents() in ldap/servers/slapd/back-ldbm/ldif2ldbm.c) which seems better suited to the problem.

I added a counter for the number IDs scanned by idl_id_is_in_idlist() using db2ldif and using online initialization with my small (compared to the real system) DB of 650K entries. On db2ldif, the answer was zero, since my tree was already ordered. In the online case, the loop in idl_id_is_in_idlist() was started 855,526 times and scanned 27,842,335,568 IDs in the process. That small database "only" took 11 minutes to scan the IDList, not 19 hours like the real DB, but it was still long enough to cause the total update to fail when it finally tried to send the first entry:

[19/Jun/2018:17:01:04.699080440 -0500] - DEBUG - NSMMReplicationPlugin - close_connection_internal - agmt="cn=389ds-build_to_test76" (test76:389) - Disconnected from the consumer
[19/Jun/2018:17:01:05.326634281 -0500] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=389ds-build_to_test76" (test76:389): Unable to send endReplication extended operation (Timed out)
[19/Jun/2018:17:01:05.327526742 -0500] - ERR - NSMMReplicationPlugin - repl5_tot_run - Total update failed for replica "agmt="cn=389ds-build_to_test76" (test76:389)", error (-1)

For comparison sake, running db2ldif only took 3 minutes to export all 653,112 entries.

We need to do something here. In the discussion in ticket #48755 I was already concerned about the sorting overhead and made two other proposals, maybe we should investigate again

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.0

6 years ago

Metadata Update from @mreynolds:
- Issue priority set to: normal
- Issue set to the milestone: 1.4.4 (was: 1.4.0)

4 years ago

Metadata Update from @mreynolds:
- Issue priority set to: major (was: normal)
- Issue set to the milestone: 1.4.3 (was: 1.4.4)

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2855

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago

Log in to comment on this ticket.

Metadata