#49850 ldbm_get_nonleaf_ids() painfully slow for databases with many non-leaf entries
Closed: wontfix 4 years ago by mreynolds. Opened 5 years ago by telackey.

Issue Description

This was encountered with a very large (~27 million entry) database. The logs from an LDIF import showed that gathering the non-leaf IDs for creating the ancestorid index took an enormous amount of time, over 10 hours:

[06/Jul/2018:21:13:40.793092791 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 0% (ID count 100000)
...
[07/Jul/2018:07:27:52.515252224 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 100% (ID count 13069735)

The root cause is that the parentid index btree ordering is lexical (so, for example, key "10" sorts before "2"), but the IDList being built up from it is sorted numerically. In the existing code, the IDList is kept in sorted order at all times by idl_insert().

This imposes enormous sorting and array-copying overhead on large databases with many non-leaf entries.

The attached patch switches to idl_append_extend() for building up the IDList and then sorts the result only once, using qsort with idl_sort_cmp, after the entire list has been gathered.

On identical hardware, the operation now takes about 10 seconds rather than 10 hours:

[12/Jul/2018:16:52:55.620166570 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 0% (ID count 100000)
...
[12/Jul/2018:16:53:04.885823828 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 100% (ID count 13069735)
[12/Jul/2018:16:53:04.886990799 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Finished gathering ancestorid non-leaf IDs.
[12/Jul/2018:16:53:04.888042215 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Starting sort of ancestorid non-leaf IDs.
[12/Jul/2018:16:53:05.578625740 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Finished sort of ancestorid non-leaf IDs.

Package Version and Platform

1.3.6.14 on CentOS 7

Steps to reproduce

  1. Create an enormous DB, with many non-leaf entries
  2. Dump it to LDIF
  3. Import it with ldif2db

Actual results

It is very, very slow.

Expected results

It should be quick.


Your analysis seems valid; I am not even sure we need the sorting at all.
In the create-index code the ID list is processed, the parentid is determined for each ID, and the IDList is set; I don't see where the list needs to be sorted.

Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

5 years ago

I was of a similar view.

Lacking certainty about whether the sorting served some purpose (though it appeared to me it did not), I went ahead and sorted so that I could guarantee identical results before and after the change.

Hi @telackey,
My name is Simon. I am part of the DS team.
Your contribution is very welcome! :)

One thing. Could you, please, create a pull-request with the commit? It is part of our process now and it will be easier to test and review there.

You can find the instructions here - http://www.port389.org/docs/389ds/contributing.html#creating-a-patch-for-review

Thanks!

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.8

5 years ago

I ran my own tests: with master, with the provided patch, and with the patch minus the sorting.
The LDIF contained 12 million entries with 2.4 million inner nodes; it was essentially a tree where
each node had 5 children, 10 levels deep.

With master, the ancestorid preparation in ldbm_get_nonleaf_ids() took 40 minutes; with the patch it took less than a minute, and the sort itself less than a second.

For all three tests the generated ancestorid was identical, so sorting is not needed, but it also does not hurt. A small difference between the sorted and unsorted variants was the size of the ancestorid.db file: with master and the patch (sorted) it was 2.468GB, with the patch (unsorted) it was 2.486GB, so there is a slight difference in fragmentation caused by the order of insertion.

I suggest using the provided patch as is: it speeds up the import considerably and creates an identical db file.

@telackey - do you want to create PR for this, or do you want me to do it?

If you do file the PR, please use C-style comments (/* */), not Java-style // comments.

If you are happy with the patch as-is (modulo the comment style, which is easy to change), I can open up the PR.

Should it be against master?

I am also happy if you do it. Whichever is easier.

Yes, it would be the master branch, and the commit message would look like:

Ticket 49850 - <DESCRIPTION OF PROBLEM>

Description:  <your description of the issue and fix>

https://pagure.io/389-ds-base/issue/49850

Reviewed by: lkrispen

Then I will cherry-pick it to 1.3.8

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4 backlog (was: 1.3.8)

4 years ago

Metadata Update from @tbordaz:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1749595

4 years ago

My testing with a "heavily nested" 1 million entry LDIF shows a 35% performance improvement. Most likely, the more nested the LDIF, the greater the improvement.

I did not see any impact on a flat ldif file with 1 million entries.

Commit fc47620 relates to this ticket

225f4e1..fc47620 master -> master

3b47e3f..fb4ea1d 389-ds-base-1.4.1 -> 389-ds-base-1.4.1

13340ad..9eba2ce 389-ds-base-1.4.0 -> 389-ds-base-1.4.0

3fbff08..3be2a20 389-ds-base-1.3.10 -> 389-ds-base-1.3.10

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Metadata Update from @lkrispen:
- Issue status updated to: Open (was: Closed)

4 years ago

There is a crash if an LDIF containing only the root entry is imported: there are no non-leaf entries, so nodes is NULL, and

 qsort((void *)&nodes->b_ids[0], nodes->b_nids, (size_t)sizeof(ID), idl_sort_cmp);

crashes.

Just do a quick NULL test before the qsort?

Or possibly take the sort out entirely, as mentioned above.

> Or possibly take the sort out entirely, as mentioned above.

@lkrispen fixed it in this PR: https://pagure.io/389-ds-base/pull-request/50672

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2909

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago


Metadata
Attachments: 1 (attached 5 years ago)