#49850 ldbm_get_nonleaf_ids() painfully slow for databases with many non-leaf entries
Closed: wontfix 4 years ago by mreynolds. Opened 5 years ago by telackey.

Issue Description

This was encountered with a very large (~27 million entry) database. The logs from an LDIF import showed that gathering the non-leaf IDs for creating the ancestorid index took an enormous amount of time, over 10 hours:

[06/Jul/2018:21:13:40.793092791 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 0% (ID count 100000)
...
[07/Jul/2018:07:27:52.515252224 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 100% (ID count 13069735)

The root cause is that the parentid index btree ordering is lexical (so, for example, key "10" sorts before "2"), but the IDList being built up from it is sorted numerically. In the existing code, the IDList is kept in sorted order at all times by idl_insert().

This imposes enormous sorting and array-copying overhead on large databases with many non-leaf entries.

The attached patch switches to idl_append_extend() for building up the IDList and then sorts the result only once, using qsort with idl_sort_cmp, after the entire list has been gathered.

On identical hardware, the operation now takes about 10 seconds rather than 10 hours:

[12/Jul/2018:16:52:55.620166570 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 0% (ID count 100000)
...
[12/Jul/2018:16:53:04.885823828 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs: processed 100% (ID count 13069735)
[12/Jul/2018:16:53:04.886990799 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Finished gathering ancestorid non-leaf IDs.
[12/Jul/2018:16:53:04.888042215 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Starting sort of ancestorid non-leaf IDs.
[12/Jul/2018:16:53:05.578625740 +0000] - INFO - ldbm_get_nonleaf_ids - import userRoot: Finished sort of ancestorid non-leaf IDs.

Package Version and Platform

1.3.6.14 on CentOS 7

Steps to reproduce

  1. Create an enormous DB, with many non-leaf entries
  2. Dump it to LDIF
  3. Import it with ldif2db

Actual results

It is very, very slow.

Expected results

It should be quick.


Your analysis seems valid; I am not even sure we need the sorting at all.
In the create-index code the ID list is processed, the parentid is determined for each ID, and the IDList is set; I don't see where the list needs to be sorted.

Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

5 years ago

I was of a similar view.

Lacking certainty about whether the sorting served some purpose (though it appeared to me it did not), I went ahead and sorted so that I could guarantee identical results before and after the change.

Hi @telackey,
My name is Simon. I am part of the DS team.
Your contribution is very welcome! :)

One thing. Could you, please, create a pull-request with the commit? It is part of our process now and it will be easier to test and review there.

You can find the instructions here - http://www.port389.org/docs/389ds/contributing.html#creating-a-patch-for-review

Thanks!

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.8

5 years ago

I ran my own tests: with master, with the provided patch, and with the patch minus the sorting.
The LDIF contained 12 million entries with 2.4 million inner nodes; it was essentially a tree where
each node had 5 children, 10 levels deep.

With master, the ancestorid preparation in ldbm_get_nonleaf_ids() took 40 minutes; with the patch it took less than a minute, and the sort itself less than a second.

For all three tests the generated ancestorid was identical, so sorting is not needed, but it also does not hurt. A small difference between the sorted and unsorted variants was the size of the ancestorid.db file: with master and the patch (sorted) it was 2.468GB, with the patch (unsorted) it was 2.486GB, so there is a slight difference in fragmentation caused by the order of insertion.

I suggest using the provided patch as is: it speeds up the import considerably and creates an identical db file.

@telackey - do you want to create PR for this, or do you want me to do it?

If you do file the PR, please use C-style comments (/* */), not Java-style // comments.

If you are happy with the patch as-is (modulo the comment style, which is easy to change), I can open up the PR.

Should it be against master?

I am also happy if you do it. Whichever is easier.

Yes, it would be the master branch, and the commit message would look like:

Ticket 49850 - <DESCRIPTION OF PROBLEM>

Description:  <your description of the issue and fix>

https://pagure.io/389-ds-base/issue/49850

Reviewed by: lkrispen

Then I will cherry-pick it to 1.3.8

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4 backlog (was: 1.3.8)

4 years ago

Metadata Update from @tbordaz:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1749595

4 years ago

My testing with a "heavily nested" 1 million entry LDIF shows a 35% performance improvement. Most likely, the more nested the LDIF, the greater the improvement.

I did not see any impact on a flat ldif file with 1 million entries.

Commit fc47620 relates to this ticket

225f4e1..fc47620 master -> master

3b47e3f..fb4ea1d 389-ds-base-1.4.1 -> 389-ds-base-1.4.1

13340ad..9eba2ce 389-ds-base-1.4.0 -> 389-ds-base-1.4.0

3fbff08..3be2a20 389-ds-base-1.3.10 -> 389-ds-base-1.3.10

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

4 years ago

Metadata Update from @lkrispen:
- Issue status updated to: Open (was: Closed)

4 years ago

There is a crash if an LDIF containing only the root entry is imported: there are no non-leaf entries, so nodes is NULL, and

 qsort((void *)&nodes->b_ids[0], nodes->b_nids, (size_t)sizeof(ID), idl_sort_cmp);

crashes.

Just do a quick NULL test before the qsort?

Or possibly take the sort out entirely, as mentioned above.

> Or possibly take the sort out entirely, as mentioned above.

@lkrispen fixed it in this PR: https://pagure.io/389-ds-base/pull-request/50672

Metadata Update from @mreynolds:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2909

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago


Metadata
Attachments: 1 (attached 5 years ago)