#47365 Improve performance in 389DS when performing mass deletes.
Closed: wontfix 3 months ago by spichugi. Opened 7 years ago by herlo.

I'm consistently deleting ~6000 records, and it takes around 3-4 hours using ldapdelete or ldapmodify with appropriate LDIF entries.

This is on a FreeIPA server; the 389-ds-base version is as follows:

389-ds-base.x86_64 1.2.11.15-14.el6_4

I know that deletes aren't particularly optimized in LDAP, but is there some way of improving this from ~3.14 seconds per delete to somewhere around ~1 second?

Thanks,

herlo


Can you give an example of the ldapdelete or ldapmodify command you are using? How many entries are in your database?

If you use some sort of parallelism in your deletions, does that help? For example, doing something like this:

{{{
ii=10
while [ $ii -gt 0 ] ; do
    # each file holds one tenth of the DNs to delete
    ldapdelete -f file_containing_${ii}_th_10th_of_DNs &
    ii=$(expr $ii - 1)
done
wait
}}}
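
If it helps, one way to produce those per-process files is with split; a rough sketch, where all_dns.txt and the dn_chunk_ prefix are just placeholder names that would need to match whatever the loop expects:

{{{
# split the full DN list into chunks of ~600 lines each
# (for ~6000 DNs that gives roughly ten files: dn_chunk_aa, dn_chunk_ab, ...)
split -l 600 all_dns.txt dn_chunk_
}}}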

Sure, for ldapdelete, I run:

ldapdelete -x -WD "cn=Directory Manager" -c -f records-to-delete.ldif

Records simply look like this:

records-to-delete.ldif

uid=hema,ou=people,cn=caf,dc=example,dc=org
uid=bob,ou=people,cn=caf,dc=example,dc=org
....

For ldapmodify, I run the following:

ldapmodify -cxD "cn=Directory Manager" -W -f records-to-modify.ldif

Records look like this:

records-to-modify.ldif

dn: uid=tommy,ou=people,cn=caf,dc=example,dc=org
changetype: delete

dn: uid=jerome,ou=people,cn=caf,dc=example,dc=org
changetype: delete

...

There are approximately 6130 records. Most of them are under the ou=people DN, though there are ~50 records in ou=groups and ou=aliases as well.

I'll try the parallel example above. I was also considering using threaded Python or some such as another option.
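
Another way to get similar parallelism without writing any Python would be xargs -P; a rough sketch, where all_dns.txt, the batch size, and the password variable are placeholders (and it assumes the DNs contain no whitespace):

{{{
# run up to 10 ldapdelete processes in parallel, 100 DNs per invocation
xargs -P 10 -n 100 ldapdelete -x -c -D "cn=Directory Manager" -w "$DM_PASSWORD" < all_dns.txt
}}}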

Performing the parallelized function described above does indeed improve the time needed to delete. I did ~1000 records (10 sets of 100 records) in about 10 minutes, so ~6000 records should be closer to 60 minutes. Much better, indeed.

However, there are two major caveats to this approach.

1) CPU load on the LDAP server increased significantly, because we're hitting it with 10x the delete requests at essentially the same time.

2) If at any point one of the transactions is cancelled, it is unclear where to start again, which may require manual intervention to verify records.

However, as a workaround, this is pretty nice. For now, I'm going to go with it and will wait to hear what kinds of optimizations can be done in the actual 389DS code to improve the situation.

Thanks!

Replying to [comment:3 herlo]:

> Performing the parallelized function described above does indeed improve the time needed to delete. I did ~1000 records (10 sets of 100 records) in about 10 minutes, so ~6000 records should be closer to 60 minutes. Much better, indeed.
>
> However, there are two major caveats to this approach.
>
> 1) CPU load on the LDAP server increased significantly, because we're hitting it with 10x the delete requests at essentially the same time.

This should be ok - dirsrv is maxing out its performance - does this cause problems?

> 2) If at any point one of the transactions is cancelled, it is unclear where to start again, which may require manual intervention to verify records.

I don't know if this works with ldapdelete, but with ldapmodify, you can specify -S rejectfile and it will write the DNs that it could not delete:
{{{

Error: No such object (32)

dn: cn=this,cn=entry,cn=does,cn=not,cn=exist
changetype: delete
...
}}}
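
For example, combined with the ldapmodify invocation above, it might look like this (rejects.ldif is just a placeholder file name):

{{{
ldapmodify -cxD "cn=Directory Manager" -W -f records-to-modify.ldif -S rejects.ldif
}}}

That at least leaves a record of which DNs still need attention after an interrupted run.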

> However, as a workaround, this is pretty nice. For now, I'm going to go with it and will wait to hear what kinds of optimizations can be done in the actual 389DS code to improve the situation.
>
> Thanks!

Do you have replication enabled? Deletes are different with replication: the entry is converted to a tombstone entry rather than being immediately removed from the database.
Could you also check whether you have indexes configured that are not required? A delete has to update all the indexes as well.

Do you have the referential integrity plugin enabled? I just found in another case that it can have a big performance impact.
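
A rough sketch of how both could be checked over LDAP, assuming the default userRoot backend name (it may differ on a FreeIPA install) and Directory Manager credentials:

{{{
# list the indexes configured for the backend
ldapsearch -x -D "cn=Directory Manager" -W \
    -b "cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" \
    "(objectClass=nsIndex)" cn nsIndexType

# check whether the referential integrity plugin is enabled
ldapsearch -x -D "cn=Directory Manager" -W \
    -b "cn=referential integrity postoperation,cn=plugins,cn=config" \
    nsslapd-pluginEnabled
}}}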

Metadata Update from @herlo:
- Issue set to the milestone: FUTURE

3 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to None
- Issue close_status updated to: None
- Issue tagged with: Performance

8 months ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.4 (was: FUTURE)

8 months ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/702

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 months ago
