#5802 perf: adding a group with 1000 users/hosts lasts long (up to 12s)
Opened 5 years ago by tbordaz. Modified 2 years ago

freeipa-server-4.3.90-0.fc23.x86_64
slapi-nis-0.55-1.fc23.x86_64
389-ds-base-1.3.4.6-1.fc23.x86_64

Adding a group with 1000 users (no nested members although the group oc contains nestedgroup) lasts a long time.

[12/Apr/2016:11:07:00 +0200] conn=17 op=5040 ADD dn="cn=group39,cn=groups,cn=accounts,<suffix>"
[12/Apr/2016:11:07:12 +0200] conn=17 op=5040 RESULT err=0 tag=105 nentries=0 etime=12

The main contributor to this duration is likely the memberOf plugin updating the member entries and recording the updates in the retroCL.

Thread 30 (Thread 0x7f6fc6ffd700 (LWP 2233)):
#0  0x00007f6feac1bfad in fdatasync () from /lib64/libc.so.6
#1  0x00007f6fe51dc285 in __os_fsync () from /lib64/libdb-5.3.so
#2  0x00007f6fe51c3ecc in __log_flush_int () from /lib64/libdb-5.3.so
#3  0x00007f6fe51c618c in __log_flush () from /lib64/libdb-5.3.so
#4  0x00007f6fe51c62fd in __log_flush_pp () from /lib64/libdb-5.3.so
#5  0x00007f6fe03c0973 in dblayer_txn_commit_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#6  0x00007f6fe03c0b05 in dblayer_txn_commit () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#7  0x00007f6fe03e1bcf in ldbm_back_add () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#8  0x00007f6fed324dfa in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f6fed325663 in add_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#10 0x00007f6fed326385 in slapi_add_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#11 0x00007f6fdec61b7c in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#12 0x00007f6fed37e969 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#13 0x00007f6fed37ebf5 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#14 0x00007f6fe03fc25b in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#15 0x00007f6fed36c595 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#16 0x00007f6fed36d045 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#17 0x00007f6fdface659 in memberof_add_memberof_attr () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#18 0x00007f6fdfacea75 in memberof_fix_memberof_callback () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#19 0x00007f6fdfacf4be in memberof_modop_one_replace_r () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#20 0x00007f6fdfacf652 in memberof_mod_attr_list_r () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#21 0x00007f6fdfad06c6 in memberof_postop_add () from /usr/lib64/dirsrv/plugins/libmemberof-plugin.so
#22 0x00007f6fed37e969 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#23 0x00007f6fed37ebf5 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#24 0x00007f6fe03e18b0 in ldbm_back_add () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#25 0x00007f6fed324dfa in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
#26 0x00007f6fed326178 in do_add () from /usr/lib64/dirsrv/libslapd.so.0
#27 0x000055baf467b28f in connection_threadmain ()
#28 0x00007f6feb54c5db in _pt_root () from /lib64/libnspr4.so
#29 0x00007f6feaeeb60a in start_thread () from /lib64/libpthread.so.0

With 'memberof' and retroCL enabled, it is expected that ADDs (and possibly MODs as well) are expensive. Still, 12 seconds looks very high for such a small group.

Another contributor to the duration could be slapi-nis (note that 0.55 contains the perf fix https://git.fedorahosted.org/cgit/slapi-nis.git/diff/src/back-sch.c?id=594fcb2320033d01cfe2b8121793d431d1017987)

Thread 30 (Thread 0x7f6fc6ffd700 (LWP 2233)):
#0  0x00007f6feaba31e0 in _int_free () from /lib64/libc.so.6
#1  0x00007f6feaba6cac in free () from /lib64/libc.so.6
#2  0x00007f6fed32e979 in slapi_ch_free () from /usr/lib64/dirsrv/libslapd.so.0
#3  0x00007f6fed33e0d4 in slapi_entry_free () from /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007f6fde633f44 in format_check_entry_exists () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#5  0x00007f6fde636ee5 in format_deref_rx.isra () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#6  0x00007f6fde63747f in format_deref_r () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#7  0x00007f6fde63586d in format_expand () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#8  0x00007f6fde636644 in format_format () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#9  0x00007f6fde6380a3 in format_get_data_set () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#10 0x00007f6fde62b208 in backend_set_entry_from () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#11 0x00007f6fde62f2db in backend_shr_add_entry_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#12 0x00007f6fde63d0af in map_data_foreach_map () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#13 0x00007f6fde62fa36 in backend_shr_add_cb.part () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#14 0x00007f6fde6320b1 in backend_shr_betxn_post_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#15 0x00007f6fed37e969 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#16 0x00007f6fed37ebf5 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#17 0x00007f6fe03e18b0 in ldbm_back_add () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#18 0x00007f6fed324dfa in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
#19 0x00007f6fed326178 in do_add () from /usr/lib64/dirsrv/libslapd.so.0
#20 0x000055baf467b28f in connection_threadmain ()
#21 0x00007f6feb54c5db in _pt_root () from /lib64/libnspr4.so
#22 0x00007f6feaeeb60a in start_thread () from /lib64/libpthread.so.0
#23 0x00007f6feac25a4d in clone () from /lib64/libc.so.6

I think the problem is that for a group of 1000 members, possibly 1000 entries have to be updated, so this is not one ADD but one ADD and 1000 MODs; in combination with retroCL and slapi-nis this can take some time.

It could be an option to move the update of the memberOf attribute out of the ADD operation. In the referential integrity plugin there is a config option to delay the referential update; if it is > 0, a separate thread performs the update and the original operation is not blocked.
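For reference, this is roughly what the existing referential integrity delay looks like in 389-ds. A hedged sketch only: the `referint-update-delay` attribute and plugin DN below match recent 389-ds versions, but older releases configured the delay through a positional plugin argument instead, so check the deployed version before applying:

```ldif
# Hypothetical ldapmodify input: switch referint to asynchronous mode.
dn: cn=referential integrity postoperation,cn=plugins,cn=config
changetype: modify
replace: referint-update-delay
# 0 = update referential attributes synchronously inside the operation
#     (the default); N > 0 = a background thread applies the updates
#     every N seconds, so the triggering operation is not blocked.
referint-update-delay: 5
```

A memberOf equivalent of this knob would give the same trade-off: the ADD returns quickly, but membership attributes are eventually consistent rather than updated in the same transaction.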

Other results:

The results given in the ticket description were obtained as 'cn=directory manager', so skipping the impact of ACIs. For example, as DM:

[12/Apr/2016:14:12:47 +0200] conn=30 op=1 ADD dn="cn=group100,cn=groups,cn=accounts,<suffix>"
[12/Apr/2016:14:13:04 +0200] conn=30 op=1 RESULT err=0 tag=105 nentries=0 etime=17

while running as admin adds a further delay:

[12/Apr/2016:14:14:18 +0200] conn=31 op=4 ADD dn="cn=group101,cn=groups,cn=accounts,<suffix>"
[12/Apr/2016:14:14:39 +0200] conn=31 op=4 RESULT err=0 tag=105 nentries=0 etime=21

Disabling retroCL and Content Synchronization does not help:

[12/Apr/2016:14:52:52 +0200] conn=17 op=1 ADD dn="cn=group102,cn=groups,cn=accounts,<suffix>"
[12/Apr/2016:14:53:10 +0200] conn=17 op=1 RESULT err=0 tag=105 nentries=0 etime=18

BUT disabling memberof gives a much better response time (9 to 10 times faster):

[12/Apr/2016:15:30:01 +0200] conn=16 op=1 ADD dn="cn=group104,cn=groups,cn=accounts,<suffix>"
[12/Apr/2016:15:30:03 +0200] conn=16 op=1 RESULT err=0 tag=105 nentries=0 etime=2

As Ludwig suggested (https://fedorahosted.org/freeipa/ticket/5802#comment:2), a separate thread could process the updates of the member entries (so the cost would only be the ADD itself).

An option would be to disable memberof during administrative tasks, then re-enable it and trigger a memberof fixup task to rebuild the membership relations.
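The disable/re-enable/fixup sequence above can be sketched as the following ldapmodify inputs. This is a sketch, not a validated procedure: the plugin and task DNs follow the usual 389-ds layout, the `cn=fixup_example` task name and the basedn/filter values are made-up placeholders, and toggling `nsslapd-pluginEnabled` requires a server restart to take effect:

```ldif
# 1) Disable the memberOf plugin, then restart the server
#    and run the provisioning load.
dn: cn=MemberOf Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-pluginEnabled
nsslapd-pluginEnabled: off

# 2) After provisioning, re-enable the plugin (and restart again).
dn: cn=MemberOf Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-pluginEnabled
nsslapd-pluginEnabled: on

# 3) Create a memberOf fixup task to rebuild memberOf values
#    under the chosen subtree (basedn/filter are placeholders).
dn: cn=fixup_example,cn=memberof task,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: fixup_example
basedn: cn=accounts,dc=example,dc=com
filter: (objectClass=inetorgperson)
```

The task entry is removed automatically once the fixup completes; progress can be followed in the server's error log.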

The original use case was to add a group of 1000 users. Disabling memberof improved the update by 9-10 times.

This use case is almost identical to adding a hostgroup of 300 hosts. Disabling memberof similarly improves the update, by a factor of 8.

Both use cases come from a test that ADDs 5000 users/1000 hosts/100 groups/100 hostgroups.

With memberof

[12/Apr/2016:16:23:46 +0200] conn=17 op=1 ADD dn="cn=hostgroup100,cn=hostgroups,cn=accounts,<suffix>"
[12/Apr/2016:16:23:54 +0200] conn=17 op=1 RESULT err=0 tag=105 nentries=0 etime=8

Without memberof

[12/Apr/2016:16:48:54 +0200] conn=18 op=1 ADD dn="cn=hostgroup102,cn=hostgroups,cn=accounts,<suffix>"
[12/Apr/2016:16:48:55 +0200] conn=18 op=1 RESULT err=0 tag=105 nentries=0 etime=1

In conclusion:

When adding 5000 users/1000 hosts/100 groups/100 hostgroups, disabling memberof greatly (×10) reduces the response time. The next step is to measure the complete load: 1) with memberof, 2) without memberof + spawning a memberof fixup task.

A background task sounds good to me, but I can imagine a use case where some provisioning system adds hosts to a group and then immediately performs an action that depends on the membership of those hosts. The add command would return a result after one second, but the hosts would in fact not be members yet.

How important is this use case for us? Should we bother with it?

Provisioning (with nested groups)
- 5000 users
- 4000 hosts
- 100 groups
- 100 hostgroups
- 100 sudorules
- 100 hbacrules

With the memberof plugin enabled: it lasts more than 3 hours

With the memberof plugin disabled: it lasts 6 min;
then running the memberof fixup tasks takes an additional ~7 min:

objectClass=inetorgperson   3min51 (20160412165045Z -> 20160412165336Z)
objectClass=ipausergroup    0min01 (20160412165553Z -> 20160412165554Z)
objectClass=ipahost         2min56 (20160412165808Z -> 20160412170112Z)
objectClass=ipahostgroup    0min02 (20160412170422Z -> 20160412170424Z)
objectClass=ipasudorule     0min00 (20160412170608Z -> 20160412170608Z)
objectClass=ipahbacrule     0min00 (20160412170805Z -> 20160412170805Z)
                            ------
                            6min50

If the administrator can identify a maintenance window in which to provision the entries and recreate the membership links, the total operation lasts ~13 min.

Without such a window, "normal provisioning" lasts more than 3 hours (TBC).

After some more tests, "normal provisioning" (with memberof enabled) lasts 4 h 14 min (Tue Apr 12 19:27:12 CEST 2016 --> Tue Apr 12 23:41:29 CEST 2016).

"two steps provisioning" (disabling memberof then memberof fixup) last 13 min

That means "two steps provisioning" is 20times faster than "normal provisioning". The next steps are to confirm that the final results are identical and if so explain why this difference.

The final result is identical whether the provisioning is done in:

  • one step (add_with_memberof)
  • two steps (add_without_memberof + memberof fixup)

The next step is to determine the difference (number of ops, response time, ...) between the two types of provisioning, knowing that the final result is identical.

Triage notes: we should be more interested in other memberof use cases.

Performance improvements of the memberof plugin are out of scope for the 4.4 release. Moving to the 4.5 backlog.

Metadata Update from @tbordaz:
- Issue assigned to someone
- Issue set to the milestone: FreeIPA 4.5 backlog

4 years ago

Metadata Update from @rcritten:
- Issue close_status updated to: None
- Issue tagged with: performance

2 years ago

