#8548 Automember rebuild fails (timeout) when using default hostgroup
Opened 3 years ago by eugenek. Modified 3 years ago

Whenever I enable a default hostgroup, I can no longer rebuild membership.

This shows a default hostgroup is configured:

$ ipa automember-default-group-show --type=hostgroup
Default (fallback) Group: cn=hosts_not_in_groups,cn=hostgroups,cn=accounts,dc=ipa,dc=...

The automember rebuild fails:

$ ipa automember-rebuild --type=hostgroup
ipa: ERROR: Automember LDAP task timeout, Task DN: 'cn=6c087222-e574-4d77-9194-f591f3e05430,cn=automember rebuild membership,cn=tasks,cn=config'

After disabling the default hostgroup:

$ ipa automember-default-group-show --type=hostgroup
Default (fallback) Group: No default (fallback) group set

$ ipa automember-rebuild --type=hostgroup
Automember rebuild task finished. Processed (772) entries.

Version/Release/Distribution

krb5-server-1.15.1-37.el7_7.2.x86_64
389-ds-base-1.3.9.1-12.el7_7.x86_64
pki-ca-10.5.16-5.el7_7.noarch
ipa-client-4.6.5-11.el7.centos.3.x86_64
ipa-server-4.6.5-11.el7.centos.3.x86_64

Any suggestions? Thanks!


Is anything logged on the 389-side?

Lots of activity in the access log, but I don't see any errors. Is there anything specific I should look for that might indicate a problem?

Assuming there is nothing in DS error logs. DS access logs should show base searches on 'cn=6c087222-e574-4d77-9194-f591f3e05430,cn=automember rebuild membership,cn=tasks,cn=config' that return 1 entries. Would you confirm, how long is it monitored ?
If reproducible, would you copy/paste the search output of that entry and also a pstack of DS. Does DS eat CPU ?

There is a 60-second hardcoded timeout in the IPA task wait. It doesn't mean that the task failed, just that IPA isn't going to wait for it any longer.

@tbordaz Yes, ns-slapd is pegging the CPU while the automember rebuild is running. Even after I get the timeout error message.

Logs look like this:

[22/Oct/2020:16:40:10.226498268 -0700] conn=1110634 fd=371 slot=371 connection from 10.2.100.151 to 10.2.100.151
[22/Oct/2020:16:40:10.229932326 -0700] conn=1110634 op=0 BIND dn="" method=sasl version=3 mech=GSS-SPNEGO
[22/Oct/2020:16:40:10.241065401 -0700] conn=1110634 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0011405387 dn="uid=eugene,cn=users,cn=accounts,dc=ipa,dc=..."
[22/Oct/2020:16:40:10.244151722 -0700] conn=1110634 op=1 ADD dn="cn=a17dbcb9-b3bb-4fd6-8efe-3be993ff9ab7,cn=automember rebuild membership,cn=tasks,cn=config"
[22/Oct/2020:16:40:10.260315766 -0700] conn=1110634 op=1 RESULT err=0 tag=105 nentries=0 etime=0.0016405833
[22/Oct/2020:16:40:10.261286786 -0700] conn=1110634 op=2 SRCH base="cn=ipaconfig,cn=etc,dc=ipa,dc=..." scope=0 filter="(objectClass=*)" attrs=ALL
[22/Oct/2020:16:40:10.262984025 -0700] conn=1110634 op=2 RESULT err=0 tag=101 nentries=1 etime=0.0001830073
[22/Oct/2020:16:40:10.265344729 -0700] conn=1110634 op=3 SRCH base="cn=a17dbcb9-b3bb-4fd6-8efe-3be993ff9ab7,cn=automember rebuild membership,cn=tasks,cn=config" scope=0 filter="(objectClass=*)" attrs=ALL
[22/Oct/2020:16:40:10.265761632 -0700] conn=1110634 op=3 RESULT err=0 tag=101 nentries=1 etime=0.0000578544
[22/Oct/2020:16:40:11.269547502 -0700] conn=1110634 op=4 SRCH base="cn=a17dbcb9-b3bb-4fd6-8efe-3be993ff9ab7,cn=automember rebuild membership,cn=tasks,cn=config" scope=0 filter="(objectClass=*)" attrs=ALL
[22/Oct/2020:16:40:11.295541076 -0700] conn=1110634 op=4 RESULT err=0 tag=101 nentries=1 etime=0.0026156311
[22/Oct/2020:16:40:12.298896950 -0700] conn=1110634 op=5 SRCH base="cn=a17dbcb9-b3bb-4fd6-8efe-3be993ff9ab7,cn=automember rebuild membership,cn=tasks,cn=config" scope=0 filter="(objectClass=*)" attrs=ALL
[22/Oct/2020:16:40:12.418012961 -0700] conn=1110634 op=5 RESULT err=0 tag=101 nentries=1 etime=0.0119274690
...
[22/Oct/2020:16:41:09.192884733 -0700] conn=1110634 op=56 SRCH base="cn=a17dbcb9-b3bb-4fd6-8efe-3be993ff9ab7,cn=automember rebuild membership,cn=tasks,cn=config" scope=0 filter="(objectClass=*)" attrs=ALL
[22/Oct/2020:16:41:09.283377859 -0700] conn=1110634 op=56 RESULT err=0 tag=101 nentries=1 etime=0.0090641470
[22/Oct/2020:16:41:10.287529223 -0700] conn=1110634 op=57 UNBIND
[22/Oct/2020:16:41:10.287562913 -0700] conn=1110634 op=57 fd=371 closed - U1

@rcritten I saw that in the code but I didn't realize it wasn't stopping the attempt. Thanks!

Still, I'd like to be sure that something isn't misconfigured or otherwise broken...

I agree with @rcritten, nothing indicates that the task fails. This is automember-rebuild command that fails because of a timeout.

It is a surprise that enabling default hostgroup can create such timeout. You may monitor the task entry to see when it completes and if it is significantly different with/without 'default hostgroup'.
If there is a significant gap, I would expect it is related of higher #internal operations. You may enable logging of plugin internal ops (nsslapd-plugin-logging: on and nsslapd-accesslog-level: 260) to get details of automember-rebuild activity.

Login to comment on this ticket.

Metadata