#7839 One of masters shows incorrect Active Users count
Opened 4 months ago by greyer. Modified 2 months ago

Issue

I have a cluster with 2 masters and multiple replicas. On the first master the Active Users count is wrong, even though the actual number of user entries is correct.

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 157

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 156

I have checked multiple times and there are no unresolved replication conflicts.

[sebastian@ds1 ~]$ ldapsearch -D "cn=Directory Manager" -W "(&(objectClass=ldapSubEntry)(nsds5ReplConflict=*))" \* nsds5ReplConflict
# extended LDIF
#
# LDAPv3
# base <dc=drawbrid,dc=ge> (default) with scope subtree
# filter: (&(objectClass=ldapSubEntry)(nsds5ReplConflict=*))
# requesting: * nsds5ReplConflict
#

# search result
search: 2
result: 0 Success

# numResponses: 1

[sebastian@ds1 ~]$ ldapsearch -x -b "cn=mapping tree,cn=config" -D "cn=Directory Manager" -W objectClass=nsDS5ReplicationAgreement -LL | grep "nsds5replicaLastUpdateStatus"
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme

Oddly, counting the entries themselves gives the correct number on both masters:

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com dn | grep -c uid
156

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com dn | grep -c uid
156

Steps to Reproduce

Can't reproduce it.

Actual behavior

On one master the numSubordinates count for users is higher by one than on the other master or the replicas.

Expected behavior

numSubordinates should be the same on all hosts.

Version/Release/Distribution

All systems are running CentOS 7.

[sebastian@ds1 ~]$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.5.4-10.el7.centos.4.4.x86_64
ipa-client-4.5.4-10.el7.centos.4.4.x86_64
389-ds-base-1.3.6.1-24.el7_4.x86_64
pki-ca-10.5.1-15.el7_5.noarch
krb5-server-1.15.1-8.el7.x86_64

[sebastian@ds2 ~]$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.5.4-10.el7.centos.4.4.x86_64
ipa-client-4.5.4-10.el7.centos.4.4.x86_64
389-ds-base-1.3.6.1-24.el7_4.x86_64
pki-ca-10.5.1-15.el7_5.noarch
krb5-server-1.15.1-8.el7.x86_64



Hi,
the search

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com dn | grep -c uid
156

returns 156 entries whose DN contains uid. Can you check whether there are entries with a different naming attribute, using

ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -v uid

This way we may be able to find which entry is not replicated and what its content is (otherwise it would mean there is an issue in how numsubordinates is calculated).

Hi,

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 144
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -c uid
Enter LDAP Password:
143
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -v uid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com

[sebastian@ds2 ~]$  ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 143
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -c uid
Enter LDAP Password:
143
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -v uid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com

It looks like the calculation of numSubordinates is wrong.
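The comparison above can be sketched as a small helper that reads the stored counter and the number of entries actually returned, then reports whether they agree. This is only a sketch: the function names are hypothetical, and it assumes the LDIF shape shown in this ticket (one `dn:` line per entry). Tombstones are not returned by a regular search, so they are not part of the entry count.

```shell
# Extract the numSubordinates value from LDIF read on stdin.
get_numsubordinates() {
    awk 'tolower($1) == "numsubordinates:" { print $2; exit }'
}

# Count entries in LDIF read on stdin (one "dn:" or "dn::" line per entry).
count_entries() {
    awk '/^dn::? / { n++ } END { print n + 0 }'
}

# Compare the stored counter ($1) with the observed number of children ($2).
report_mismatch() {
    if [ "$1" -eq "$2" ]; then
        echo "consistent: $1"
    else
        echo "mismatch: numSubordinates=$1, children=$2"
    fi
}

# Possible usage (not executed here; BASE is an assumed variable):
#   BASE="cn=users,cn=accounts,dc=example,dc=com"
#   stored=$(ldapsearch -LLL -D "cn=Directory Manager" -W -b "$BASE" \
#            -s base numSubordinates | get_numsubordinates)
#   actual=$(ldapsearch -LLL -D "cn=Directory Manager" -W -b "$BASE" \
#            -s one -o ldif-wrap=no dn | count_entries)
#   report_mismatch "$stored" "$actual"
```

Using `-s one` restricts the search to direct children, so the base entry itself does not need to be filtered out with grep.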

Why not run the same search on both masters, pipe the output to files, and diff the files to see what, if anything, differs beyond the subordinate count?

@rcritten I've done that. Nothing is different. That's why I'm saying that the problem is with the counter itself, not with missing or unreplicated users.

@rcritten I checked it once more.

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn > ds1.txt
Enter LDAP Password:
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn > ds2.txt
Enter LDAP Password:
[sebastian@ds1 ~]$ scp ds2:~/ds2.txt .
Password:
ds2.txt    100% 8425    67.7KB/s   00:00
[sebastian@ds1 ~]$ diff ds1.txt ds2.txt
[sebastian@ds1 ~]$
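A plain diff of the raw output depends on both servers returning entries in the same order, which LDAP does not guarantee. The diff was empty here, so this was not a problem in this case, but a more robust comparison could normalize the DN lists first. A minimal sketch, with a hypothetical helper name:

```shell
# Keep only the DN lines from LDIF on stdin, lowercase them, and sort them
# so that two servers can be compared regardless of result order or case.
normalize_dns() {
    awk 'tolower($0) ~ /^dn:/ { print tolower($0) }' | LC_ALL=C sort
}

# Possible usage with the files produced above (not executed here):
#   normalize_dns < ds1.txt > ds1.sorted
#   normalize_dns < ds2.txt > ds2.sorted
#   diff ds1.sorted ds2.sorted
```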

I wonder whether it could be a tombstone or a non-conflict subentry. You could also run:

ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no "(|(objectclass=ldapsubentry)(objectclass=nstombstone))"

@tbordaz

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no "(|(objectclass=ldapsubentry)(objectclass=nstombstone))"
Enter LDAP Password:
dn: nsuniqueid=1b3a7198-9e4011e7-b6de960e-171d9189,uid=openvpn,cn=users,cn=accounts,dc=example,dc=com
krbLastSuccessfulAuth: 20190115000928Z
krbLoginFailedCount: 0
krbLastFailedAuth: 20190114204314Z
krbPasswordExpiration: 20190127013042Z
userPassword:: aabbccddee
krbExtraData:: aabbccdd=
krbLastAdminUnlock: 20180730233747Z
krbPrincipalKey:: xxx+yyy/V+zzz+7IUear2PM+qqq/x
krbTicketFlags: 128
krbLastPwdChange: 20180731013042Z
ipaUserAuthType: password
memberOf: cn=admins,cn=groups,cn=accounts,dc=example,dc=com
memberOf: ipaUniqueID=835b7ea6-3533-11e7-b367-00259094efea,cn=hbac,dc=example,dc=com
memberOf: cn=Replication Administrators,cn=privileges,cn=pbac,dc=example,dc=com
memberOf: cn=Add Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Modify Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Remove Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Modify DNA Range,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read PassSync Managers Configuration,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Modify PassSync Managers Configuration,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read LDBM Database Configuration,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Add Configuration Sub-Entries,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read DNA Range,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Host Enrollment,cn=privileges,cn=pbac,dc=example,dc=com
memberOf: cn=System: Add krbPrincipalName to a Host,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Enroll a Host,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Certificates,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Enrollment Password,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Keytab,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Principals,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=ipausers,cn=groups,cn=accounts,dc=example,dc=com
memberOf: cn=trust admins,cn=groups,cn=accounts,dc=example,dc=com
displayName: OpenVPN BindUser
cn: OpenVPN BindUser
krbCanonicalName: openvpn@EXAMPLE.COM
objectClass: ipaobject
objectClass: person
objectClass: top
objectClass: ipasshuser
objectClass: inetorgperson
objectClass: organizationalperson
objectClass: krbticketpolicyaux
objectClass: krbprincipalaux
objectClass: inetuser
objectClass: posixaccount
objectClass: ipaSshGroupOfPubKeys
objectClass: ipauserauthtypeclass
objectClass: nsTombstone
loginShell: /bin/bash
initials: OB
gidNumber: 4999
gecos: OpenVPN BindUser
sn: BindUser
homeDirectory: /home/openvpn
uid: openvpn
mail: openvpn@examplege.com
krbPrincipalName: openvpn@EXAMPLE.COM
givenName: OpenVPN
ipaUniqueID: 371347d8-9e40-11e7-8223-ac1f6b05ec5c
uidNumber: 7054
nsParentUniqueId: d111e80d-e2d211e6-947fbbac-009391c4
nstombstonecsn: 5c3d97130008001a0000
krbPwdPolicyReference: cn=admins,cn=example.com,cn=kerberos,dc=example,dc=com

[sebastian@ds1 ~]$

Strangely, I get this entry in the output on both ds1 and ds2.

I have no idea what could explain the difference in numsubordinates between the two servers (a bug in numsubordinates, a replication issue...).

The difference exists in the attribute 'numsubordinates' stored in the entry 'cn=users,cn=accounts,dc=example,dc=com'. It would be interesting to know whether the difference also exists in the DB index (it should not).

The following steps, run on both servers, would help to check the index. It is better to run them on stopped instances or during low traffic. There is no need to run them on both servers at the same time.

ldapsearch -LLL -D "cn=Directory Manager" -W -b "cn=users,cn=accounts,dc=example,dc=com" -s base entryid
Let's assume it returns something like:

--> dn: cn=users,cn=accounts,dc=example,dc=com
--> entryid: 5

dbscan -f /var/lib/dirsrv/slapd-<instance>/db/userRoot/parentid.db -k =5 -r
--> <list of 143/144 IDs> the IDs are specific to the instance
Is the number of IDs identical on both servers?

For each ID:
ldapsearch -LLL -D "cn=Directory Manager" -W -b "dc=example,dc=com" '(&(entryid=<ID>)((objectclass=ldapsubentry)(objectclass=nstombstone)))' dn
You should observe the same set of DNs on both servers.

@tbordaz I tried this on both servers, and the parentid index lists 144 IDs on each.

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b "cn=users,cn=accounts,dc=example,dc=com" -s base entryid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
entryid: 52

[sebastian@ds1 ~]$ sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db -k =52 -r
=52
    434 435 436 437 440 444 445 446 447 448 449 450 454 456 458 459 460 461 462 463 1111 1126 1128 1129 1136 1138 1140 1142 1145 1146 1148 1153 1155 1156 1161 1166 1169 1171 1173 1180 1181 1182 1183 1185 1188 1192 1197 1203 1205 1207 1208 1211 1213 1222 1226 1230 1236 1244 1246 1249 1257 1268 1754 1758 1872 1873 1876 1877 1878 1879 1880 2451 2532 2538 2558 2713 2720 2742 2859 2868 2872 2955 2980 2981 2989 3702 3709 3714 3715 3716 3723 3726 3732 3733 3734 3736 3754 3755 3761 3763 3764 3776 3779 3780 3792 3798 3911 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3953 3956 4104 4142 4143 4148 4149 4150 4157 4165 4169 4170 4211 4248 4389 4432 4456 4458 4460 4472 4476 4477 4523 4527 4546
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates' | grep numSubordinates
Enter LDAP Password:
numSubordinates: 144
[sebastian@ds1 ~]$
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b "cn=users,cn=accounts,dc=example,dc=com" -s base entryid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
entryid: 76

[sebastian@ds2 ~]$ sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db -k =76 -r
=76
    455 456 457 458 461 465 466 467 468 469 470 471 473 474 476 477 478 479 480 481 484 485 486 487 489 490 491 493 496 497 499 504 506 507 511 515 518 520 522 528 529 530 531 533 536 540 545 551 553 555 556 559 561 569 573 577 583 591 593 596 602 604 609 611 613 614 617 618 619 620 621 2062 2144 2145 2169 2324 2331 2353 2471 2480 2484 2567 2591 2592 2600 3313 3320 3325 3326 3327 3334 3337 3343 3344 3345 3347 3365 3366 3372 3374 3375 3387 3390 3391 3403 3409 3522 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3564 3567 3715 3753 3754 3759 3760 3761 3768 3776 3780 3781 3822 3859 4000 4043 4067 4069 4071 4083 4087 4088 4134 4138 4157
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates' | grep numSubordinates
Enter LDAP Password:
numSubordinates: 143
[sebastian@ds2 ~]$
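Counting a long ID list by eye is error-prone, so the count can be scripted. The sketch below assumes the dbscan output has the shape shown in this ticket (a key line starting with `=`, followed by a whitespace-separated ID list); other versions may format it differently, and the function name is hypothetical.

```shell
# Count the numeric entry IDs in dbscan output read on stdin.
# Lines starting with "=" are treated as the index key and skipped.
count_ids() {
    awk '!/^=/ { for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) n++ }
         END   { print n + 0 }'
}

# Possible usage (not executed here):
#   sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db \
#       -k =52 -r | count_ids
```

Run against each server's list, this gives the index-side count to set against the stored numSubordinates value (144 on ds1, 143 on ds2).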

I tried the last ldapsearch from your example, but it returns:

root@ds2:~ # ldapsearch -LLL -D "cn=Directory Manager" -W -b "dc=example,dc=com" '(&(entryid=455)((objectclass=ldapsubentry)(objectclass=nstombstone)))' dn
Enter LDAP Password:
ldap_search_ext: Bad search filter (-7)

@greyer, thanks for the data. It suggests we have a bug in the way numsubordinates is computed :(. I will come back to this later.

Regarding the filter: bad cut/paste, it was missing the OR. It should be '(&(entryid=455)(|(objectclass=ldapsubentry)(objectclass=nstombstone)))'
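Checking every ID by hand is tedious, so the corrected filter can be generated per ID from the dbscan output. This is a sketch only: the function names and the PWFILE variable are hypothetical, and the dbscan output is assumed to look like the listings above.

```shell
# Build the corrected filter (with the OR) for a single entry ID.
subentry_or_tombstone_filter() {
    printf '(&(entryid=%s)(|(objectclass=ldapsubentry)(objectclass=nstombstone)))\n' "$1"
}

# Print each numeric ID from dbscan output (stdin) on its own line,
# skipping the key line that starts with "=".
list_ids() {
    awk '!/^=/ { for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) print $i }'
}

# Possible usage (not executed here). PWFILE is an assumed file holding the
# bind password, read with ldapsearch -y to avoid one prompt per ID:
#   sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db \
#       -k =76 -r | list_ids | while read -r id; do
#       ldapsearch -LLL -D "cn=Directory Manager" -y "$PWFILE" \
#           -b "dc=example,dc=com" "$(subentry_or_tombstone_filter "$id")" dn
#   done
```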

@tbordaz later means when? ;-)

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b "dc=example,dc=com" '(&(entryid=456)(|(objectclass=ldapsubentry)(objectclass=nstombstone)))' dn
Enter LDAP Password:
[sebastian@ds2 ~]$

Strange, but it looks like it's empty?

@greyer sorry for the late answer, I had difficulty clarifying the status of numsubordinates.
In 1.3.7 (7.5) an RFE made major changes to conflict handling. It changed the numsubordinates handling code and, IIRC, it revealed an existing bug in the way numsubordinates handled tombstones.
The RFE was https://pagure.io/389-ds-base/issue/49551.
So I think there is a good chance that the bug you are seeing is fixed in 1.3.7. At the least, it is worth upgrading.

As for repairing the broken numsubordinates, the only solution I see is a total init :(

As for the missing entry 456 on ds2, I have no idea why this entry is hidden.
You can retrieve that entry (during low traffic or on a stopped instance) with

dbscan -f /var/lib/dirsrv/slapd-<instance>/db/userRoot/id2entry.db -K 456

@tbordaz I have just upgraded 389-ds-base to 1.3.8, it still shows different numsubordinates on ds1 and ds2.

@greyer I forgot to answer you :(

The problem is that numsubordinates was incorrectly computed and then stored into the DB and will stay like this unless you reinitialize.

@tbordaz could you shed some light on what that actually means, "reinitialize"?

I have an identical situation with an incorrect numSubordinates value on one IPA master

@keesghs, sorry to read that you also hit that bug. Which version are you running?
By any chance, did you identify a reproducible scenario?

If you are in a replicated topology, the only way to recover from that bug is to do a total initialization (e.g. ipa-replica-manage re-initialize --from fqdn_good_instance)
If this is a standalone instance, I am afraid you need to import from a previous export.

@tbordaz no, I don't have an exact scenario to reproduce. What happened is the following.
We had an IPA master (A) on a temporary system (a desktop). We added two replicas (B and C), replication was A<->B and A<->C.
Next we promoted B to be the CA master, and configured the IPA users (desktops, servers, etc) to use B as the first choice IPA master. (Think of LDAP and such). After a few days we switched off A to see if everything was covered.

What we forgot to check was: replication. We thought it was alright and we deleted A.

Next, we discovered the replication issue and fixed it by connecting C<->B. It all seemed to be OK. However, there was one new user who had, by accident, been added on both B and C. After solving the replication conflict we were stuck with the incorrect numSubordinates. I remember I had to unlink an LDAP entry to actually delete a conflicting private group of that new user.

Conclusion. Not an exact reproducible scenario, but a rough description of what happened.

@tbordaz So, yes, we have a replicated topology. However, I'm scared as hell to execute commands that mess around with master B. The reason is that we had a nasty experience last year where we could not get our certificates renewed. We started completely fresh with a new IPA installation. I don't want to go through that again.

DS is robust in regular operations, but I have to admit that admin tasks are always critical.
I see no other option to recover from the numsubordinates issue than a total init. Maybe you can introduce new instances and, once they are up and running, decommission those having the issue.
Note that https://pagure.io/389-ds-base/issue/49551 is now fixed upstream.
