I have a cluster with 2 masters and multiple replicas. On the first master the Active Users count is wrong even though the actual number of users is correct.
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 157

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 156
I have checked multiple times and there are no unresolved replication conflicts.
[sebastian@ds1 ~]$ ldapsearch -D "cn=Directory Manager" -W "(&(objectClass=ldapSubEntry)(nsds5ReplConflict=*))" \* nsds5ReplConflict
# extended LDIF
#
# LDAPv3
# base <dc=drawbrid,dc=ge> (default) with scope subtree
# filter: (&(objectClass=ldapSubEntry)(nsds5ReplConflict=*))
# requesting: * nsds5ReplConflict
#

# search result
search: 2
result: 0 Success

# numResponses: 1

[sebastian@ds1 ~]$ ldapsearch -x -b "cn=mapping tree,cn=config" -D "cn=Directory Manager" -W objectClass=nsDS5ReplicationAgreement -LL | grep "nsds5replicaLastUpdateStatus"
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
nsds5replicaLastUpdateStatus: Error (0) Replica acquired successfully: Increme
What is odd is that counting the entries themselves shows the correct number:
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com dn | grep -c uid
156

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com dn | grep -c uid
156
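The `grep -c uid` count above matches any line containing `uid`, so it only works because the base entry happens not to match. A slightly stricter cross-check is sketched below: count the immediate children with a one-level search and compare that with the stored numSubordinates. The helper names `count_dns` and `get_numsubordinates` are hypothetical; they only parse LDIF on stdin and make no claim about how the server computes the attribute.

```shell
# Count entries in LDIF read from stdin (one "dn:" line per entry).
count_dns() {
    grep -c '^dn:'
}

# Print the numSubordinates value found in LDIF read from stdin.
get_numsubordinates() {
    awk -F': ' 'tolower($1) == "numsubordinates" { print $2 }'
}

# Usage against a live server (each command prompts for the password):
#   BASE="cn=users,cn=accounts,dc=example,dc=com"
#   ldapsearch -LLL -D "cn=Directory Manager" -W -b "$BASE" -s one 1.1 | count_dns
#   ldapsearch -LLL -D "cn=Directory Manager" -W -b "$BASE" -s base numSubordinates | get_numsubordinates
```

If the two numbers differ on one server only, the stored counter on that server is the suspect rather than the entries.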
Can't reproduce it.
On one master the numSubordinates count for users is higher by one than on the other master or the replicas.
numSubordinates should be the same on all hosts.
All systems are running CentOS 7.
[sebastian@ds1 ~]$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.5.4-10.el7.centos.4.4.x86_64
ipa-client-4.5.4-10.el7.centos.4.4.x86_64
389-ds-base-1.3.6.1-24.el7_4.x86_64
pki-ca-10.5.1-15.el7_5.noarch
krb5-server-1.15.1-8.el7.x86_64

[sebastian@ds2 ~]$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.5.4-10.el7.centos.4.4.x86_64
ipa-client-4.5.4-10.el7.centos.4.4.x86_64
389-ds-base-1.3.6.1-24.el7_4.x86_64
pki-ca-10.5.1-15.el7_5.noarch
krb5-server-1.15.1-8.el7.x86_64
Hi, the search
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com dn | grep -c uid
156

returns 156 entries containing uid, but can you check whether there are entries with a different naming attribute, using
ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -v uid
This way we may be able to find which entry is not replicated and what is its content (otherwise it would mean there is an issue when calculating numsubordinates).
Hi,
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=dc=example,dc=com
numSubordinates: 144

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -c uid
Enter LDAP Password:
143

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -v uid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates'
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
numSubordinates: 143

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -c uid
Enter LDAP Password:
143

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn | grep -v uid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
It looks like the calculation of numSubordinates is wrong.
Why not do the same search on both masters, piping the output to files, and then diff the files to see what, if anything, differs beyond the subordinate count?
@rcritten I've done that. Nothing is different. That's why I'm saying that the problem is with the counter itself, not with missing or unreplicated users.
@rcritten I checked it once again.
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn > ds1.txt
Enter LDAP Password:
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no dn > ds2.txt
Enter LDAP Password:
[sebastian@ds1 ~]$ scp ds2:~/ds2.txt .
Password:
ds2.txt                                   100% 8425    67.7KB/s   00:00
[sebastian@ds1 ~]$ diff ds1.txt ds2.txt
[sebastian@ds1 ~]$
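A plain `diff` is order-sensitive, and LDAP gives no guarantee that two servers return entries in the same order. Here the diff was empty, so order was not a problem, but in general a set comparison is safer. The following is a minimal sketch, assuming both dumps were taken with `-o ldif-wrap=no` as above; the function name `dn_diff` is hypothetical.

```shell
# Print DNs present in only one of two ldapsearch dumps, ignoring order.
# Lines unique to the first file are prefixed "< ", to the second "> ".
dn_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    grep '^dn:' "$1" | sort -u > "$a"
    grep '^dn:' "$2" | sort -u > "$b"
    comm -23 "$a" "$b" | sed 's/^/< /'
    comm -13 "$a" "$b" | sed 's/^/> /'
    rm -f "$a" "$b"
}

# Usage: dn_diff ds1.txt ds2.txt   (no output means the DN sets are identical)
```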
I wonder if it could be a tombstone or a non-conflict subentry. You may also run
ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no "(|(objectclass=ldapsubentry)(objectclass=nstombstone))"
@tbordaz
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -o ldif-wrap=no "(|(objectclass=ldapsubentry)(objectclass=nstombstone))"
Enter LDAP Password:
dn: nsuniqueid=1b3a7198-9e4011e7-b6de960e-171d9189,uid=openvpn,cn=users,cn=accounts,dc=example,dc=com
krbLastSuccessfulAuth: 20190115000928Z
krbLoginFailedCount: 0
krbLastFailedAuth: 20190114204314Z
krbPasswordExpiration: 20190127013042Z
userPassword:: aabbccddee
krbExtraData:: aabbccdd=
krbLastAdminUnlock: 20180730233747Z
krbPrincipalKey:: xxx+yyy/V+zzz+7IUear2PM+qqq/x
krbTicketFlags: 128
krbLastPwdChange: 20180731013042Z
ipaUserAuthType: password
memberOf: cn=admins,cn=groups,cn=accounts,dc=example,dc=com
memberOf: ipaUniqueID=835b7ea6-3533-11e7-b367-00259094efea,cn=hbac,dc=example,dc=com
memberOf: cn=Replication Administrators,cn=privileges,cn=pbac,dc=example,dc=com
memberOf: cn=Add Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Modify Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Remove Replication Agreements,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Modify DNA Range,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read PassSync Managers Configuration,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Modify PassSync Managers Configuration,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read LDBM Database Configuration,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Add Configuration Sub-Entries,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Read DNA Range,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=Host Enrollment,cn=privileges,cn=pbac,dc=example,dc=com
memberOf: cn=System: Add krbPrincipalName to a Host,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Enroll a Host,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Certificates,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Enrollment Password,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Keytab,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=System: Manage Host Principals,cn=permissions,cn=pbac,dc=example,dc=com
memberOf: cn=ipausers,cn=groups,cn=accounts,dc=example,dc=com
memberOf: cn=trust admins,cn=groups,cn=accounts,dc=example,dc=com
displayName: OpenVPN BindUser
cn: OpenVPN BindUser
krbCanonicalName: openvpn@EXAMPLE.COM
objectClass: ipaobject
objectClass: person
objectClass: top
objectClass: ipasshuser
objectClass: inetorgperson
objectClass: organizationalperson
objectClass: krbticketpolicyaux
objectClass: krbprincipalaux
objectClass: inetuser
objectClass: posixaccount
objectClass: ipaSshGroupOfPubKeys
objectClass: ipauserauthtypeclass
objectClass: nsTombstone
loginShell: /bin/bash
initials: OB
gidNumber: 4999
gecos: OpenVPN BindUser
sn: BindUser
homeDirectory: /home/openvpn
uid: openvpn
mail: openvpn@examplege.com
krbPrincipalName: openvpn@EXAMPLE.COM
givenName: OpenVPN
ipaUniqueID: 371347d8-9e40-11e7-8223-ac1f6b05ec5c
uidNumber: 7054
nsParentUniqueId: d111e80d-e2d211e6-947fbbac-009391c4
nstombstonecsn: 5c3d97130008001a0000
krbPwdPolicyReference: cn=admins,cn=example.com,cn=kerberos,dc=example,dc=com
[sebastian@ds1 ~]$
Strange, but I got it in the output on both ds1 and ds2.
I have no idea what could explain the difference in numsubordinates between the two servers (a bug in numsubordinates, a replication issue...).
The difference exists in the attribute 'numsubordinates' stored in entry 'cn=users,cn=accounts,dc=example,dc=com'. It would be interesting to know if the difference also exists in the DB index (It should not).
The following steps, run on both servers, would be helpful to check the index. It is better to run them on stopped instances or under low traffic. There is no need to do it on both at the same time.

ldapsearch -LLL -D "cn=Directory Manager" -W -b "cn=users,cn=accounts,dc=example,dc=com" -s base entryid

Let's assume it returns something like:
--> dn: cn=users,cn=accounts,dc=example,dc=com
--> entryid: 5

dbscan -f /var/lib/dirsrv/slapd-<instance>/db/userRoot/parentid.db -k =5 -r
--> <list of 143/144 IDs> (the IDs are specific to the instance)

Are the numbers of IDs identical on both servers?

For each ID:
ldapsearch -LLL -D "cn=Directory Manager" -W -b "dc=example,dc=com" '(&(entryid=<ID>)((objectclass=ldapsubentry)(objectclass=nstombstone)))' dn

You should observe the same set of DNs on both servers.
@tbordaz I've tried on both servers and entryid's showed 144 IDs on both.
[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b "cn=users,cn=accounts,dc=example,dc=com" -s base entryid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
entryid: 52

[sebastian@ds1 ~]$ sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db -k =52 -r
=52
434 435 436 437 440 444 445 446 447 448 449 450 454 456 458 459 460 461 462 463 1111 1126 1128 1129 1136 1138 1140 1142 1145 1146 1148 1153 1155 1156 1161 1166 1169 1171 1173 1180 1181 1182 1183 1185 1188 1192 1197 1203 1205 1207 1208 1211 1213 1222 1226 1230 1236 1244 1246 1249 1257 1268 1754 1758 1872 1873 1876 1877 1878 1879 1880 2451 2532 2538 2558 2713 2720 2742 2859 2868 2872 2955 2980 2981 2989 3702 3709 3714 3715 3716 3723 3726 3732 3733 3734 3736 3754 3755 3761 3763 3764 3776 3779 3780 3792 3798 3911 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3953 3956 4104 4142 4143 4148 4149 4150 4157 4165 4169 4170 4211 4248 4389 4432 4456 4458 4460 4472 4476 4477 4523 4527 4546

[sebastian@ds1 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates' | grep numSubordinates
Enter LDAP Password:
numSubordinates: 144
[sebastian@ds1 ~]$
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b "cn=users,cn=accounts,dc=example,dc=com" -s base entryid
Enter LDAP Password:
dn: cn=users,cn=accounts,dc=example,dc=com
entryid: 76

[sebastian@ds2 ~]$ sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db -k =76 -r
=76
455 456 457 458 461 465 466 467 468 469 470 471 473 474 476 477 478 479 480 481 484 485 486 487 489 490 491 493 496 497 499 504 506 507 511 515 518 520 522 528 529 530 531 533 536 540 545 551 553 555 556 559 561 569 573 577 583 591 593 596 602 604 609 611 613 614 617 618 619 620 621 2062 2144 2145 2169 2324 2331 2353 2471 2480 2484 2567 2591 2592 2600 3313 3320 3325 3326 3327 3334 3337 3343 3344 3345 3347 3365 3366 3372 3374 3375 3387 3390 3391 3403 3409 3522 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3564 3567 3715 3753 3754 3759 3760 3761 3768 3776 3780 3781 3822 3859 4000 4043 4067 4069 4071 4083 4087 4088 4134 4138 4157

[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b cn=users,cn=accounts,dc=example,dc=com -s base 'numSubordinates' | grep numSubordinates
Enter LDAP Password:
numSubordinates: 143
[sebastian@ds2 ~]$
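Counting the IDs by eye is error-prone, so the index lists can be counted mechanically. This is only a sketch: it assumes the `dbscan ... -r` output looks like the transcripts in this thread (a `=<parentid>` key token followed by whitespace-separated numeric IDs), and `extract_ids` is a hypothetical helper name.

```shell
# Print one child entry ID per line from `dbscan ... -k =<parentid> -r` output
# read on stdin. Tokens that are not purely numeric (such as the "=52" key)
# are skipped.
extract_ids() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) print $i }'
}

# Usage (instance name and parent ID taken from the ds1 transcript):
#   sudo dbscan -f /var/lib/dirsrv/slapd-EXAMPLE-COM/db/userRoot/parentid.db -k =52 -r \
#       | extract_ids | wc -l
```

The resulting count can then be compared with the numSubordinates value on the same server.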
I tried the last ldapsearch from your example, but it shows me:

root@ds2:~ # ldapsearch -LLL -D "cn=Directory Manager" -W -b "dc=example,dc=com" '(&(entryid=455)((objectclass=ldapsubentry)(objectclass=nstombstone)))' dn
Enter LDAP Password:
ldap_search_ext: Bad search filter (-7)
@greyer , thanks for the data. It is somehow showing we have a bug in the way numsubordinates is computed :(. Will be back on this later
Regarding the filter: bad cut/paste, it was missing the OR. It should be '(&(entryid=455)(|(objectclass=ldapsubentry)(objectclass=nstombstone)))'
@tbordaz later means when? ;-)
[sebastian@ds2 ~]$ ldapsearch -LLL -D "cn=Directory Manager" -W -b "dc=example,dc=com" '(&(entryid=456)(|(objectclass=ldapsubentry)(objectclass=nstombstone)))' dn
Enter LDAP Password:
[sebastian@ds2 ~]$
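Running the corrected filter by hand for every ID in the index is tedious, so the per-ID check could be looped. The sketch below makes some assumptions that are not from this thread: `PWFILE` is a hypothetical file holding the Directory Manager password (passed with `-y` to avoid one prompt per ID), and the function names are made up for illustration.

```shell
# Build the corrected filter (with the OR) for one entry ID.
build_filter() {
    printf '(&(entryid=%s)(|(objectclass=ldapsubentry)(objectclass=nstombstone)))' "$1"
}

# Read entry IDs from stdin, one per line, and print the DN of each entry
# that is a tombstone or a subentry.
# PWFILE: hypothetical password file; ldapsearch -y uses its contents
# verbatim, so it must not end with a newline.
find_hidden_children() {
    while read -r id; do
        ldapsearch -LLL -D "cn=Directory Manager" -y "$PWFILE" \
            -b "dc=example,dc=com" -o ldif-wrap=no \
            "$(build_filter "$id")" dn < /dev/null
    done
}

# Usage: feed it one ID per line, e.g.
#   printf '%s\n' 455 456 457 | PWFILE=/root/.dmpw find_hidden_children
```

Comparing the DNs printed on each server would show whether both see the same set of tombstones and subentries under cn=users.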
Strange, but it looks like it's empty?
@greyer sorry for the late answer, I had difficulty clarifying the status of numsubordinates. In 1.3.7 (7.5) an RFE made major changes to conflict handling. It changed the numsubordinates handling code and, IIRC, it revealed an existing bug in the way numsubordinates handled tombstones. The RFE was https://pagure.io/389-ds-base/issue/49551. So I think there is a good chance that the bug you are seeing is fixed in 1.3.7. At the very least it is worth upgrading.
How to repair the broken numsubordinates... I only see a total init solution :(
As for the missing entry 456 on ds2, I have no idea why this entry is hidden. You can retrieve it (during low traffic or on a stopped instance) with
dbscan -f /var/lib/dirsrv/slapd-<instance>/db/userRoot/id2entry.db -K 456
@tbordaz I have just upgraded 389-ds-base to 1.3.8, it still shows different numsubordinates on ds1 and ds2.
@greyer I forgot to answer you :(
The problem is that numsubordinates was incorrectly computed and then stored into the DB and will stay like this unless you reinitialize.
@tbordaz could you shed some light on what that actually means, "reinitialize"?
I have an identical situation with an incorrect numSubordinates value on one IPA master
@keesghs, sorry to read that you also hit that bug. Which version are you running? By any chance, did you identify a reproducible scenario?
If you are in a replicated topology, the only way to recover from that bug is to do a total initialization (e.g. ipa-replica-manage re-initialize --from fqdn_good_instance). If this is a standalone instance, I am afraid you need to import from a previous export.
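Since a total init overwrites the local data, it may help to see the exact command before anything runs. The wrapper below is only a cautious sketch: `reinit_from` and the `DRY_RUN` variable are hypothetical, the host name in the usage line is a placeholder, and the only real command involved is the `ipa-replica-manage re-initialize --from` call quoted above. It is meant to be run on the replica whose numSubordinates is wrong.

```shell
# Print (default) or run (DRY_RUN=0) the total-init command.
# $1: FQDN of a replica whose data is known to be good.
reinit_from() {
    good=$1
    if [ -z "$good" ]; then
        echo "usage: reinit_from <fqdn_good_instance>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: only show what would be executed.
        echo "ipa-replica-manage re-initialize --from $good"
    else
        ipa-replica-manage re-initialize --from "$good"
    fi
}

# Usage:
#   reinit_from ds2.example.com             # prints the command only
#   DRY_RUN=0 reinit_from ds2.example.com   # actually re-initializes
```

After the re-initialization, checking numSubordinates again on both servers would confirm whether the counter was repaired.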
@tbordaz no, I don't have an exact scenario to reproduce. What happened is the following. We had an IPA master (A) on a temporary system (a desktop). We added two replicas (B and C), replication was A<->B and A<->C. Next we promoted B to be the CA master, and configured the IPA users (desktops, servers, etc) to use B as the first choice IPA master. (Think of LDAP and such). After a few days we switched off A to see if everything was covered.
What we forgot to check was: replication. We thought it was alright and we deleted A.
Next, we discovered the replication issue and fixed it by connecting C<->B. It all seemed to be OK. However, there was one new user who, by accident, had been added on both B and C. After solving the replication conflict we were stuck with the incorrect numSubordinates. I remember I had to unlink an LDAP entry to actually delete a conflicting private group of that new user.
Conclusion. Not an exact reproducible scenario, but a rough description of what happened.
@tbordaz So, yes, we have a replicated topology. However, I'm scared as hell to execute commands that mess around with master B. The reason is that we had a nasty experience last year where we could not get our certificates renewed. We started completely fresh with a new IPA installation. I don't want to go through that again.
DS is robust in regular operations, but I have to admit that admin tasks are always critical. I see no option other than a total init to recover from the numsubordinates issue. Maybe you can introduce new instances and, once they are up and running, retire those having the issue, given that https://pagure.io/389-ds-base/issue/49551 is now fixed upstream.