#7165 Intermittent "kinit admin" failures in upstream tests
Closed 5 years ago Opened 6 years ago by mreznik.

Recently we started seeing Intermittent and unpredictable "kinit admin" failures in upstream tests (caless, simple_replication, forced_client_reenrollment).

After catching the kinit with KRB5_TRACE we however only see that KDC is not replying:

[30469] 1506084034.316824: Getting initial credentials for admin@IPA.TEST
[30469] 1506084034.316826: Sending request (225 bytes) to IPA.TEST
[30469] 1506084034.316827: Initiating TCP connection to stream 10.0.78.24:88
[30469] 1506084034.316828: Sending TCP request to stream 10.0.78.24:88
[30469] 1506084044.331383: Sending initial UDP request to dgram 10.0.78.24:88
[30469] 1506084047.334651: Sending retry UDP request to dgram 10.0.78.24:88
[30469] 1506084052.339981: Sending retry UDP request to dgram 10.0.78.24:88
[30469] 1506084061.342888: Terminating TCP connection to stream 10.0.78.24:88
kinit: Cannot contact any KDC for realm 'IPA.TEST' while getting initial credentials

In this case kinit was run on master (10.0.78.24).

Interestingly enough kinit (even from master) was previously working as we see from pytest workflow:

[ipatests.pytest_plugins.integration.host.Host.replica.ParamikoTransport] RUN hostname > '/root/ipatests/backup_hostname'
[ipatests.pytest_plugins.integration.host.Host.master.ParamikoTransport] RUN ['kinit', 'admin']
[ipatests.pytest_plugins.integration.host.Host.master.ParamikoTransport] RUN ['ipa', 'dnsconfig-mod', '--allow-sync-ptr=true']
[ipatests.pytest_plugins.integration.host.Host.master.ParamikoTransport] RUN ['ipa', 'dnszone-add', '58.0.10.in-addr.arpa.']
ipa: WARNING: ipa: ERROR: DNS zone 58.0.10.in-addr.arpa. already exists in DNS and is handled by server(s): albina.ipa.test., hector.ipa.test.

[ipatests.pytest_plugins.integration.tasks] ipa: ERROR: DNS zone 58.0.10.in-addr.arpa. already exists in DNS and is handled by server(s): albina.ipa.test., hector.ipa.test.

[ipatests.pytest_plugins.integration.host.Host.replica.ParamikoTransport] RUN ['ipa-client-install', '-U', '--domain', 'IPA.TEST', '--realm', 
...
<snip>
...
[ipatests.pytest_plugins.integration.host.Host.replica.ParamikoTransport] RUN ['ipa-replica-install', '-p', 'Secret123', '-w', 'Secret123', '-U', '--ip-address', '10.0.58.18', '-r', 'IPA.TEST']
...
<snip>
...
[ipatests.pytest_plugins.integration.host.Host.replica.ParamikoTransport] RUN ['kinit', 'admin']
[ipatests.pytest_plugins.integration] Adding 10.0.58.178:/var/log/ipaclient-install.log to list of logs to collect
[ipatests.pytest_plugins.integration.host.Host.client.ParamikoTransport] RUN ['true']
[ipatests.pytest_plugins.integration] Adding 10.0.58.178:/root/ipatests/env.sh to list of logs to collect
[ipatests.pytest_plugins.integration.host.Host.client.ParamikoTransport] WRITE /root/ipatests/env.sh
[ipatests.pytest_plugins.integration.host.Host.client.ParamikoTransport] RUN echo '/etc/hostname' >> '/root/ipatests/file_remove'
[ipatests.pytest_plugins.integration.host.Host.client.ParamikoTransport] WRITE /etc/hostname
[ipatests.pytest_plugins.integration.host.Host.client.ParamikoTransport] RUN ['hostname', 'client.ipa.test']
[ipatests.pytest_plugins.integration.host.Host.client.ParamikoTransport] RUN hostname > '/root/ipatests/backup_hostname'
[ipatests.pytest_plugins.integration.host.Host.master.ParamikoTransport] RUN ['kinit', 'admin']
ipa: ERROR: Exit code: 1
[ipatests.pytest_plugins.integration.host.Host.master.cmd30] Exit code: 1
ERROR

And after some time (not sure about amount) it starts working again without any action. Firewall was stopped and selinux in permissive mode.

The test was run on Fedora 26.

[root@master ~]# ipa ping
------------------------------------------------------------------------
IPA server version 4.6.90.dev201709211755+git30a2b7e5. API version 2.229
------------------------------------------------------------------------

The timestamp when kinit started was:

[root@master ~]# date --date='@1506084034.316824'
Fri Sep 22 14:40:34 CEST 2017

Around this time in dirsrv/access log we see a lot of the following messages (those are seen throughout the logs but not as much as here) :

[22/Sep/2017:14:40:27.273178861 +0200] conn=124 op=4 UNBIND
[22/Sep/2017:14:40:27.273216476 +0200] conn=124 op=4 fd=101 closed - U1
[22/Sep/2017:14:40:30.164349330 +0200] conn=116 op=20 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[22/Sep/2017:14:40:30.180564726 +0200] conn=116 op=20 RESULT err=0 tag=120 nentries=0 etime=0.0016341984
[22/Sep/2017:14:40:30.236568631 +0200] conn=116 op=21 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
[22/Sep/2017:14:40:30.243925946 +0200] conn=116 op=21 RESULT err=0 tag=120 nentries=0 etime=0.0061780127
[22/Sep/2017:14:40:33.264810415 +0200] conn=116 op=23 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[22/Sep/2017:14:40:33.288889586 +0200] conn=116 op=23 RESULT err=0 tag=120 nentries=0 etime=0.0024231984
[22/Sep/2017:14:40:33.420124587 +0200] conn=116 op=24 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
[22/Sep/2017:14:40:33.429132596 +0200] conn=116 op=24 RESULT err=0 tag=120 nentries=0 etime=0.0009151826
[22/Sep/2017:14:40:36.452282897 +0200] conn=116 op=25 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[22/Sep/2017:14:40:36.465679138 +0200] conn=116 op=25 RESULT err=0 tag=120 nentries=0 etime=0.0013508838

Not sure how much this can be related though. KRB5KDC logs unfortunately for some reason started logging later.


Encountered the same issue again and after some time KRB5_TRACE=/dev/stderr kinit admin shows:

[root@master ~]# KRB5_TRACE=/dev/stderr kinit admin
[26968] 1506440593.953138: Getting initial credentials for admin@IPA.TEST
[26968] 1506440593.953140: Sending request (233 bytes) to IPA.TEST
[26968] 1506440593.953141: Resolving hostname master.ipa,test
[26968] 1506440593.953142: Initiating TCP connection to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[26968] 1506440593.953143: Sending TCP request to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[26968] 1506440593.953144: Received answer (341 bytes) from stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[26968] 1506440593.953145: Terminating TCP connection to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[26968] 1506440593.953146: Response was from master KDC
[26968] 1506440593.953147: Received error from KDC: -1765328359/Additional pre-authentication required
[26968] 1506440593.953150: Processing preauth types: 16, 15, 14, 136, 147, 133
[26968] 1506440593.953151: Received cookie: MIT
[26968] 1506440593.953152: PKINIT client has no configured identity; giving up
[26968] 1506440593.953153: Preauth module pkinit (147) (info) returned: 0/Success
[26968] 1506440593.953154: PKINIT client has no configured identity; giving up
[26968] 1506440593.953155: Preauth module pkinit (16) (real) returned: 22/Invalid argument
[26968] 1506440593.953156: PKINIT client has no configured identity; giving up
[26968] 1506440593.953157: Preauth module pkinit (14) (real) returned: 22/Invalid argument
[26968] 1506440593.953158: PKINIT client has no configured identity; giving up
[26968] 1506440593.953159: Preauth module pkinit (14) (real) returned: 22/Invalid argument
kinit: Pre-authentication failed: Invalid argument while getting initial credentials

It looks like whenever this issue happens it somehow recovers itself to certain extent. Sometimes we are even able to kinit successfully.

Processing preauth types: 16, 15, 14, 136, 147, 133

14,15,16 and 147 are PKINIT and 133 and 136 are FAST related.

There is no pre-authentication type related to password based authentication is this expected, i.e. does to admin account has no password or disabled password authentication?

Are there any message in krb5kdc.log on the server?

Encountered the same issue today:

[root@vm-058-054 ~]# KRB5_TRACE=/dev/stderr kinit admin
[13508] 1506501995.978839: Getting initial credentials for admin@DOM-058-054.ABC.IDM.LAB.ENG.BRQ.REDHAT.COM
[13508] 1506501995.978841: Sending request (233 bytes) to DOM-058-054.ABC.IDM.LAB.ENG.BRQ.REDHAT.COM
[13508] 1506501995.978842: Resolving hostname vm-058-054.abc.idm.lab.eng.brq.redhat.com
[13508] 1506501995.978843: Initiating TCP connection to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[13508] 1506501995.978844: Sending TCP request to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[13508] 1506501996.1918: Received answer (295 bytes) from stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[13508] 1506501996.1919: Terminating TCP connection to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[13508] 1506501996.1920: Response was from master KDC
[13508] 1506501996.1921: Received error from KDC: -1765328359/Additional pre-authentication required
[13508] 1506501996.1924: Processing preauth types: 136, 133
[13508] 1506501996.1925: Received cookie: MIT
kinit: Generic preauthentication failure while getting initial credentials

from krb5kdc.log

Sep 27 10:45:38 vm-058-054.abc.idm.lab.eng.brq.redhat.com krb5kdc13154: preauth pkinit failed to initialize: No realms configured correctly for pkinit support

I think the pkinit message in krb5kdc.log is not related. It just explain why you do not see the pkinit related preauth types. But still the KDC thinks that the admin account does not have a password.

Can you check the dirsrv logs from this time 'Sep 27 10:46:36' as well?

Sorry, don't have these logs anymore, that master machine is using for all trust related tests.
Can you tell me what logs you'll need, so next time I hit issue to provide all together ?

The part from /var/log/dirsrv/slapd-YOUR.DOMAIN/access around the time of authentication to see what requests are send from the KDC to lookup the user.

Sep 27 15:10:03 vm-058-054.abc.idm.lab.eng.brq.redhat.com krb5kdc13884: preauth pkinit failed to initialize: No realms configured correctly for pkinit support

[root@vm-058-054 ~]# KRB5_TRACE=/dev/stderr kinit admin
[14117] 1506517815.477502: Getting initial credentials for admin@DOM-058-054.ABC.IDM.LAB.ENG.BRQ.REDHAT.COM
[14117] 1506517815.477504: Sending request (233 bytes) to DOM-058-054.ABC.IDM.LAB.ENG.BRQ.REDHAT.COM
[14117] 1506517815.477505: Resolving hostname vm-058-054.abc.idm.lab.eng.brq.redhat.com
[14117] 1506517815.477506: Initiating TCP connection to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[14117] 1506517815.477507: Sending TCP request to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[14117] 1506517815.477508: Received answer (296 bytes) from stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[14117] 1506517815.477509: Terminating TCP connection to stream 2620:52:0:224e:21a:4aff:fe23:1491:88
[14117] 1506517815.477510: Response was from master KDC
[14117] 1506517815.477511: Received error from KDC: -1765328359/Additional pre-authentication required
[14117] 1506517815.477514: Processing preauth types: 136, 133
[14117] 1506517815.477515: Received cookie: MIT
kinit: Generic preauthentication failure while getting initial credentials

dirsrv.log

So far I have no idea. Just to be sure, when this happens the admin user has a password and user-auth-type is either not set or includes 'password'.

Should the coma be there?

[26968] 1506440593.953141: Resolving hostname master.ipa,test

I checked that the SSSD patch disabling the locator plugin on IPA servers was so far not included in any Fedora version. So there is no need to test with an older SSSD version.

@sbose We're using sssd 1.15.3-4.fc27 from our copr at https://copr.fedorainfracloud.org/coprs/g/freeipa/freeipa-master/packages/. Not sure if it was already released to any Fedora, is the patch also not included in that version?

@stlaz, no the patch is not included here. The copr packages are based on the Fedora ones (https://kojipkgs.fedoraproject.org//packages/sssd/1.15.3/4.fc27/src/sssd-1.15.3-4.fc27.src.rpm).

I checked the debugsource packages as well to be on the safe side.

What is needed here is consistent logs,
when the error happen the relevant logs from dirsrv (both acces and error logs) and from the kdc (krb5kdc.log) must be provided.
Please also provide information about the hosts configuration (host name, host ip, /etc/hosts file, krb5.conf file)

Related 389-ds ticket https://pagure.io/389-ds-base/issue/49410

ATM the culprit is likely in 389-ds

Metadata Update from @pvoborni:
- Issue priority set to: critical

6 years ago

Metadata Update from @pvoborni:
- Issue set to the milestone: FreeIPA 4.5.4

6 years ago

Metadata Update from @mreznik:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1508053

6 years ago

Was fixed in DS. Should we update requires?

Metadata Update from @pvoborni:
- Issue tagged with: tracker

6 years ago

master:

  • 4daac52 ipaldap: allow GetEffectiveRights on individual operations
  • b466172 ldap2: fix implementation of can_add

ipa-4-6:

  • ac9efdc ipaldap: allow GetEffectiveRights on individual operations
  • dee3255 ldap2: fix implementation of can_add

@pvoborni This bug is still marked as critical. Do we still need to follow your suggestion and update requirements in 4.5 and 4.6?

Metadata Update from @mreznik:
- Issue priority set to: low (was: critical)

5 years ago

Metadata Update from @mreznik:
- Issue status updated to: Closed (was: Open)

5 years ago

Login to comment on this ticket.

Metadata
Attachments 4
Attached 6 years ago View Comment
Attached 6 years ago View Comment
Attached 6 years ago View Comment
Attached 6 years ago View Comment