#5653 Multiple clients cannot join domain simultaneously: /var/run/httpd/ipa/clientcaches race condition?
Closed: Fixed None Opened 4 years ago by pspacek.

https://www.redhat.com/archives/freeipa-users/2016-January/msg00559.html

Use case:
I'm in a cloud environment and launching a group of instances which are all trying to join a domain at about the same time via ipa-client-install.

Problem:
Some of these operations succeed, and others fail.

The error message on those that fail is that they failed to join the domain, and the HTTP response was 500 instead of 200.

Relevant part of the log:

] admin@FOO.INTERNAL: dnsrecord_find(<DNS name 11.10.in-addr.arpa.>, None, idnsname=<DNS name 74.134>, structured=False, all=False, raw=False, version=u'2.156', pkey_only=Fa
] admin@FOO.INTERNAL: dnsrecord_add(<DNS name 11.10.in-addr.arpa.>, <DNS name 74.134>, a_extra_create_reverse=False, aaaa_extra_create_reverse=False, ptrrecord=(u'journalnod
] admin@FOO.INTERNAL: dnsrecord_find(<DNS name foo.internal.>, None, idnsname=<DNS name journalnode1.117b0c7ad4a26a83>, structured=False, all=False, raw=False, version=u'2.1
] admin@FOO.INTERNAL: host_find(None, fqdn=u'journalnode3.117b0c7ad4a26a83.foo.internal', all=False, raw=False, version=u'2.156', no_members=False, pkey_only=False): SUCCESS
] admin@FOO.INTERNAL: dnsrecord_find(<DNS name foo.internal.>, None, idnsname=<DNS name journalnode2.117b0c7ad4a26a83>, structured=False, all=False, raw=False, version=u'2.1
mod_wsgi (pid=9534): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
Traceback (most recent call last):
  File "/usr/share/ipa/wsgi.py", line 49, in application
    return api.Backend.wsgi_dispatch(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 258, in __call__
    return self.route(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 270, in route
    return app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 447, in __call__
    response = super(jsonserver, self).__call__(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 647, in __call__
    'xmlserver', user_ccache, environ, start_response, headers)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 593, in finalize_kerberos_acquisition
    session_data['ccache_data'] = load_ccache_data(ccache_name)
  File "/usr/lib/python2.7/site-packages/ipalib/session.py", line 1231, in load_ccache_data
    src = open(name)
IOError: [Errno 2] No such file or directory: '/var/run/httpd/ipa/clientcaches/admin@FOO.INTERNAL'
] admin@FOO.INTERNAL: dnsrecord_add(<DNS name foo.internal.>, <DNS name journalnode1.117b0c7ad4a26a83>, arecord=(u'10.11.135.27',), a_extra_create_reverse=False, aaaa_extra_
] admin@FOO.INTERNAL: host_add(u'journalnode3.117b0c7ad4a26a83.foo.internal', random=False, force=True, no_reverse=False, all=False, raw=False, version=u'2.156', no_members=
] admin@FOO.INTERNAL: dnsrecord_add(<DNS name foo.internal.>, <DNS name journalnode2.117b0c7ad4a26a83>, arecord=(u'10.11.135.180',), a_extra_create_reverse=False, aaaa_extra
mod_wsgi (pid=9535): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
Traceback (most recent call last):
  File "/usr/share/ipa/wsgi.py", line 49, in application
    return api.Backend.wsgi_dispatch(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 258, in __call__
    return self.route(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 270, in route
    return app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 447, in __call__
    response = super(jsonserver, self).__call__(environ, start_response)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 647, in __call__
    'xmlserver', user_ccache, environ, start_response, headers)
  File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 593, in finalize_kerberos_acquisition
    session_data['ccache_data'] = load_ccache_data(ccache_name)
  File "/usr/lib/python2.7/site-packages/ipalib/session.py", line 1231, in load_ccache_data
    src = open(name)
IOError: [Errno 2] No such file or directory: '/var/run/httpd/ipa/clientcaches/admin@FOO.INTERNAL'

A similar issue was reported also when adding concurrently 25 user at a time during bulk enrollment of 150,000 user to a single RHEL7 server

[Fri Feb 19 07:41:08.100903 2016] [:error] [pid 76505] [remote 10.0.1.177:40] mod_wsgi (pid=76505): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
[Fri Feb 19 07:41:08.100989 2016] [:error] [pid 76505] [remote 10.0.1.177:40] Traceback (most recent call last):
[Fri Feb 19 07:41:08.101018 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/share/ipa/wsgi.py", line 49, in application
[Fri Feb 19 07:41:08.101073 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     return api.Backend.wsgi_dispatch(environ, start_response)
[Fri Feb 19 07:41:08.101087 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 258, in __call__
[Fri Feb 19 07:41:08.101109 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     return self.route(environ, start_response)
[Fri Feb 19 07:41:08.101120 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 270, in route
[Fri Feb 19 07:41:08.101140 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     return app(environ, start_response)
[Fri Feb 19 07:41:08.101152 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 447, in __call__
[Fri Feb 19 07:41:08.101169 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     response = super(jsonserver, self).__call__(environ, start_response)
[Fri Feb 19 07:41:08.101180 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 647, in __call__
[Fri Feb 19 07:41:08.101198 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     'xmlserver', user_ccache, environ, start_response, headers)
[Fri Feb 19 07:41:08.101210 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 593, in finalize_kerberos_acquisition
[Fri Feb 19 07:41:08.101229 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     session_data['ccache_data'] = load_ccache_data(ccache_name)
[Fri Feb 19 07:41:08.101240 2016] [:error] [pid 76505] [remote 10.0.1.177:40]   File "/usr/lib/python2.7/site-packages/ipalib/session.py", line 1231, in load_ccache_data
[Fri Feb 19 07:41:08.101259 2016] [:error] [pid 76505] [remote 10.0.1.177:40]     src = open(name)
[Fri Feb 19 07:41:08.101294 2016] [:error] [pid 76505] [remote 10.0.1.177:40] IOError: [Errno 2] No such file or directory: '/var/run/httpd/ipa/clientcaches/admin@UOFMT1'
[Fri Feb 19 07:41:09.788839 2016] [auth_gssapi:error] [pid 75336] [client 10.0.1.177:42610] failed to store delegated creds: [Unspecified GSS failure.  Minor code may provide more information (Internal credentials cache error)], referer: https://mork.cc.umanitoba.ca/ipa/xml
[Fri Feb 19 07:41:09.788844 2016] [auth_gssapi:error] [pid 78642] [client 10.0.1.177:42621] failed to store delegated creds: [Unspecified GSS failure.  Minor code may provide more information (Internal credentials cache error)], referer: https://mork.cc.umanitoba.ca/ipa/xml
[Fri Feb 19 07:41:09.788961 2016] [auth_gssapi:error] [pid 78643] [client 10.0.1.177:42613] failed to store delegated creds: [Unspecified GSS failure.  Minor code may provide more information (Internal credentials cache error)], referer: https://mork.cc.umanitoba.ca/ipa/xml
[Fri Feb 19 07:41:09.789154 2016] [auth_gssapi:error] [pid 77367] [client 10.0.1.177:42615] KRB5CCNAME file (/var/run/httpd/ipa/clientcaches/admin@UOFMT1) lookup failed!, referer: https://mork.cc.umanitoba.ca/ipa/xml

Root cause: mod_auth_gssapi supplies ccache at /var/run/httpd/ipa/clientcaches/$USER@$REALM so that all request of the same user uses the same ccache. This is usually fine but IPA session mechanism clears this file at the end of each request. Therefore one request can clear ccache of a different one and it may lead to errors mentioned in this ticket.

Logging with user+password doesn't suffer from this issue.

thorgi on IRC reported that switching to mod_auth_kerb resolved the problem for him as a workaround.

@pvoborni. Could you add a link to your patch?

The bug is causing issues with certmonger, too. I had to restart certmonger to fix it.

Request ID '20160415131851':
        status: CA_UNREACHABLE
        ca-error: Server at https://master.ipa.example/ipa/xml failed request, will retry: -504 (HTTP response code is 500, not 200).



[Fri Apr 15 13:17:19.688045 2016] [wsgi:error] [pid 7287] ipa: INFO: [xmlserver] host/idp.ipa.example@IPA.EXAMPLE: cert_request(u'MIIDaDCCAlACAQAwMDEUMBIGA1UEChMLSVBBLkVYQU1QTEUxGDAWBgNVBAMTD2lkcC5pcGEuZXhhbXBsZTCCAS
IwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANsKZ5uFHgQfKY03bJlDasUPEm53GkuNYihhFdRvJgQIBb9gKqrH/dO0oymZVipRkp6XB0VW1+SSx2ZXTrfyB4Zl6TEcRm592y0KFB4tfpziTaAxDpGbi3CoTos3QVp4jO0XiqQNhC616a1x7ecNfCFymmHIbh64NQQ1XCO9//c/pOb4Qp
gXgw6h/OhInDzkPvghwXFI4e5+TywXAgV2Fai8Td7RkS1QE3EkErSOnQ3qDsqNmtROZmr6gNIlyavhyGPv5PmYNt0SIaaJ+Pq/UO80mr9bADX3ES87DImn3ZVV7Ukt54EJrfKPlo6O48r+vqneSD1jlfOrryhFyb4GB0ECAwEAAaCB8jArBgkqhkiG9w0BCRQxHh4cAEwAbwBjAGEAbAAgAE
kAUABBACAAaABvAHMAdDCBwgYJKoZIhvcNAQkOMYG0MIGxMH8GA1UdEQEBAAR1MHOgMAYKKwYBBAGCNxQCA6AiDCBob3N0L2lkcC5pcGEuZXhhbXBsZUBJUEEuRVhBTVBMRaA/BgYrBgEFAgKgNTAzoA0bC0lQQS5FWEFNUExFoSIwIKADAgEBoRkwFxsEaG9zdBsPaWRwLmlwYS5leGFtcG
xlMAwGA1UdEwEB/wQCMAAwIAYDVR0OAQEABBYEFPCrtGkxTuL0ZNm6IKeWIxLUioNqMA0GCSqGSIb3DQEBCwUAA4IBAQB0RFtd8i9VlyvBsBIrYghd4cpL0AUy4cIL9iretMFC/JjDfGLr2Janpn1hNlmJRZOaverzT2zX+Hdxf0CUmM0wgrGVc6Yx8zajLwdOV4GEd4HC7vhCqIiV5b2BH8
GCaaKioL8KBMwT4ttYvojeTQf40TBuVYMF6iviRUCwvxl5DXmAMtM1zFw5eFizzUDn/gzOrVEUH0ZA/JPtij5gTUFQ0PH1LXvQ94FAL2q4WQyeBMvl5j5NViw1JyXrpFx79eYi+LWdMiL3uksuMWDPEPK/XGbe0rQ7Nf4YPk+cVGB0AlzH8RhhFIaR8kr23A93MHZw7tzH53nJSo370j2+Fg
Z+', principal=u'host/idp.ipa.example@IPA.EXAMPLE', add=True, version=u'2.51'): SUCCESS
[Fri Apr 15 13:17:19.688762 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236] mod_wsgi (pid=7287): Exception occurred processing WSGI script '/usr/share/ipa/wsgi.py'.
[Fri Apr 15 13:17:19.688784 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236] Traceback (most recent call last):
[Fri Apr 15 13:17:19.688795 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]   File "/usr/share/ipa/wsgi.py", line 63, in application
[Fri Apr 15 13:17:19.688842 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]     return api.Backend.wsgi_dispatch(environ, start_response)
[Fri Apr 15 13:17:19.688849 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 261, in __call__
[Fri Apr 15 13:17:19.700882 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]     return self.route(environ, start_response)
[Fri Apr 15 13:17:19.700936 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 273, in route
[Fri Apr 15 13:17:19.700966 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]     return app(environ, start_response)
[Fri Apr 15 13:17:19.700971 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 650, in __call__
[Fri Apr 15 13:17:19.700980 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]     'xmlserver', user_ccache, environ, start_response, headers)
[Fri Apr 15 13:17:19.700985 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 596, in finalize_kerberos_acquisition
[Fri Apr 15 13:17:19.700991 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]     session_data['ccache_data'] = load_ccache_data(ccache_name)
[Fri Apr 15 13:17:19.700996 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]   File "/usr/lib/python2.7/site-packages/ipalib/session.py", line 1233, in load_ccache_data
[Fri Apr 15 13:17:19.712382 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236]     src = open(name)
[Fri Apr 15 13:17:19.712413 2016] [wsgi:error] [pid 7287] [remote 192.168.121.71:236] IOError: [Errno 2] No such file or directory: '/var/run/httpd/ipa/clientcaches/host~idp.ipa.example@IPA.EXAMPLE'

I think that this bug can be worked-around by setting WSGIDaemonProcess processes=1 in /etc/httpd/conf.d/ipa.conf. I did not test it so no guarantees :-)

@pvoborni Do you know when the fix might be released?

Current approach in a patch introduces other issue - sometimes the ccaches are not deleted. I.e. we don't have a final patch and therefore the release date is unknown.

@nw0428 what releases/platforms are you interested in?

Btw, a build of mod_auth_gssapi with the not-perfect patch is available at https://copr.fedorainfracloud.org/coprs/pvoborni/freeipa-4-3/build/168653/ (only for testing purposes of course)

The workaround suggested by pspacek in comment 16 doesn't work. Switching to mod_auth_kerb works:

  1. install mod_auth_kerb
    dnf install mod_auth_kerb
  2. modify ipa.conf (note: might be overridden on next ipa update)

replace:

<Location "/ipa">
  AuthType GSSAPI
  AuthName "Kerberos Login"
  GssapiCredStore keytab:/etc/httpd/conf/ipa.keytab
  GssapiCredStore client_keytab:/etc/httpd/conf/ipa.keytab
  GssapiDelegCcacheDir /var/run/httpd/ipa/clientcaches
  GssapiUseS4U2Proxy on
  GssapiAllowedMech krb5
  Require valid-user
  ErrorDocument 401 /ipa/errors/unauthorized.html
  WSGIProcessGroup ipa
  WSGIApplicationGroup ipa
</Location>

with:

KrbConstrainedDelegationLock ipa
<Location "/ipa">
  AuthType Kerberos 
  AuthName "Kerberos Login"
  KrbMethodNegotiate on
  KrbMethodK5Passwd off
  KrbServiceName HTTP
  KrbAuthRealms MY.REALM.TEST               <--- replace with your REALM
  Krb5KeyTab /etc/httpd/conf/ipa.keytab
  KrbSaveCredentials on
  KrbConstrainedDelegation on
  Require valid-user
  ErrorDocument 401 /ipa/errors/unauthorized.html
  WSGIProcessGroup ipa
  WSGIApplicationGroup ipa
</Location>
  1. restart httpd service
    systemctl restart httpd.service

reproducer:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/bin/bash

echo '{"method":"user_find","params":[[],{"sizelimit":0}]}' > /tmp/js_command

IPA_SERVER=$(hostname)

# Make two parallel requests to FreeIPA API with no sessions

curl -H "Content-Type:application/json" \
    -H "Referer: https://${IPA_SERVER}/ipa/xml" \
    -H "Accept:application/json"  \
    -H "Accept-Language:en" \
    --cacert /etc/ipa/ca.crt \
    -d  @/tmp/js_command \
    -X POST \
    -b /tmp/ipauicookie1 \
    -c /tmp/ipauicookie1 \
    --negotiate \
    -u : \
    https://${IPA_SERVER}/ipa/json &

curl -H "Content-Type:application/json" \
    -H "Referer: https://${IPA_SERVER}/ipa/xml" \
    -H "Accept:application/json"  \
    -H "Accept-Language:en" \
    --cacert /etc/ipa/ca.crt \
    -d  @/tmp/js_command \
    -X POST \
    -b /tmp/ipauicookie2 \
    -c /tmp/ipauicookie2 \
    --negotiate \
    -u : \
    https://${IPA_SERVER}/ipa/json &

# wait for both requests to return
wait

new patch posted for review, which uses new GssapiDelegCcacheUnique option of mod_auth_gssapi: https://github.com/modauthgssapi/mod_auth_gssapi

It is available in mod_auth_gssapi >= 1.4.0, in F23, F24, F25 updates-testing.

master:

  • fd840a9 mod_auth_gssapi: enable unique credential caches names

ipa-4-3:

  • 59c1e69 mod_auth_gssapi: enable unique credential caches names

According bugzilla, this ticket is in wrong milestone, moving to 4.2.5

ipa-4-2:

  • 6683442 mod_auth_gssapi: enable unique credential caches names

Metadata Update from @pspacek:
- Issue assigned to pvoborni
- Issue set to the milestone: FreeIPA 4.2.5

3 years ago

Login to comment on this ticket.

Metadata