#3372 crash in sssd-kcm due to a race-condition between two concurrent requests

Created a month ago by lslebodn
Modified 4 days ago

It seems to be use after free.

sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache

valgrind error:

==15846== 1 errors in context 1 of 1:
==15846== Invalid read of size 4
==15846==    at 0x5BFB6B9: talloc_chunk_from_ptr (talloc.c:452)
==15846==    by 0x5BFB6B9: __talloc_get_name (talloc.c:1486)
==15846==    by 0x5BFB6B9: talloc_check_name (talloc.c:1509)
==15846==    by 0x120F58: kcm_op_queue_add (kcmsrv_op_queue.c:136)
==15846==    by 0x120F58: kcm_op_queue_send (kcmsrv_op_queue.c:223)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)
==15846==    by 0x110856: main (kcm.c:313)
==15846==  Address 0xdccbea0 is 544 bytes inside a block of size 773 free'd
==15846==    at 0x4C2FCC8: free (vg_replace_malloc.c:530)
==15846==    by 0x5C024B3: _tc_free_poolmem (talloc.c:1000)
==15846==    by 0x5C024B3: _tc_free_internal (talloc.c:1141)
==15846==    by 0x5BFAAA7: _tc_free_children_internal (talloc.c:1593)
==15846==    by 0x5BFAAA7: _tc_free_internal (talloc.c:1104)
==15846==    by 0x5BFAAA7: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAAA7: _talloc_free (talloc.c:1716)
==15846==    by 0x59ECFC0: tevent_req_received (tevent_req.c:255)
==15846==    by 0x59ECFD8: tevent_req_destructor (tevent_req.c:107)
==15846==    by 0x5BFAF30: _tc_free_internal (talloc.c:1078)
==15846==    by 0x5BFAF30: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAF30: _talloc_free (talloc.c:1716)
==15846==    by 0x111711: kcm_cmd_request_done (kcmsrv_cmd.c:391)
==15846==    by 0x11E846: kcm_op_get_cache_uuid_list_done (kcmsrv_ops.c:1303)
==15846==    by 0x11B1F7: ccdb_sec_list_done (kcmsrv_ccache_secrets.c:1147)
==15846==    by 0x119DB0: sec_list_done (kcmsrv_ccache_secrets.c:224)
==15846==    by 0x12495A: tcurl_request_done (tev_curl.c:746)
==15846==    by 0x12495A: handle_curlmsg_done (tev_curl.c:234)
==15846==    by 0x12495A: process_curl_activity.isra.0 (tev_curl.c:245)
==15846==    by 0x124E4B: tcurlsock_input_available (tev_curl.c:288)
==15846==  Block was alloc'd at
==15846==    at 0x4C2EB1B: malloc (vg_replace_malloc.c:299)
==15846==    by 0x5BFE03B: __talloc_with_prefix (talloc.c:698)
==15846==    by 0x5BFE03B: _talloc_pool (talloc.c:752)
==15846==    by 0x5BFE03B: _talloc_pooled_object (talloc.c:820)
==15846==    by 0x59ECCAA: _tevent_req_create (tevent_req.c:73)
==15846==    by 0x120C43: kcm_op_queue_send (kcmsrv_op_queue.c:206)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)

Part of sssd-kcm.log

(Tue Apr 18 08:30:48:243213 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data 0xdcd93f0
(Tue Apr 18 08:30:48:246461 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/1000/ccache/
(Tue Apr 18 08:30:48:247437 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Apr 18 08:30:48:249128 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Apr 18 08:30:48:249732 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Apr 18 08:30:48:253592 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Apr 18 08:30:48:262271 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Apr 18 08:30:48:263827 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Found 2 ccaches
(Tue Apr 18 08:30:48:266425 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Listing all caches done
(Tue Apr 18 08:30:48:269666 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 23cb4886-fa25-41e1-b065-327cc6f565c4
(Tue Apr 18 08:30:48:270029 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 901c7634-98ee-4e56-bd6c-84d6548d0f9f
(Tue Apr 18 08:30:48:270963 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
(Tue Apr 18 08:30:48:272724 2017) [sssd[kcm]] [kcm_op_get_default_ccache_send] (0x1000): Getting client's default ccache
(Tue Apr 18 08:30:48:273393 2017) [sssd[kcm]] [ccdb_sec_get_default_send] (0x2000): Getting the default ccache
(Tue Apr 18 08:30:48:273723 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/1000/default, at socket /var/run/secrets.socket
(Tue Apr 18 08:30:48:273828 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:274587 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Apr 18 08:30:48:274802 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Apr 18 08:30:48:280941 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data (nil)
(Tue Apr 18 08:30:48:281043 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:281121 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:281227 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 4995
(Tue Apr 18 08:30:48:281319 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:282299 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Apr 18 08:30:48:282717 2017) [sssd[kcm]] [kcm_input_parse] (0x1000): Received message with length 4
(Tue Apr 18 08:30:48:282789 2017) [sssd[kcm]] [kcm_get_opt] (0x2000): The client requested operation 20
(Tue Apr 18 08:30:48:282870 2017) [sssd[kcm]] [kcm_cmd_send] (0x0400): KCM operation GET_DEFAULT_CACHE
(Tue Apr 18 08:30:48:282935 2017) [sssd[kcm]] [kcm_cmd_send] (0x1000): 0 bytes on KCM input
(Tue Apr 18 08:30:48:283032 2017) [sssd[kcm]] [kcm_op_queue_send] (0x0200): Adding request by 1000 to the wait queue
(Tue Apr 18 08:30:48:284040 2017) [sssd[kcm]] [talloc_log_fn] (0x0010): Bad talloc magic value - wrong talloc version used/mixed

I have a suspicion that it is partially related to expired keys.

sh]$ klist -l
Principal name                 Cache name
--------------                 ----------
lslebodn@EXAMPLE.COM           KCM:1000 (Expired)
lslebodn@ANOTHERREALM.ORG      KCM:1000:85269 (Expired)

sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache

sh$ kinit lslebodn@EXAMPLE.COM
Password for lslebodn@EXAMPLE.COM:

I have a different theory. I think it's a race condition between the request that just finished, calls tevent_req_done which frees the queue entry and between the code that first fetches the queue entry from the hash table and then checks it. I think the entry is first returned from the hash table, then the request is freed and then the queue code checks the entry.

I doubt there was any other request. It is still possible.

But I haven't had a time to find reasonable reproducer.

You were probably right. Reasonable reproducer is to kinit from two different terminals with different realm at the same time.

1st terminal:

sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache
sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache
sh$ kinit lslebodn@EXAMPLE.COM
Password for lslebodn@EXAMPLE.COM: 
kinit: Pre-authentication failed: Password read interrupted while getting initial credentials
sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while searching for ccache for lslebodn@EXAMPLE.COM
sh$ kinit lslebodn@EXAMPLE.COM
Password for lslebodn@EXAMPLE.COM:

2nd terminal

sh$ kinit lslebodn@ANOTHER.REALM
sh$ kinit lslebodn@ANOTHER.REALM
Password for lslebodn@ANOTHER.REALM: 
kinit: Pre-authentication failed: Password read interrupted while getting initial credentials
sh$ kinit lslebodn@ANOTHER.REALM
Password for lslebodn@ANOTHER.REALM: 
kinit: Pre-authentication failed: Password read interrupted while getting initial credentials
sh$ kinit lslebodn@ANOTHER.REALM
kinit: Credentials cache I/O operation failed while getting default ccache
sh$ kinit lslebodn@ANOTHER.REALM
kinit: Credentials cache I/O operation failed while getting default ccache

I might reproduce it more often because I run sssd-secrets and sssd-kcm with valgrind and therefore they are slower. But I am still not sure why it can occur even with single kinit operation; as it is described in ticket.

And it needn't be even a different realm. You can just renew the same ticket twice in parallel as a reasonable reproducer. Just ensure that ticker is not Expired :-)

e.g. for i in {1..2} ; do kinit -R & done

a month ago

Metadata Update from @jhrozek:
- Issue priority set to: critical
- Issue set to the milestone: SSSD 1.15.3

a month ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1446302

a month ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1446302

Issue linked to Bugzilla: Bug 1446302

5 days ago

Metadata Update from @jhrozek:
- Issue assigned to jhrozek

5 days ago

Metadata Update from @jhrozek:
- Custom field patch adjusted to on

master:

  • fb51bb68e62de7bb8542f5d224994eb7143040a6
  • 274489b092bba5fc81cb0f803843d56b267c5aaf
Edited 4 days ago by lslebodn
4 days ago

Metadata Update from @jhrozek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 days ago

Metadata Update from @lslebodn:
- Custom field version adjusted to 1.1.5.3

Login to comment on this ticket.

1.1.5.3

on

https://bugzilla.redhat.com/show_bug.cgi?id=1446302

cancel