#1802 [abrt] sssd-1.9.3-1.fc18: talloc_abort: Process /usr/libexec/sssd/sssd_be was killed by signal 6 (SIGABRT)
Closed: Fixed None Opened 6 years ago by jhrozek.

https://bugzilla.redhat.com/show_bug.cgi?id=908759 (Fedora)

Description of problem:
Ran ipa-client-install to join an IPA domain. 10 minutes later an sssd crash
notification came up.

Version-Release number of selected component:

Additional info:
backtrace_rating: 4
cmdline:        /usr/libexec/sssd/sssd_be --domain ipa.thewalter.lan
crash_function: talloc_abort
executable:     /usr/libexec/sssd/sssd_be
kernel:         3.6.6-3.fc18.x86_64
remote_result:  NOTFOUND
uid:            0
var_log_messages: Feb  7 13:31:37 stef-rawhide-thewalter-lan abrt[9689]: Saved
core dump of pid 8282 (/usr/libexec/sssd/sssd_be) to
/var/spool/abrt/ccpp-2013-02-07-13:31:36-8282 (18923520 bytes)

Truncated backtrace:
Thread no. 1 (10 frames)
 #2 talloc_abort at ../talloc.c:317
 #3 talloc_abort_access_after_free at ../talloc.c:336
 #4 talloc_chunk_from_ptr at ../talloc.c:357
 #6 talloc_get_name at ../talloc.c:1153
 #7 talloc_check_name at ../talloc.c:1172
 #8 ipa_dyndns_child_handler at src/providers/ipa/ipa_dyndns.c:1173
 #9 child_invoke_callback at src/util/child_common.c:578
 #10 tevent_common_loop_immediate at ../tevent_immediate.c:135
 #11 std_event_loop_once at ../tevent_standard.c:556
 #12 _tevent_loop_once at ../tevent.c:507

This might shed some light:

#2  0x00007fe2a544a2e6 in talloc_abort (reason=0x7fe2a5450718 "Bad talloc magic value - access after free") at ../talloc.c:317
No locals.

A use-after-free should be visible in valgrind during normal operation. Please also check in the corefile (perhaps based on the tevent_req return value) whether the event completed successfully or after a timeout.

blockedby: =>
blocking: =>
coverity: =>
design: =>
design_review: => 0
feature_milestone: =>
fedora_test_page: =>
selected: =>
testsupdated: => 0

Moving to 1.9.5 for more investigation. We don't have logs, so we should try to find the issue in the code.

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.9.5

The problem here is that the fork_nsupdate_send request finished before nsupdate exited. Thus, when SIGCHLD is received and ipa_dyndns_child_handler() tries to retrieve its private data as a struct tevent_req, it accesses a request that was already freed.

There are only two scenarios in which this can happen:
1. We reach IPA_DYNDNS_TIMEOUT (15 seconds), at which point tevent_req_error(req, ETIMEDOUT) is called.
2. We fail to write data to the pipe; then in ipa_dyndns_stdin_done() we get ret != EOK from write_pipe_recv() and call tevent_req_error(req, ret).

=> In both cases the callback is called and the request is freed, but the SIGCHLD handler is still registered and waiting for the signal. When the handler fires, it accesses already freed data, which causes sssd_be to crash.

Possible solutions:
1. Do not call tevent_req_error() or tevent_req_done() outside the SIGCHLD handler. However, this would make the timeout useless.
2. Provide a way to remove the SIGCHLD handler, and remove it before we mark the request as finished.
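
To illustrate the second option, here is a minimal sketch in plain C. It does not use tevent or talloc, and it is not the actual SSSD fix: struct request, struct child_handler, handler_disarm(), request_fail() and deliver_sigchld() are all hypothetical stand-ins. The only point is the ordering: the handler is disarmed before the request is finished and freed, so a late SIGCHLD can no longer reach freed memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tevent_req; freed once it completes. */
struct request {
    int result;
    bool done;
};

typedef void (*child_cb)(int status, void *pvt);

/* Hypothetical stand-in for a registered SIGCHLD handler. */
struct child_handler {
    child_cb cb;
    void *pvt;   /* private data: points at the request */
    bool armed;  /* false once the handler has been removed */
};

static int handler_calls = 0;

/* Plays the role of the child handler: dereferences the request. */
static void on_child_exit(int status, void *pvt)
{
    struct request *req = pvt;

    handler_calls++;
    req->result = status;
    req->done = true;
}

/* Remove the handler so it can never touch its private data again. */
static void handler_disarm(struct child_handler *h)
{
    h->armed = false;
    h->pvt = NULL;
}

/* Finish the request early (timeout or failed pipe write).
 * The handler is disarmed BEFORE the request is freed. */
static void request_fail(struct request *req, struct child_handler *h, int err)
{
    handler_disarm(h);
    req->result = err;
    req->done = true;
    free(req);
}

/* Simulated SIGCHLD delivery: a disarmed handler is skipped. */
static void deliver_sigchld(struct child_handler *h, int status)
{
    if (!h->armed) {
        return;
    }
    h->cb(status, h->pvt);
}

/* Timeout first, child exit later: returns how often the handler ran. */
int simulate_timeout_then_exit(void)
{
    struct request *req = calloc(1, sizeof(*req));
    assert(req != NULL);
    struct child_handler h = { on_child_exit, req, true };

    handler_calls = 0;
    request_fail(req, &h, ETIMEDOUT);
    deliver_sigchld(&h, 0);  /* late signal: must not touch req */
    return handler_calls;
}

/* Child exits before any timeout: the handler runs exactly once. */
int simulate_normal_exit(void)
{
    struct request *req = calloc(1, sizeof(*req));
    assert(req != NULL);
    struct child_handler h = { on_child_exit, req, true };

    handler_calls = 0;
    deliver_sigchld(&h, 0);
    handler_disarm(&h);
    free(req);
    return handler_calls;
}
```

In this sketch simulate_timeout_then_exit() covers both failure scenarios listed above, since each of them ends the request before the child exits. In real tevent/talloc code the same ordering could presumably be enforced by tying the handler's lifetime to the request, but that is an assumption and not something this ticket confirms.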

Fields changed

owner: somebody => lslebodn

Will be fixed along with the AD dyndns enhancement.

milestone: SSSD 1.9.5 => SSSD 1.10 beta
owner: lslebodn => jhrozek
review: => 0

Fields changed

patch: 0 => 1

This access-after-free was fixed as a byproduct of 9cb46bc

resolution: => fixed
status: new => closed

Metadata Update from @jhrozek:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.10 beta

2 years ago
