#1802 [abrt] sssd-1.9.3-1.fc18: talloc_abort: Process /usr/libexec/sssd/sssd_be was killed by signal 6 (SIGABRT)
Closed: Fixed None Opened 7 years ago by jhrozek.

https://bugzilla.redhat.com/show_bug.cgi?id=908759 (Fedora)

Description of problem:
Ran ipa-client-install to join an IPA domain. 10 minutes later an sssd crash
notification came up.

Version-Release number of selected component:

Additional info:
backtrace_rating: 4
cmdline:        /usr/libexec/sssd/sssd_be --domain ipa.thewalter.lan
crash_function: talloc_abort
executable:     /usr/libexec/sssd/sssd_be
kernel:         3.6.6-3.fc18.x86_64
remote_result:  NOTFOUND
uid:            0
var_log_messages: Feb  7 13:31:37 stef-rawhide-thewalter-lan abrt[9689]: Saved
core dump of pid 8282 (/usr/libexec/sssd/sssd_be) to
/var/spool/abrt/ccpp-2013-02-07-13:31:36-8282 (18923520 bytes)

Truncated backtrace:
Thread no. 1 (10 frames)
 #2 talloc_abort at ../talloc.c:317
 #3 talloc_abort_access_after_free at ../talloc.c:336
 #4 talloc_chunk_from_ptr at ../talloc.c:357
 #6 talloc_get_name at ../talloc.c:1153
 #7 talloc_check_name at ../talloc.c:1172
 #8 ipa_dyndns_child_handler at src/providers/ipa/ipa_dyndns.c:1173
 #9 child_invoke_callback at src/util/child_common.c:578
 #10 tevent_common_loop_immediate at ../tevent_immediate.c:135
 #11 std_event_loop_once at ../tevent_standard.c:556
 #12 _tevent_loop_once at ../tevent.c:507

This might shed some light:

#2  0x00007fe2a544a2e6 in talloc_abort (reason=0x7fe2a5450718 "Bad talloc magic value - access after free") at ../talloc.c:317
No locals.

A use-after-free should be visible in valgrind during normal operation. Please also check in the corefile (perhaps based on the tevent_req return value) whether the event completed successfully or after a timeout.

Putting to 1.9.5 for more investigation. We don't have logs so we should try to find the issue in code.

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.9.5

The problem here is that the fork_nsupdate_send request was finished before nsupdate exited. Thus, when SIGCHLD is received and ipa_dyndns_child_handler() tries to retrieve its private data as a struct tevent_req, it accesses a request that was already freed.

There are only two scenarios when this can happen:
1. We reach IPA_DYNDNS_TIMEOUT (15 seconds), which calls tevent_req_error(req, ETIMEDOUT).
2. We fail to write data to the pipe; then in ipa_dyndns_stdin_done() we get ret != EOK from write_pipe_recv() and we call tevent_req_error(req, ret).

In both cases the callback is called and the request is freed, but the SIGCHLD handler is still registered and waiting for the signal. When the handler fires, it accesses already freed data, which causes sssd_be to crash.

Possible solutions:
1. Do not call tevent_req_error() and tevent_req_done() outside the SIGCHLD handler. However, this would make the timeout useless.
2. Provide a way to remove the SIGCHLD handler, and remove it before we mark the request as finished.
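
To illustrate option 2, here is a minimal sketch in plain C. It is a simulation only: handler_add(), handler_remove(), request_finish() and dispatch_sigchld() are hypothetical names invented for this example, not tevent or SSSD APIs, and this is not necessarily how the actual fix was implemented. The point it shows is that every path that finishes the request first unregisters the child handler, so a late SIGCHLD has nothing left to call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not SSSD/tevent APIs. */
typedef void (*child_cb_fn)(int status, void *pvt);

struct child_handler {
    int pid;
    child_cb_fn cb;
    void *pvt;      /* private data: points at the owning request */
    int active;
};

struct request {
    int finished;
    int result;
    struct child_handler *handler;
};

#define MAX_HANDLERS 8
static struct child_handler handlers[MAX_HANDLERS];

static struct child_handler *handler_add(int pid, child_cb_fn cb, void *pvt)
{
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (!handlers[i].active) {
            handlers[i] = (struct child_handler){ pid, cb, pvt, 1 };
            return &handlers[i];
        }
    }
    return NULL;
}

static void handler_remove(struct child_handler *h)
{
    if (h != NULL) {
        h->active = 0;
        h->cb = NULL;
        h->pvt = NULL;
    }
}

/* Single exit point: unregister the handler BEFORE marking the request done. */
static void request_finish(struct request *req, int result)
{
    handler_remove(req->handler);
    req->handler = NULL;
    req->result = result;
    req->finished = 1;
}

/* Called when the child exits normally. */
static void child_done(int status, void *pvt)
{
    struct request *req = pvt;
    assert(!req->finished);
    request_finish(req, status == 0 ? 0 : EIO);
}

/* Simulated SIGCHLD delivery; returns how many callbacks were invoked. */
static int dispatch_sigchld(int pid, int status)
{
    int invoked = 0;
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (handlers[i].active && handlers[i].pid == pid) {
            handlers[i].cb(status, handlers[i].pvt);
            invoked++;
        }
    }
    return invoked;
}

/* Timeout fires first, request is freed, then the child exits late. */
int simulate_timeout_then_exit(void)
{
    struct request *req = calloc(1, sizeof(*req));
    if (req == NULL) return -1;
    req->handler = handler_add(1234, child_done, req);
    if (req->handler == NULL) { free(req); return -1; }

    request_finish(req, ETIMEDOUT);   /* timeout path */
    free(req);                        /* request is gone */
    return dispatch_sigchld(1234, 0); /* late SIGCHLD finds no handler */
}

/* Child exits before any timeout; the handler completes the request. */
int simulate_normal_exit(void)
{
    struct request *req = calloc(1, sizeof(*req));
    if (req == NULL) return -1;
    req->handler = handler_add(1234, child_done, req);
    if (req->handler == NULL) { free(req); return -1; }

    int invoked = dispatch_sigchld(1234, 0);
    int ok = req->finished && req->result == 0;
    free(req);
    return ok ? invoked : -1;
}
```

In the timeout case no callback runs after the request is freed, while the normal case still completes through the handler. With talloc, the same effect could presumably be achieved by tying the handler's removal to the request's lifetime, but that is an assumption and not something confirmed in this ticket.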

Fields changed

owner: somebody => lslebodn

Will be fixed along with the AD dyndns enhancement.

milestone: SSSD 1.9.5 => SSSD 1.10 beta
owner: lslebodn => jhrozek
review: => 0

Fields changed

patch: 0 => 1

This access-after-free was fixed as a byproduct of 9cb46bc

resolution: => fixed
status: new => closed

Metadata Update from @jhrozek:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.10 beta

3 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/2844

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.
