Issue #1802: [abrt] sssd-1.9.3-1.fc18: talloc_abort: Process /usr/libexec/sssd/sssd_be was killed by signal 6 (SIGABRT) - sssd

Closed: Fixed None Opened 11 years ago by jhrozek.

https://bugzilla.redhat.com/show_bug.cgi?id=908759 (Fedora)

Description of problem:
Ran ipa-client-install to join an IPA domain. Ten minutes later, an sssd crash
notification came up.

Version-Release number of selected component:
sssd-1.9.3-1.fc18

Additional info:
backtrace_rating: 4
cmdline:        /usr/libexec/sssd/sssd_be --domain ipa.thewalter.lan --debug-to-files
crash_function: talloc_abort
executable:     /usr/libexec/sssd/sssd_be
kernel:         3.6.6-3.fc18.x86_64
remote_result:  NOTFOUND
uid:            0
var_log_messages: Feb  7 13:31:37 stef-rawhide-thewalter-lan abrt[9689]: Saved core dump of pid 8282 (/usr/libexec/sssd/sssd_be) to /var/spool/abrt/ccpp-2013-02-07-13:31:36-8282 (18923520 bytes)

Truncated backtrace:
Thread no. 1 (10 frames)
 #2 talloc_abort at ../talloc.c:317
 #3 talloc_abort_access_after_free at ../talloc.c:336
 #4 talloc_chunk_from_ptr at ../talloc.c:357
 #6 talloc_get_name at ../talloc.c:1153
 #7 talloc_check_name at ../talloc.c:1172
 #8 ipa_dyndns_child_handler at src/providers/ipa/ipa_dyndns.c:1173
 #9 child_invoke_callback at src/util/child_common.c:578
 #10 tevent_common_loop_immediate at ../tevent_immediate.c:135
 #11 std_event_loop_once at ../tevent_standard.c:556
 #12 _tevent_loop_once at ../tevent.c:507

jhrozek commented 11 years ago

This might shed some light:

#2  0x00007fe2a544a2e6 in talloc_abort (reason=0x7fe2a5450718 "Bad talloc magic value - access after free") at ../talloc.c:317
No locals.

A use-after-free should be visible in valgrind during normal operation. Please also check in the core file (perhaps based on the tevent_req return value) whether the event completed successfully or only after a timeout.

jhrozek commented 11 years ago

Putting to 1.9.5 for more investigation. We don't have logs so we should try to find the issue in code.

jhrozek commented 11 years ago

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.9.5

pbrezina commented 11 years ago

The problem here is that the fork_nsupdate_send request finished before nsupdate exited. Thus, when SIGCHLD is received and ipa_dyndns_child_handler() tries to retrieve its private data as a struct tevent_req, it accesses a request that has already been freed.

There are only two scenarios in which this can happen:
1. We reach IPA_DYNDNS_TIMEOUT (15 seconds), which calls tevent_req_error(req, ETIMEDOUT).
2. We fail to write data to the pipe; then in ipa_dyndns_stdin_done() we get ret != EOK from write_pipe_recv() and call tevent_req_error(req, ret).

=> The callback is called and the request is freed, but the SIGCHLD handler is still registered and waiting for the signal. When the handler fires, it accesses already freed data, which causes sssd_be to crash.

Possible solutions:
1. Do not call tevent_req_error() and tevent_req_done() outside the SIGCHLD handler. However, this would make the timeout useless.
2. Provide a way to remove the SIGCHLD handler, and remove it before we mark the request as finished.

jhrozek commented 11 years ago

Fields changed

owner: somebody => lslebodn

jhrozek commented 11 years ago

Will be fixed along with the AD dyndns enhancement.

milestone: SSSD 1.9.5 => SSSD 1.10 beta
owner: lslebodn => jhrozek
review: => 0

jhrozek commented 11 years ago

Fields changed

patch: 0 => 1

jhrozek commented 11 years ago

This access-after-free was fixed as a byproduct of 9cb46bc

resolution: => fixed
status: new => closed

Metadata Update from @jhrozek:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.10 beta

7 years ago

pbrezina commented 4 years ago

SSSD is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/2844

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the Subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata

Assignee: jhrozek
Tags: None
Blocking: None
Depending on: None
Priority: major
Milestone: SSSD 1.10 beta
type: defect
component: SSSD
version: None
selected: None
testsupdated:
patch:
rhbz: https://bugzilla.redhat.com/show_bug.cgi?id=908759
design_review:
review:
changelog: None
keywords: None
coverity: None
mark: None
blocking: None
design: None
sensitive: None
blockedby: None
feature_milestone: None
