Learn more about these different git repos.
Other Git URLs
When using sssd on openSUSE, and system is under heavy load, dbus connections sometimes fail. It seems to trigger following segmentation fault:
(gdb) bt #0 dbus_watch_handle (watch=0xa0, flags=flags@entry=2) at dbus-watch.c:704 #1 0x00007f70b56e6f48 in sbus_watch_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, data=<optimized out>) at src/sbus/sssd_dbus_common.c:94 #2 0x00007f70b4a1288b in epoll_event_loop (tvalp=0x7ffe2a238690, epoll_ev=0x7f70b5f8d360) at ../tevent_epoll.c:728 #3 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926 #4 0x00007f70b4a10d57 in std_event_loop_once (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:114 #5 0x00007f70b4a0d51d in _tevent_loop_once (ev=ev@entry=0x7f70b5f8d120, location=location@entry=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:533 #6 0x00007f70b4a0d6bb in tevent_common_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:637 #7 0x00007f70b4a10cf7 in std_event_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:140 #8 0x00007f70b56eeec3 in server_loop (main_ctx=0x7f70b5f8e570) at src/util/server.c:668 #9 0x00007f70b5d453e9 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:610
full backtrace sssd_bt.log
core file core.sssd_nss.0.62b5a145b83e4cdcbb27378f19ff38d4.3590.1432783210000000.xz
In this case the backtrace and corefile are not that useful, because we just see the function we ended at.
What would be much more useful is either a way to reproduce the bug locally, or at least full verbose debug logs. See https://fedorahosted.org/sssd/wiki/Troubleshooting
Also https://fedorahosted.org/sssd/wiki/Reporting_sssd_bugs might be useful
content of /var/log/sssd with enabled debug=1 sssd_logs.tar.xz
sssd configuration file sssd.conf
It seems that issue was triggered by heavy io, as same segfault occurred around 9 am. I will try to reproduce issue with increased debug level.
I saw such crash few times (IIRC it's use after free) but I was not able to reproduce it.
It would be good to have a reasonable reproducer.
cc: => lslebodn@redhat.com
Hi, any luck getting the reproducer or the logs?
sensitive: => 0
Replying to [comment:6 jhrozek]:
Hi, any luck getting the reproducer or the logs? Unfortunately not yet, but I keep trying.
Thank you for opening the ticket. I'll close it for now as we don't have reproducer or logs.
Feel free to reopen and attach more information. Thank you!
resolution: => worksforme status: new => closed
Metadata Update from @plusky: - Issue set to the milestone: NEEDS_TRIAGE
Metadata Update from @atikhonov: - Custom field design_review adjusted to on (was: 0) - Custom field mark adjusted to on (was: 0) - Custom field patch adjusted to on (was: 0) - Custom field review adjusted to on (was: 0) - Custom field sensitive adjusted to on (was: 0) - Custom field testsupdated adjusted to on (was: 0) - Issue status updated to: Open (was: Closed)
I believe this is the same issue as: * https://bugzilla.redhat.com/show_bug.cgi?id=1144765 * https://bugzilla.redhat.com/show_bug.cgi?id=1731577 * https://bugzilla.redhat.com/show_bug.cgi?id=1767159
My guess is following is happening: - libdbus removes dbush watch calling sbus_remove_watch() callback - watch is freed, but watch->fde() (tevent fd monitoring; set up in sbus_add_watch()) is not removed ( https://tevent.samba.org/group__tevent.html#gadc52787f3daf49e589066d37a5cdb18c : "To cancel the monitoring of a file descriptor, call talloc_free() on the object returned by this function.") - tevent calls sbus_watch_handler() to report fd became writable, but watch context is already freed => use-after-free
sbus_remove_watch()
watch->fde()
sbus_add_watch()
sbus_watch_handler()
Indirect confirmation is that sssd 2-x has this fixed: https://github.com/SSSD/sssd/blob/master/src/sbus/connection/sbus_watch.c#L283 (though I would prefer to put this into watch destructor)
Metadata Update from @atikhonov: - Custom field design_review reset (from on) - Custom field mark reset (from on) - Custom field review reset (from on) - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1767159 - Custom field version adjusted to 1.16.x (was: 1.12.4) - Issue priority set to: major (was: minor) - Issue tagged with: bug
PR: https://github.com/SSSD/sssd/pull/930
Metadata Update from @atikhonov: - Custom field design_review reset (from false) - Custom field mark reset (from false) - Custom field review reset (from false)
Metadata Update from @atikhonov: - Custom field design_review reset (from false) - Custom field mark reset (from false) - Custom field review reset (from false) - Custom field testsupdated reset (from on) - Issue tagged with: PR
(1) I was wrong in regards of how this "use-after-free" happens.
#3 0x00007f5b3c93e527 in epoll_event_loop (tvalp=0x7ffe3ad15da0, epoll_ev=0x55efe0f0ec40) at ../tevent_epoll.c:736 ... events = {{events = 29, data = {ptr = 0x55efe0f1ed70, ...}}}
events = 29 = 1D = 0x1 + 0x4 + 0x8 + 0x10 = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP
It seems actual flow was: tevent_common_invoke_fd_handler() -> sbus_watch_handler() -> dbus_watch_handle(R) -> ...libdbus detects connection is closed... -> sbus_remove_watch() -> talloc_free(watch) ... get back to libdbus and back to sbus_watch_handler() -> "if (watch->dbus_write_watch) dbus_watch_handle(W)" => use-after-free
tevent_common_invoke_fd_handler()
dbus_watch_handle(R)
talloc_free(watch)
I will prepare new patch to fix this.
(2) It seems there is additional issue: call to talloc_free(watch) in the context of tevent_common_invoke_fd_handler() will make d-tor of attached tevent_fd to fail: https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L124 and https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L52 thus leaking memory. Fixing this leak is tricky because libtevent requires to destroy tevent_fd before close(fd), which most probably will be executed by libdbus right after sbus_remove_watch()... But I have to check details with maintainer/knowledgeable person of libtevent.
tevent_fd
Metadata Update from @atikhonov: - Custom field design_review reset (from false) - Custom field mark reset (from false) - Custom field review reset (from false) - Custom field testsupdated reset (from false)
Metadata Update from @atikhonov: - Issue assigned to atikhonov
PRs: 1.16: https://github.com/SSSD/sssd/pull/932 2.x: https://github.com/SSSD/sssd/pull/934
Commit 337a1ad fixes this issue
master
sssd-1-16
Metadata Update from @pbrezina: - Custom field design_review reset (from false) - Custom field mark reset (from false) - Custom field review reset (from false) - Custom field testsupdated reset (from false) - Issue status updated to: Open (was: Closed)
Metadata Update from @pbrezina: - Custom field design_review reset (from false) - Custom field mark reset (from false) - Custom field review reset (from false) - Custom field testsupdated reset (from false) - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/3701
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Login to comment on this ticket.