#2660 SSSD service is crashing: dbus_watch_handle() is invoked with corrupted 'watch' value
Closed: Fixed a year ago by pbrezina. Opened 5 years ago by plusky.

When using sssd on openSUSE, and system is under heavy load, dbus connections sometimes fail. It seems to trigger following segmentation fault:

(gdb) bt
#0  dbus_watch_handle (watch=0xa0, flags=flags@entry=2) at dbus-watch.c:704
#1  0x00007f70b56e6f48 in sbus_watch_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, data=<optimized out>) at src/sbus/sssd_dbus_common.c:94
#2  0x00007f70b4a1288b in epoll_event_loop (tvalp=0x7ffe2a238690, epoll_ev=0x7f70b5f8d360) at ../tevent_epoll.c:728
#3  epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926
#4  0x00007f70b4a10d57 in std_event_loop_once (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:114
#5  0x00007f70b4a0d51d in _tevent_loop_once (ev=ev@entry=0x7f70b5f8d120, location=location@entry=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:533
#6  0x00007f70b4a0d6bb in tevent_common_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:637
#7  0x00007f70b4a10cf7 in std_event_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:140
#8  0x00007f70b56eeec3 in server_loop (main_ctx=0x7f70b5f8e570) at src/util/server.c:668
#9  0x00007f70b5d453e9 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:610

In this case the backtrace and corefile are not that useful, because we just see the function we ended at.

What would be much more useful is either a way to reproduce the bug locally, or at least full verbose debug logs. See https://fedorahosted.org/sssd/wiki/Troubleshooting

content of /var/log/sssd with enabled debug=1
sssd_logs.tar.xz

It seems that issue was triggered by heavy io, as same segfault occurred around 9 am. I will try to reproduce issue with increased debug level.

Fields changed description: When using sssd on openSUSE, and system is under heavy load, dbus connections sometimes fail. It seems to trigger following segmentation fault: (gdb) bt #0 dbus_watch_handle (watch=0xa0, flags=flags@entry=2) at dbus-watch.c:704 #1 0x00007f70b56e6f48 in sbus_watch_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, data=<optimized out>) at src/sbus/sssd_dbus_common.c:94 #2 0x00007f70b4a1288b in epoll_event_loop (tvalp=0x7ffe2a238690, epoll_ev=0x7f70b5f8d360) at ../tevent_epoll.c:728 #3 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926 #4 0x00007f70b4a10d57 in std_event_loop_once (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:114 #5 0x00007f70b4a0d51d in _tevent_loop_once (ev=ev@entry=0x7f70b5f8d120, location=location@entry=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:533 #6 0x00007f70b4a0d6bb in tevent_common_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:637 #7 0x00007f70b4a10cf7 in std_event_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:140 #8 0x00007f70b56eeec3 in server_loop (main_ctx=0x7f70b5f8e570) at src/util/server.c:668 #9 0x00007f70b5d453e9 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:610 => When using sssd on openSUSE, and system is under heavy load, dbus connections sometimes fail. It seems to trigger following segmentation fault: {{{ (gdb) bt #0 dbus_watch_handle (watch=0xa0, flags=flags@entry=2) at dbus-watch.c:704 #1 0x00007f70b56e6f48 in sbus_watch_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, data=<optimized out>) at src/sbus/sssd_dbus_common.c:94 #2 0x00007f70b4a1288b in epoll_event_loop (tvalp=0x7ffe2a238690, epoll_ev=0x7f70b5f8d360) at ../tevent_epoll.c:728 #3 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926 #4 0x00007f70b4a10d57 in std_event_loop_once (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:114 #5 0x00007f70b4a0d51d in _tevent_loop_once (ev=ev@entry=0x7f70b5f8d120, location=location@entry=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:533 #6 0x00007f70b4a0d6bb in tevent_common_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent.c:637 #7 0x00007f70b4a10cf7 in std_event_loop_wait (ev=0x7f70b5f8d120, location=0x7f70b5705b25 "src/util/server.c:668") at ../tevent_standard.c:140 #8 0x00007f70b56eeec3 in server_loop (main_ctx=0x7f70b5f8e570) at src/util/server.c:668 #9 0x00007f70b5d453e9 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:610 }}}

I saw such crash few times (IIRC it's use after free)
but I was not able to reproduce it.

It would be good to have a reasonable reproducer.

cc: => lslebodn@redhat.com

Hi,
any luck getting the reproducer or the logs?

sensitive: => 0

Replying to [comment:6 jhrozek]:

Hi,
any luck getting the reproducer or the logs?
Unfortunately not yet, but I keep trying.

Thank you for opening the ticket. I'll close it for now as we don't have reproducer or logs.

Feel free to reopen and attach more information. Thank you!

resolution: => worksforme
status: new => closed

Metadata Update from @plusky:
- Issue set to the milestone: NEEDS_TRIAGE

4 years ago

Metadata Update from @atikhonov:
- Custom field design_review adjusted to on (was: 0)
- Custom field mark adjusted to on (was: 0)
- Custom field patch adjusted to on (was: 0)
- Custom field review adjusted to on (was: 0)
- Custom field sensitive adjusted to on (was: 0)
- Custom field testsupdated adjusted to on (was: 0)
- Issue status updated to: Open (was: Closed)

a year ago

I believe this is the same issue as:
* https://bugzilla.redhat.com/show_bug.cgi?id=1144765
* https://bugzilla.redhat.com/show_bug.cgi?id=1731577
* https://bugzilla.redhat.com/show_bug.cgi?id=1767159

My guess is following is happening:
- libdbus removes dbush watch calling sbus_remove_watch() callback
- watch is freed, but watch->fde() (tevent fd monitoring; set up in sbus_add_watch()) is not removed ( https://tevent.samba.org/group__tevent.html#gadc52787f3daf49e589066d37a5cdb18c : "To cancel the monitoring of a file descriptor, call talloc_free() on the object returned by this function.")
- tevent calls sbus_watch_handler() to report fd became writable, but watch context is already freed => use-after-free

Indirect confirmation is that sssd 2-x has this fixed: https://github.com/SSSD/sssd/blob/master/src/sbus/connection/sbus_watch.c#L283
(though I would prefer to put this into watch destructor)

Metadata Update from @atikhonov:
- Custom field design_review reset (from on)
- Custom field mark reset (from on)
- Custom field review reset (from on)
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1767159
- Custom field version adjusted to 1.16.x (was: 1.12.4)
- Issue priority set to: major (was: minor)
- Issue tagged with: bug

a year ago

Metadata Update from @atikhonov:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field review reset (from false)

a year ago

Metadata Update from @atikhonov:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field review reset (from false)
- Custom field testsupdated reset (from on)
- Issue tagged with: PR

a year ago

PR: https://github.com/SSSD/sssd/pull/930

(1) I was wrong in regards of how this "use-after-free" happens.

#3  0x00007f5b3c93e527 in epoll_event_loop (tvalp=0x7ffe3ad15da0, epoll_ev=0x55efe0f0ec40) at ../tevent_epoll.c:736
        ...
        events = {{events = 29, data = {ptr = 0x55efe0f1ed70, ...}}}

events = 29 = 1D = 0x1 + 0x4 + 0x8 + 0x10 = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP

It seems actual flow was:
tevent_common_invoke_fd_handler() -> sbus_watch_handler() -> dbus_watch_handle(R) -> ...libdbus detects connection is closed... -> sbus_remove_watch() -> talloc_free(watch) ... get back to libdbus and back to sbus_watch_handler() -> "if (watch->dbus_write_watch) dbus_watch_handle(W)" => use-after-free

I will prepare new patch to fix this.

(2) It seems there is additional issue: call to talloc_free(watch) in the context of tevent_common_invoke_fd_handler() will make d-tor of attached tevent_fd to fail:
https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L124
and
https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L52
thus leaking memory. Fixing this leak is tricky because libtevent requires to destroy tevent_fd before close(fd), which most probably will be executed by libdbus right after sbus_remove_watch()...
But I have to check details with maintainer/knowledgeable person of libtevent.

Metadata Update from @atikhonov:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field review reset (from false)
- Custom field testsupdated reset (from false)

a year ago

Metadata Update from @atikhonov:
- Issue assigned to atikhonov

a year ago

Metadata Update from @atikhonov:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field review reset (from false)
- Custom field testsupdated reset (from false)

a year ago
  • master
    • 337a1ad - SBUS: defer deallocation of sbus_watch_ctx
  • sssd-1-16
    • f845355 - SBUS: defer deallocation of sbus_watch_ctx

Metadata Update from @pbrezina:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field review reset (from false)
- Custom field testsupdated reset (from false)
- Issue status updated to: Open (was: Closed)

a year ago

Metadata Update from @pbrezina:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field review reset (from false)
- Custom field testsupdated reset (from false)
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

a year ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/3701

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Login to comment on this ticket.

Metadata