#4089 Watchdog implementation or usage is incorrect
Closed: Fixed a year ago by mzidek. Opened 2 years ago by atikhonov.

When the watchdog detects that a process is locked up, it handles the situation this way:

        if (getpid() == getpgrp()) {
            kill(-getpgrp(), SIGTERM);
        } else {

So in the case of a parent process (the vast majority of cases), it sends SIGTERM to the whole process group, including itself.
The problem is that this signal is handled by a libtevent handler:

server_setup()->tevent_add_signal(event_ctx, event_ctx, SIGTERM, 0, default_quit, ctx);

(and also by be_process_install_sigterm_handler(), but this should go away; see #4088)
This handler only queues an event for processing in the tevent mainloop. But if the tevent mainloop is stuck (exactly the case that triggers the watchdog), the event is never processed, which makes the watchdog useless.
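
To illustrate the failure mode outside of SSSD, here is a minimal standalone sketch in plain POSIX C. It does not use libtevent: the "deferred" handler only sets a flag, which stands in for queuing an event that a mainloop would process later, and the endless `pause()` loop stands in for a task that never returns to the mainloop. All function names here are hypothetical and not taken from the SSSD sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t quit_requested = 0;

/* Deferred style: only records the request; a mainloop would have to act on it. */
static void deferred_handler(int signo)
{
    (void)signo;
    quit_requested = 1;
}

/* Immediate style: terminates from inside the handler (_exit is async-signal-safe). */
static void immediate_handler(int signo)
{
    (void)signo;
    _exit(1);
}

/* Forks a child that never returns to its "mainloop", sends it SIGTERM and
 * reports whether it terminated within about one second.
 * Returns 1 if it terminated, 0 if it had to be killed, -1 on error. */
int stuck_child_terminates(int use_immediate_handler)
{
    int ready[2];
    if (pipe(ready) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = use_immediate_handler ? immediate_handler
                                              : deferred_handler;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGTERM, &sa, NULL) != 0) _exit(2);

        /* Tell the parent that the handler is installed. */
        close(ready[0]);
        if (write(ready[1], "x", 1) != 1) _exit(2);
        close(ready[1]);

        /* Stand-in for a stuck mainloop: quit_requested is never checked. */
        for (;;) pause();
    }

    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);
    close(ready[0]);
    if (n != 1) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    kill(pid, SIGTERM);

    /* Poll for up to ~1 second (20 x 50 ms). */
    struct timespec step = { 0, 50 * 1000 * 1000 };
    for (int i = 0; i < 20; i++) {
        if (waitpid(pid, NULL, WNOHANG) == pid) return 1;
        nanosleep(&step, NULL);
    }

    /* The child ignored the request: clean it up forcibly. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 0;
}
```

Calling `stuck_child_terminates(0)` should return 0 (the deferred handler runs, but nothing acts on the flag), while `stuck_child_terminates(1)` should return 1. The first case is the analogue of the situation described above.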

This behavior can actually be seen in https://bugzilla.redhat.com/show_bug.cgi?id=1625937, where the be-process is terminated only after a very long-running sync task has completed.

I propose amending watchdog_handler() (and the corresponding part of watchdog_detect_timeshift()) to always call _exit() (after optionally sending a signal to the group):

        if (getpid() == getpgrp()) {
            kill(-getpgrp(), SIGTERM);      <-- or maybe `kill(0, SIGTERM);`
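
A rough, self-contained sketch of the idea proposed here: signal the group if we are its leader, then always `_exit()`. This is not the actual patch. `watchdog_bail_out()`, `WATCHDOG_EXIT_CODE` and the test wrapper are hypothetical names, and the exit code value is chosen arbitrarily. The child installs a flag-setting SIGTERM handler to mimic a process where SIGTERM is only queued for later processing.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WATCHDOG_EXIT_CODE 70   /* arbitrary value, assumption of this sketch */

static volatile sig_atomic_t term_seen = 0;

/* Mimics a deferred SIGTERM handler: it records the signal and returns. */
static void deferred_term(int signo)
{
    (void)signo;
    term_seen = 1;
}

/* Hypothetical bail-out path: optionally notify the process group,
 * then terminate unconditionally without relying on any mainloop. */
static void watchdog_bail_out(void)
{
    if (getpid() == getpgrp()) {
        kill(-getpgrp(), SIGTERM);
    }
    _exit(WATCHDOG_EXIT_CODE);
}

/* Runs the bail-out path in a child that leads its own process group
 * and returns the child's exit status, or -1 on error. */
int bail_out_exit_status(void)
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = deferred_term;
        sigemptyset(&sa.sa_mask);

        /* Own process group, so the group-wide SIGTERM stays in the child. */
        if (setpgid(0, 0) != 0) _exit(2);
        if (sigaction(SIGTERM, &sa, NULL) != 0) _exit(2);

        watchdog_bail_out();   /* does not return */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here `bail_out_exit_status()` should return 70: the group-wide SIGTERM is caught by the deferred handler, and the process still terminates because `_exit()` follows unconditionally.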

Metadata Update from @atikhonov:
- Issue assigned to atikhonov

2 years ago

Metadata Update from @atikhonov:
- Issue tagged with: PR

a year ago

Metadata Update from @atikhonov:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1785193

a year ago
  • sssd-1-16
    • 0c62066 - util/watchdog: fixed watchdog implementation

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/5053

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the Subscribe button.

Thank you for understanding. We apologize for any inconvenience.
