Forwarded from the Debian BTS:
In a setup with sssd using a remote slapd for NSS, and a somewhat flaky network in between, sssd_be sometimes gets into a busy loop, using 100% CPU time on one core.
Debugging showed that sssd has a watchdog to clean up in such cases, but sssd_be installs a signal handler that prevents the SIGTERM sent to the process group from being processed correctly, so the process does not exit.
src/util/util_watchdog.c:
    64 /* the watchdog is purposefully *not* handled by the tevent
    65  * signal handler as it is meant to check if the daemon is
    66  * still processing the event queue itself. A stuck process
    67  * may not handle the event queue at all and thus not handle
    68  * signals either */
    69 static void watchdog_handler(int sig)
    70 {
    71
    72     watchdog_detect_timeshift();
    73
    74     /* if a pre-defined number of ticks passed by kills itself */
    75     if (__sync_add_and_fetch(&watchdog_ctx.ticks, 1) > +WATCHDOG_MAX_TICKS) {
    76         if (getpid() == getpgrp()) {
    77             kill(-getpgrp(), SIGTERM);
    78         } else {
    79             _exit(1);
    80         }
    81     }
    82 }
(NB: it seems that what the comment describes was not all too successful ;)
The signal handler is installed in src/providers/data_provider_be.c:
    static void be_process_finalize(struct tevent_context *ev,
                                    struct tevent_signal *se,
                                    int signum,
                                    int count,
                                    void *siginfo,
                                    void *private_data)
    {
        struct be_ctx *be_ctx;

        be_ctx = talloc_get_type(private_data, struct be_ctx);
        talloc_free(be_ctx);
        orderly_shutdown(0);
    }

    static errno_t be_process_install_sigterm_handler(struct be_ctx *be_ctx)
    {
        struct tevent_signal *sige;

        BlockSignals(false, SIGTERM);

        sige = tevent_add_signal(be_ctx->ev, be_ctx, SIGTERM, SA_SIGINFO,
                                 be_process_finalize, be_ctx);
        if (sige == NULL) {
            DEBUG(SSSDBG_CRIT_FAILURE, "tevent_add_signal failed.\n");
            return ENOMEM;
        }

        return EOK;
    }
Setting a breakpoint on be_process_finalize showed that this function is never reached, probably because libtevent never gets around to calling it.
Two proposals to circumvent this are:
a) Reset the handler before calling kill on the process group in line 77 (e.g. signal(SIGTERM, SIG_DFL);)
b) Move the exit call in line 79 out of the branch so that it gets called unconditionally in case kill() fails to kill the process itself
We tested solution a) in gdb and it caused sssd_be to exit cleanly and restart, as it should.
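For illustration, the two proposals can be compared in a small standalone sketch. This is not sssd code: `enum fix`, `run_stuck_child`, `stuck_child` and `deferred_sigterm` are hypothetical names, the tevent handler is replaced by a plain handler that only sets a flag, and the watchdog fires on its first 50 ms tick instead of counting up to WATCHDOG_MAX_TICKS. A forked child makes itself process group leader, spins forever, and the parent reports how it ended.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none of them exist in sssd. */
enum fix { FIX_NONE, FIX_A_RESET_HANDLER, FIX_B_ALWAYS_EXIT };

static volatile sig_atomic_t active_fix;
static volatile sig_atomic_t term_pending;

/* Stand-in for a tevent-managed handler: it only records the signal and
 * relies on an event loop (which never runs here) to act on it. */
static void deferred_sigterm(int sig)
{
    (void)sig;
    term_pending = 1;
}

/* Simplified watchdog: fires on the first tick instead of counting ticks. */
static void watchdog_handler(int sig)
{
    (void)sig;
    if (active_fix == FIX_A_RESET_HANDLER) {
        signal(SIGTERM, SIG_DFL);      /* proposal (a): restore default action */
    }
    if (getpid() == getpgrp()) {
        kill(-getpgrp(), SIGTERM);
        if (active_fix != FIX_B_ALWAYS_EXIT) {
            return;                    /* original flow: back to the stuck loop */
        }
    }
    _exit(1);   /* original else branch; with proposal (b) also reached after kill() */
}

/* Child side: simulates a process stuck in a busy loop. Never returns. */
static void stuck_child(enum fix fix)
{
    struct sigaction sa;
    struct itimerval tv = { {0, 50000}, {0, 50000} };   /* 50 ms ticks */

    setpgid(0, 0);          /* own process group, so kill(-pgrp) stays local */
    active_fix = fix;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = deferred_sigterm;
    sigaction(SIGTERM, &sa, NULL);
    sa.sa_handler = watchdog_handler;
    sigaction(SIGALRM, &sa, NULL);
    setitimer(ITIMER_REAL, &tv, NULL);

    for (;;) {
        /* busy loop: the "event queue" is never processed */
    }
}

/* Parent side: returns the child's wait status, or -1 on error. A child
 * that is still alive after about one second is removed with SIGKILL. */
int run_stuck_child(enum fix fix)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        stuck_child(fix);
    }
    for (int i = 0; i < 100; i++) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return status;
        }
        if (r < 0) {
            return -1;
        }
        nanosleep(&step, NULL);
    }
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return status;
}
```

Under these assumptions, the status returned for FIX_A_RESET_HANDLER should show a child terminated by SIGTERM, the one for FIX_B_ALWAYS_EXIT an exit code of 1, and the one for FIX_NONE a child that only goes away once the parent sends SIGKILL, which mirrors the hang described above.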
Hi @natureshadow,
could you please check whether the recently merged https://github.com/SSSD/sssd/pull/964 fixes your issue?
This is actually item (b) of your proposal.
Indeed, this seems to be a duplicate of #4089.
Metadata Update from @natureshadow:
- Issue close_status updated to: duplicate
- Issue status updated to: Closed (was: Open)
Hi @atikhonov,
it’s not as if the issue is reliably triggered. It just happens that sometimes sssd appears to stop working and uses a full CPU core; with the fix, it would just exit properly.
I assume the fix is correct, and the package maintainers will want to backport it.
Thanks!
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/5093
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.