#2305 SSSD Crashes when storage experiences high latency
Closed: Fixed None Opened 8 years ago by gprocunier.

We are experiencing latency issues on our SAN due to unexpected load. During the overload period all writes to disk block, which in turn prevents our virtual machines running sssd from writing to their cache.

When writes block like this, sssd often segfaults.

The OS running sssd is RHEL 6.5

[root@sdxbin2 ~]# rpm -q sssd

[root@sdxbin2 ~]# uname -a
Linux sdxbin2.symprod.com 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

The three attached files, once reassembled, form a tar archive of the abrt capture of sssd.

This information can be useful for other developers.

sssd crashed because ldap_sasl_interactive_bind_s was called with an invalid first argument (type "LDAP *"): ld is NULL.

#0  ldap_sasl_interactive_bind_s (ld=0x0, dn=0x0, mechs=0xa9cff0 "GSSAPI", 
    serverControls=0x0, clientControls=0x0, flags=2, 
    interact=0x7f9953eadd60 <sdap_sasl_interact>, defaults=0xaacec0)
    at ../../../libraries/libldap/sasl.c:429
        rc = <value optimized out>
        smechs = 0x0

I am not really sure how sssd could have gotten into this state.

(gdb) ptype state->sh
type = struct sdap_handle {
    LDAP *ldap;
    _Bool connected;
    time_t expire_time;
    ber_int_t page_size;
    struct sdap_fd_events *sdap_fd_events;
    struct sup_list supported_saslmechs;
    struct sup_list supported_controls;
    struct sup_list supported_extensions;
    struct sdap_op *ops;
    _Bool destructor_lock;
    _Bool release_memory;
} *
(gdb) p state->sh[0]
$10 = {
  ldap = 0x0, 
  connected = false, 
  expire_time = 1396340809, 
  page_size = 1000, 
  sdap_fd_events = 0xab6bd0, 
  supported_saslmechs = {
    num_vals = 0, 
    vals = 0x0
  }, 
  supported_controls = {
    num_vals = 0, 
    vals = 0x0
  }, 
  supported_extensions = {
    num_vals = 0, 
    vals = 0x0
  }, 
  ops = 0x0, 
  destructor_lock = false, 
  release_memory = false
}
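
For illustration only, the invariant that was violated here can be written as a small guard. This is a sketch, not the fix that went into SSSD: `demo_handle` and `demo_check_handle` are hypothetical names, and the struct is a stand-in holding just the two fields from the dump that matter (`ldap` and `connected`). It does not link against libldap.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sdap_handle; only the fields used here. */
struct demo_handle {
    void *ldap;      /* LDAP * in the real structure */
    bool connected;
};

/*
 * Return 0 if the handle looks usable for a bind, ENOTCONN otherwise.
 * A caller would check this before handing sh->ldap to a libldap call,
 * so that a torn-down connection yields an error instead of a NULL
 * dereference inside the library.
 */
int demo_check_handle(const struct demo_handle *sh)
{
    if (sh == NULL || sh->ldap == NULL || !sh->connected) {
        return ENOTCONN;
    }
    return 0;
}
```

With the values from the core (`ldap = 0x0`, `connected = false`) such a check would return ENOTCONN rather than reaching ldap_sasl_interactive_bind_s.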

Fields changed

cc: => lslebodn@redhat.com

Interesting, I would have thought this bug was solved by 5fe6ca5, but that commit is already in the version the user runs.

I think we need to see the logs to investigate the issue. Connection management is highly event-driven, and the core only captures the terminal state.

Can we please see the sssd_be logs? Put debug_level=6 (for starters) into the [domain] section, restart sssd and then either attach the logs here or send them directly to me and lslebodn by e-mail.
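
As a rough sketch, the change in /etc/sssd/sssd.conf would look something like the fragment below. The domain name `example.com` is a placeholder; use the section name that already exists in your configuration and leave its other options as they are.

```ini
# Placeholder domain name; keep your existing [domain/...] section header.
[domain/example.com]
debug_level = 6
```

After restarting sssd, the backend log for that domain should normally appear under /var/log/sssd/ as sssd_<domain name>.log.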

Thank you for reporting the bug!

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.11.6

A fix for this issue was requested by downstream.

owner: somebody => jhrozek

Fields changed

patch: 0 => 1

resolution: => fixed
status: new => closed

Metadata Update from @gprocunier:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.11.6

5 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/3347

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.
