#3594 sssd used wrong search base with wrong AD server
Closed: Fixed 2 years ago Opened 2 years ago by lslebodn.

version: sssd git master c096972

There are 3 AD servers:

  • 10.8.63.34 with search base dc=sssdad,dc=com
  • 10.8.63.35 with search base dc=child1,dc=sssdad,dc=com
  • 10.8.63.39 with search base dc=sssdad_tree,dc=com

sssd is connected to host with IP 10.8.63.35 and search base dc=child1,dc=sssdad,dc=com
sssd.conf:

[sssd]
domains = child1.sssdad.com
config_file_version = 2
services = nss, pam

[domain/child1.sssdad.com]
ad_domain = child1.sssdad.com
krb5_realm = CHILD1.SSSDAD.COM
realmd_tags = manages-system joined-with-adcli 
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_sasl_authid = BKR-HV03-GUEST1$
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad

debug_level = 9
dyndns_update = false

And related part of log files:

(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_search_bases_ex_next_base] (0x0400): Issuing LDAP lookup with base [dc=sssdad_tree,dc=com]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_print_server] (0x2000): Searching 10.8.63.34:389
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectclass=trustedDomain)(trustType=2)(!(msDS-TrustForestT
rustInfo=*)))][dc=sssdad_tree,dc=com].
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [flatName]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [trustPartner]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [securityIdentifier]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [trustType]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [trustAttributes]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 6
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_op_add] (0x2000): New operation 6 timeout 6
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_id_op_connect_done] (0x4000): caching successful connection after 2 notifies
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [be_run_unconditional_online_cb] (0x4000): List of unconditional online callbacks is empty, nothing to do.
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_process_result] (0x2000): Trace: sh[0x225f1420], connected[1], ops[0x226f70a0], ldap[0x225f2180]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_op_finished] (0x0400): Search result: Referral(10), 0000202B: RefErr: DSID-03100742, data 0, 1 access points
        ref 1: 'sssdad_tree.com'
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_add_references] (0x1000): Additional References: ldap://sssdad_tree.com/dc=sssdad_tree,dc=com
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_op_destructor] (0x2000): Operation 5 finished
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [generic_ext_search_handler] (0x4000): Request included referrals which were ignored.
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://sssdad_tree.com/dc=sssdad_tree,dc=com
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_search_bases_ex_done] (0x0400): Receiving data from base [dc=sssdad_tree,dc=com]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_id_op_done] (0x4000): releasing operation connection
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sss_domain_get_state] (0x1000): Domain sssdad_tree.com is Active
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sss_domain_set_state] (0x1000): Domain sssdad_tree.com is Disabled
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sysdb_subdomain_delete] (0x0400): Removing sub-domain [sssdad_tree.com] from db

//snip

(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [ad_get_slave_domain_done] (0x1000): There are no changes
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [ad_subdomains_refresh_done] (0x0400): Subdomains refreshed.
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [be_ptask_done] (0x0400): Task [Subdomains Refresh]: finished successfully
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [be_ptask_schedule] (0x0400): Task [Subdomains Refresh]: scheduling task 14400 seconds from last execution time [1512491090]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_process_result] (0x2000): Trace: sh[0x225f1420], connected[1], ops[0x226f70a0], ldap[0x225f2180]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_op_finished] (0x0400): Search result: Referral(10), 0000202B: RefErr: DSID-03100742, data 0, 1 access points
        ref 1: 'sssdad_tree.com'
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_get_generic_ext_add_references] (0x1000): Additional References: ldap://sssdad_tree.com/dc=sssdad_tree,dc=com
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_op_destructor] (0x2000): Operation 6 finished
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [generic_ext_search_handler] (0x4000): Request included referrals which were ignored.
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://sssdad_tree.com/dc=sssdad_tree,dc=com
(Tue Dec  5 07:24:54 2017) [sssd[be[child1.sssdad.com]]] [sdap_search_bases_ex_done] (0x0400): Receiving data from base [(Tue Dec  5 07:24:59 2017) [sssd[be[child1.sssdad.com]]] [ldb] (0x4000): Added timed event "ltdb_callback": 0xdbc4f20

//crash here

(Tue Dec  5 07:24:59 2017) [sssd[be[child1.sssdad.com]]] [ldb] (0x4000): Added timed event "ltdb_timeout": 0xdbc50b0
(Tue Dec  5 07:24:59 2017) [sssd[be[child1.sssdad.com]]] [ldb] (0x4000): Running timer event 0xdbc4f20 "ltdb_callback"
(Tue Dec  5 07:24:59 2017) [sssd[be[child1.sssdad.com]]] [ldb] (0x4000): Destroying timer event 0xdbc50b0 "ltdb_timeout"
(Tue Dec  5 07:24:59 2017) [sssd[be[child1.sssdad.com]]] [ldb] (0x4000): Ending timer event 0xdbc4f20 "ltdb_callback"
(Tue Dec  5 07:24:59 2017) [sssd[be[child1.sssdad.com]]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb

I think that it would not crash with correct IP and search base.
10.8.63.34 .. dc=sssdad,dc=com

(gdb) bt
#0  sdap_search_bases_ex_next_base (req=req@entry=0x56135aa23120) at src/providers/ldap/sdap_ops.c:145
#1  0x00007fb737410d22 in sdap_search_bases_ex_done (subreq=0x0) at src/providers/ldap/sdap_ops.c:220
#2  0x00007fb7373e0069 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
#3  0x00007fb7373e2704 in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1579
#4  0x00007fb7373e10bd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x56135a9e4ba0) at src/providers/ldap/sdap_async.c:353
#5  sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
#6  0x00007fb744756edb in epoll_event_loop (tvalp=0x7ffd15eee380, epoll_ev=0x56135a978f40) at ../tevent_epoll.c:728
#7  epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:930
#8  0x00007fb7447552a7 in std_event_loop_once (ev=0x56135a978cc0, location=0x7fb748501607 "src/util/server.c:721") at ../tevent_standard.c:114
#9  0x00007fb74475107d in _tevent_loop_once (ev=ev@entry=0x56135a978cc0, location=location@entry=0x7fb748501607 "src/util/server.c:721") at ../tevent.c:726
#10 0x00007fb7447512ab in tevent_common_loop_wait (ev=0x56135a978cc0, location=0x7fb748501607 "src/util/server.c:721") at ../tevent.c:849
#11 0x00007fb744755247 in std_event_loop_wait (ev=0x56135a978cc0, location=0x7fb748501607 "src/util/server.c:721") at ../tevent_standard.c:145
#12 0x00007fb7484de653 in server_loop (main_ctx=0x56135a97a150) at src/util/server.c:721
#13 0x00005613596dc309 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:639
(gdb) p state[0]
$1 = {ev = 0x56135a978cc0, opts = 0x56135a9ded20, sh = 0x56135aa09be0, filter = 0x7fb73815e7c0 "(&(objectclass=trustedDomain)(trustType=2)(!(msDS-TrustForestTrustInfo=*)))", 
  attrs = 0x7ffd15eee1f0, map = 0x0, map_num_attrs = 0, timeout = 6, allow_paging = false, return_first_reply = false, base_iter = 1, cur_base = 0xbebebebebebebebe, 
  bases = 0x56135a9e5cf0, reply_count = 0, reply = 0x0}

I forgot to mention "kind of" reproducer

sh# rm -f /var/log/sssd/*
sh# systemctl restart sssd
sh# id administrator@sssdad.com

But there is a small problem. It does not occur with empty sssd cache. So crash is caused by some leftovers from previous test.

On the other hand sssd did not crashed with 1d88a05 and test passed without any problem. Therefore it seems to be somehow related to performance enhancement changes in https://pagure.io/SSSD/sssd/issue/3503.

I can prepare machine for investigation. And i still have reserved one ATM.

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.16.1
- Issue tagged with: bug, regression

2 years ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1523282

2 years ago

Thank you for finding and filing this bug. In case nobody started fixing this yet, could you prepare a machine for me tomorrow (I start work in the afternoon, so no rush..)

or just send me instructions how to reproduce the bug, I can run downstream tests myself..

Metadata Update from @jhrozek:
- Issue assigned to jhrozek

2 years ago

PR: https://github.com/SSSD/sssd/pull/475 handles the wrong search base, but not the subsequent crash.

Metadata Update from @jhrozek:
- Issue priority set to: critical

2 years ago

Metadata Update from @lslebodn:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata
Attachments 1