#3197 sssd is incorrectly handling ldap referrals
Closed: Invalid a year ago Opened 3 years ago by ondrejv2.

Getting this log:

(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sbus_dispatch] (0x4000): dbus conn: 0x22d8310 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sbus_dispatch] (0x4000): Dispatching.
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sbus_message_handler] (0x2000): Received SBUS method org.freedesktop.sssd.dataprovider.getAccountInfo on path /org/freedesktop/sssd/dataprovider
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [be_get_account_info] (0x0200): Got request for [0x1001][FAST BE_REQ_USER][1][name=nigels] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [be_req_set_domain] (0x0400): Changing request domain from [default] to [default] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_search_user_next_base] (0x0400): Searching for users with base [DC=mydomain,DC=ad,DC=com]
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_print_server] (0x2000): Searching 192.168.128.4 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(uid=nigels)(objectclass=user)(uid=*)(&(uidNumber=*)(!(uidNumber=0))))][DC=mydomain,DC=ad,DC=com].
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uid] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [unixUserPassword] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uidNumber] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gidNumber] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gecos] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [unixHomeDirectory] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [loginShell] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userPrincipalName] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [name] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberOf] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectGUID] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectSID] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [primaryGroupID] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [whenChanged] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [uSNChanged] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [accountExpires] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userAccountControl] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 29 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_op_add] (0x2000): New operation 29 timeout 6 (Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_process_result] (0x2000): Trace: sh[0x638ad80], connected[1], ops[0x638b640], ldap[0x6386ef0] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT] 
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_generic_op_finished] (0x0400): Search result: Referral(10), 0000202B: RefErr: DSID-03100781, data 0, 1 access points
        ref 1: 'mydomain.ad.com'
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] 
[sdap_get_generic_ext_add_references] (0x1000): Additional References:
ldap://mydomain.ad.com/DC=mydomain,DC=ad,DC=com
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_op_destructor]
(0x2000): Operation 29 finished (Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000): Request included referrals which were ignored.
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://mydomain.ad.com/DC=mydomain,DC=ad,DC=com
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_search_user_process] (0x0400): Search for users, returned 0 results.
(Thu Sep 22 13:35:41 2016) [sssd[be[default]]] [sdap_get_users_done]
(0x0040): Failed to retrieve users (Thu Sep 22 13:35:41 2016) 
[sssd[be[default]]] [sdap_id_op_done] (0x4000): releasing operation 
connection

Note that the server queried above (192.168.128.4) is DC for root domain "ad.com" - so no wonder it did not find the user 'nigels'.
Note that ldap referrals are disabled in my case


Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.14.2
owner: somebody => fidencio

Fields changed

rhbz: => todo

Ondrej, I'd like to ask you clear steps on how to setup an environment where I can reproduce the same issue.

There are no clear steps. Just make sssd to authenticate against forest AD with empty forest domain 'ad.com' above.
I think (from the logs) the problem starts with the "reusing cached connection" - looks like we are reusing wrong connection as just above this message I see:

(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_REFERENCE]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_get_generic_ext_add_references] (0x1000): Additional References: ldap://DomainDnsZones.ad.com/DC=DomainDnsZones,DC=ad,DC=com
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_result] (0x2000): Trace: sh[0x63adfa0], connected[1], ops[0x65f6b40], ldap[0x234ed30]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_REFERENCE]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_get_generic_ext_add_references] (0x1000): Additional References: ldap://ForestDnsZones.ad.com/DC=ForestDnsZones,DC=ad,DC=com
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_result] (0x2000): Trace: sh[0x63adfa0], connected[1], ops[0x65f6b40], ldap[0x234ed30]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_REFERENCE]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_get_generic_ext_add_references] (0x1000): Additional References: ldap://ad.com/CN=Configuration,DC=ad,DC=com
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_result] (0x2000): Trace: sh[0x63adfa0], connected[1], ops[0x65f6b40], ldap[0x234ed30]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [sdap_op_destructor] (0x2000): Operation 18 finished
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000): Request included referrals which were ignored.
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://mydomain.ad.com/DC=mydomain,DC=ad,DC=com
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://DomainDnsZones.ad.com/DC=DomainDnsZones,DC=ad,DC=com
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://ForestDnsZones.ad.com/DC=ForestDnsZones,DC=ad,DC=com
(Thu Sep 22 13:39:16 2016) [sssd[be[default]]] [generic_ext_search_handler] (0x4000):     Ref: ldap://ad.com/CN=Configuration,DC=ad,DC=com

I do not know why these references are here as in my config I have explicitly set:
ad_domain = mydomain.ad.com
ldap_referrals = False

Sorry Ondrej, I'm kinda new here, so I really need your help in order to try to help you.

This is what I've done so far:

1) Set up DC for ad.fidencio.lan (192.168.122.76)

2) Set up a secondary DC for the same existing domain (192.168.122.24)

3) Set up a linux machine and just, where I just ran a realm join

I've been able to {{{ssh -l administrator@ad.fidencio.lan localhost}}} without any issue.

[ffidenci@freeipaclient ~]$ ssh -l administrator@ad.fidencio.lan localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:4ObG568ArUePBHC5/axw5etm41eHUvCs+aYrErUnq5M.
ECDSA key fingerprint is MD5:57:e2:46:bc:a7:a4:a7:9e:f9:18:31:43:69:b1:57:56.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
administrator@ad.fidencio.lan@localhost's password: 
Creating home directory for administrator@ad.fidencio.lan.

And klist shows me:

[administrator@ad.fidencio.lan@freeipaclient ~]$ klist 
Ticket cache: KEYRING:persistent:95000500:krb_ccache_0XCqPzw
Default principal: Administrator@AD.FIDENCIO.LAN

Valid starting     Expires            Service principal
08/10/16 09:55:15  08/10/16 19:55:15  krbtgt/AD.FIDENCIO.LAN@AD.FIDENCIO.LAN
    renew until 15/10/16 09:55:15
[administrator@ad.fidencio.lan@freeipaclient ~]$ logout
Connection to localhost closed.

You said "there are no clear steps [to reproduce]", but I hope you can at least validate what I'm trying to do and, maybe, give me some hint on what exactly I should do differently in order to reproduce your issue.

_comment0: Sorry Ondrej, I'm kinda new here, so I really need your help in order to try to help you.

This is what I've done so far:
1) Set up DC for ad.fidencio.lan (192.168.122.76)
2) Set up a secondary DC for the same existing domain (192.168.122.24)
3) Set up a linux machine and just, where I just ran a realm join

I've been able to {{{ssh -l administrator@ad.fidencio.lan localhost}}} without any issue.
{{{
[ffidenci@freeipaclient ~]$ ssh -l administrator@ad.fidencio.lan localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:4ObG568ArUePBHC5/axw5etm41eHUvCs+aYrErUnq5M.
ECDSA key fingerprint is MD5:57:e2:46:bc:a7:a4:a7:9e:f9:18:31:43:69:b1:57:56.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
administrator@ad.fidencio.lan@localhost's password:
Creating home directory for administrator@ad.fidencio.lan.
}}}

And klist shows me:
{{{
[administrator@ad.fidencio.lan@freeipaclient ~]$ klist
Ticket cache: KEYRING:persistent:95000500:krb_ccache_0XCqPzw
Default principal: Administrator@AD.FIDENCIO.LAN

Valid starting Expires Service principal
08/10/16 09:55:15 08/10/16 19:55:15 krbtgt/AD.FIDENCIO.LAN@AD.FIDENCIO.LAN
renew until 15/10/16 09:55:15
[administrator@ad.fidencio.lan@freeipaclient ~]$ logout
Connection to localhost closed.
}}}

You said "there are no clear steps [to reproduce]", but I hope you can at least validate what I'm trying to do and, maybe, give me some hint on what exactly I should do differently in order to reproduce your issue. => 1475881526404334

You won't be able to reproduce with a single domain. You need to setup AD forest.

Replying to [comment:6 ondrejv2]:

You won't be able to reproduce with a single domain. You need to setup AD forest.

And that's what I thought that I did setting up a secondary DC for the same existing domain. Isn't the case then?

No, it's not the same. You only setup a backup DC for the same domain.

Replying to [comment:8 ondrejv2]:

No, it's not the same. You only setup a backup DC for the same domain.
Okay, let me (re)setup it here. Once it's done, are the steps that I described the right way to try to reproduce the issue (of course, not considering the second DC setup)? If not, what's the proper way to do that?

Well I would say the steps would be:
1. Set up AD root forest say 'example.com' with a few DC controllers in it
2. Set up a child domain say 'unix.example.com' with a single DC controller in it.
3. populate 'unix.example.com' with some users & groups
4. Bind SSSD to domain unix.example.com & test communication several times. Sooner or later you'll be able to see that SSSD tries to bind to DCs that belong to root forest domain which should result in SSSD to claim that users do not exist.

Okay, I finally was able to properly set up my environment, thanks for your patience. As I've been testing it using a few VMs on my laptop I ended up setting up an AD root forest (fidencio.lan) with two DC controllers in it and one child domain (second.fidencio.lan) with a single DC controller in it.

So far, I haven't been able to reproduce the same issue using SSSD from git master. May I ask which version of SSSD are you using? Also, can you help me to check that my sssd.conf matches yours?

[root@client sssd]# cat /etc/sssd/sssd.conf 
[sssd]
domains = second.fidencio.lan
config_file_version = 2
services = nss, pam
debug_level = 10

[domain/second.fidencio.lan]
ad_domain = second.fidencio.lan
krb5_realm = SECOND.FIDENCIO.LAN
realmd_tags = manages-system joined-with-adcli 
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad
ldap_referrals = False
debug_level = 10

Moving tickets that didn't make it into the 1.14.2 release into the next point release.

milestone: SSSD 1.14.2 => SSSD 1.14.3

Hi,

I tried to reproduce the bug locally as well and I think we need a bit more pieces of information:
- I understand the client machine is joined to the child domain
- Which domain does the user (nigels in the log snippet) come from? I guess it must be the joined child domain as well since you used a short name to log in, not a qualified name
- Can you paste your (sanitized) sssd.conf file to make sure we set properties like use_fully_qualified_names the same in our test environment?
- Does the error reproduce only during logins or also with another request like getent passwd?

Hi,
Ok I am not saying you WILL immediately reproduce this error via the steps done above. See also the response below:
1. Yes.
2. Yes
3. Here it is:

[sssd]
    services = autofs, nss, pam
    config_file_version = 2
    debug_level = 3

    domains = default
[nss]

[domain/default]

    debug_level = 0xfff0
    ldap_id_mapping = False
    ad_domain = MYDOMAIN.AD.COM
    ad_site = Dublin
    id_provider = ad
    auth_provider = ad
    chpass_provider = ad
    autofs_provider = ad
    cache_credentials = True
    krb5_realm = MYDOMAIN.AD.COM
    krb5_canonicalize = False
    ldap_user_name = uid
    krb5_renew_interval = 3600
    krb5_renewable_lifetime = 30d
    ldap_autofs_search_base = CN=autofs,DC=mydomain,dc=ad,DC=com

I have never seen it happening on logins so far - seems like that sometimes (quite rarely though) getent passwd fails.

OK, I'm running a script that tries to resolve users in a loop with a very similar configuration. But so far no luck. And I suspect that the error might be in an area of code that we don't log very well - when we try to match the requested domain with LDAP "domains" that contain the search bases and we end up somehow selecting the wrong search base.

I wonder if you could run a custom build that would not contain any code changes but just try to log more information?

The truth is, that I have not observed it myself for quite some time, too.
Looks like it might be related to some network conditions (i.e. some domain controllers for child domain failing).
Hard to say.
Let's keep it open, I will paste some more information hopefully in the future...

Please provide custom build. I have observed it several times today. It starts to be bit pain.
Also opening RHEL support case to support this.

Support case #01740435 has been created, FYI

Ok it looks like by default SSSD is not afraid of using global catalogs for lookups. Unfortunately in our case not all GCs have information for subdomains. Not sure if this a bug in our AD or not - not an AD expert.
So far it looks like

ad_enable_gc = False

does the trick

Update: Looks like Unix attributes are not replicated to Global Catalog by default. We are using RFC2307 here. I would therefore expect that setting

ldap_id_mapping = False

should automatically disable GC lookups as well.

Does it make a sense?

I don't think so, GC is so far the only entity that can resolve group members from different domains stored in Universal groups.

So I would suggest to perhaps document how and why we use the Global Catalog, but otherwise I suggest we close this ticket.

I see.
Can we at least shout in the logs if the condition above happens?
Does it make a sense?
I mean, people are using AD with posix attrs enabled, so the risk someone else hits this is quite real.

But if you think it's not necessary to do anything, I am fine to close this case.

Hmm, this should already be indicated in the debug logs (although not syslog and we can add a message to syslog, sure..). During the first lookup we should check the GC for presence of posix attributes, there should be a line searching for any gidNumber or uidNumber presence with sizelimit=1 and if no attributes are found, we should disable GC automatically.

Of course, if some posix attributes are replicated but not all, then we can't do much because we don't know upfront which are in fact replicated.

You are right.
In my case, global catalog did not have all attributes SSSD asked for -> ldap referral was returned. Thing is that I used schema editor to populate GC with posix attrs - those I believed are necessary.
But SSSD is asking for attributes it will never use - like 'UnixUserPassword' (?).

Do we have an article describing which posix attrs SSSD expect in Global Catalog?

Metadata Update from @ondrejv2:
- Issue assigned to fidencio
- Issue set to the milestone: SSSD 1.14.3

2 years ago

Metadata Update from @fidencio:
- Assignee reset

a year ago

I guess this should have been closed a long time ago since it's not a bug..

Metadata Update from @jhrozek:
- Custom field design_review reset (from 0)
- Custom field mark reset (from 0)
- Custom field patch reset (from 0)
- Custom field review reset (from 0)
- Custom field sensitive reset (from 0)
- Custom field testsupdated reset (from 0)
- Issue close_status updated to: None

a year ago

Metadata Update from @jhrozek:
- Custom field design_review reset (from false)
- Custom field mark reset (from false)
- Custom field patch reset (from false)
- Custom field review reset (from false)
- Custom field sensitive reset (from false)
- Custom field testsupdated reset (from false)
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

a year ago

Login to comment on this ticket.

Metadata