#47690 sssd: can't login, need to use the previous password - Cache issue?
Closed: wontfix. Opened 10 years ago by van12.

Since we started using sssd on Red Hat 6, we observe (at least once or twice every month) that users can't log in unless they use their previous password. The workaround is to restart sssd. It happens on different RH6 servers.

Why? I have tried disabling the cache, and nscd is not running, but the problem is still there.

Since it is quite unpredictable (I don't know on which server or when it will happen), I can't reproduce the problem right away and send you the debug output. I tried "kill -6 <pid>" to get a core dump, but I can't find any core file.

NB: removing /var/lib/sss/db/cache_default.ldb and running "sss_cache -u <USER>" did not help either; only a "service sssd restart" solved the problem.

Please help

[root@arlnbu01 ~]# ps -ef|grep nscd
root 14784 13811 0 15:40 pts/1 00:00:00 grep nscd
[root@arlnbu01 ~]# ps -ef|grep sssd
root 13893 1 0 15:32 ? 00:00:00 /usr/sbin/sssd -f -D
root 13894 13893 0 15:32 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain default --debug-to-files
root 13895 13893 0 15:32 ? 00:00:00 /usr/libexec/sssd/sssd_nss --debug-to-files
root 13896 13893 0 15:32 ? 00:00:00 /usr/libexec/sssd/sssd_pam --debug-to-files
root 14788 13811 0 15:40 pts/1 00:00:00 grep sssd
[root@arlnbu01 ~]#

[root@anbu01 ~]# cat /etc/sssd/sssd.conf

[sssd]
config_file_version = 2
services = nss, pam

domains = default

debug_level = 5

debug_to_files = true

[nss]

enum_cache_timeout = 30

filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd

[domain/default]
auth_provider = ldap
ldap_id_use_start_tls = True
chpass_provider = ldap
ldap_search_base = dc=NNIT
id_provider = ldap
enumerate = True

cache_credentials = True

offline_credentials_expiration = 3
ldap_tls_cacertdir=/etc/openldap/cacerts
ldap_uri=ldap://mars,ldap://venus


I wonder if this could be a replication issue between your servers (mars and venus).

But to debug sssd I would need you to put debug_level=4 into the domain section and restart sssd. Then you will see a file called /var/log/sssd/sssd_default.log (in general it's sssd_$domainname.log); please attach that file.
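
(For illustration only, a minimal sketch of what that change could look like in sssd.conf, using the domain name from this ticket; since debug_to_files is already enabled, the output should land in /var/log/sssd/sssd_default.log:)

[domain/default]
# temporary, for troubleshooting the login failures
debug_level = 4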

Do the server logs say anything about the issue?

Replying to [comment:1 jhrozek]:

I wonder if this could be a replication issue between your servers (mars and venus).

That'd be a good place to start. To narrow down the problem, is it possible to run these commands when the login failure occurs?
ldapsearch -h mars -p 389 -x -D "<login_failed_user>" -w "<new_password>" -b "" -s base
ldapsearch -h mars -p 389 -x -D "<login_failed_user>" -w "<old_password>" -b "" -s base
ldapsearch -h venus -p 389 -x -D "<login_failed_user>" -w "<new_password>" -b "" -s base
ldapsearch -h venus -p 389 -x -D "<login_failed_user>" -w "<old_password>" -b "" -s base

To allow a core file to be generated, you may need to configure your system based on this FAQ article.
http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
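
(Not part of the FAQ itself, just a rough sketch of the usual RHEL 6 steps for capturing a core of a running sssd_be process; the core pattern and PID below are placeholders:)

# allow unlimited core file size in the current shell
ulimit -c unlimited
# write cores to a predictable location (example pattern)
echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
# or snapshot the live process without killing it (gcore ships with gdb)
gcore -o /var/tmp/sssd_be.core <pid_of_sssd_be>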

But to debug sssd I would need you to put debug_level=4 into the domain section and restart sssd. Then you will see a file called /var/log/sssd/sssd_default.log (in general it's sssd_$domainname.log); please attach that file.

Do the server logs say anything about the issue?

to jhrozek:

I wonder if this could be a replication issue between your servers (mars and venus).
When the problem occurs, we can log in to other servers (HP-UX, AIX, Linux) without any problems. I will try the ldapsearch commands the next time the issue arises.

I chose one server and enabled verbose level 4. I will attach the log when it happens again (it can take weeks).

[root@arlnbu01 sssd]# ls -ltr
total 132
-rw------- 1 root root 0 Apr 11 2013 ldap_child.log
-rw------- 1 root root 0 Apr 11 2013 sssd_nss.log
-rw------- 1 root root 0 Nov 6 03:35 sssd_default.log
-rw------- 1 root root 126545 Feb 6 16:46 sssd.log

Nothing of interest in sssd.log right now:
[root@arlnbu01 sssd]# tail sssd.log
(Thu Feb 6 16:46:33 2014) [sssd] [service_send_ping] (0x0100): Pinging pam
(Thu Feb 6 16:46:33 2014) [sssd] [ping_check] (0x0100): Service default replied to ping
(Thu Feb 6 16:46:33 2014) [sssd] [ping_check] (0x0100): Service pam replied to ping
(Thu Feb 6 16:46:33 2014) [sssd] [ping_check] (0x0100): Service nss replied to ping
(Thu Feb 6 16:46:43 2014) [sssd] [service_send_ping] (0x0100): Pinging default
(Thu Feb 6 16:46:43 2014) [sssd] [service_send_ping] (0x0100): Pinging nss
(Thu Feb 6 16:46:43 2014) [sssd] [service_send_ping] (0x0100): Pinging pam
(Thu Feb 6 16:46:43 2014) [sssd] [ping_check] (0x0100): Service default replied to ping
(Thu Feb 6 16:46:43 2014) [sssd] [ping_check] (0x0100): Service pam replied to ping
(Thu Feb 6 16:46:43 2014) [sssd] [ping_check] (0x0100): Service nss replied to ping

to nhosoi:

To allow to generate a core, you may need to configure your system based upon this FAQ article.
http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes

It is not the server that has the problem; I just want to make a dump of the "sssd" process so you can take a look at why it does not accept the current password.

It happened again today on one RH6 server (as I wrote in my last mail, it occurs unpredictably). I can log in to this server with my old LDAP password. There is no sssd debug log (debug=4) on this server.

I tested the ldapsearch on both LDAP servers as suggested above; I can confirm that both LDAP servers accept ONLY my new password. And yes, I can log in with my new password on other servers.

Other users have noticed the same issue.

The logs show nothing, and strace keeps saying "-1 EAGAIN (Resource temporarily unavailable)". Resource? Does it mean the LDAP servers?

I have just telnetted to port 389 on both LDAP servers without any problem, and as written above, we can log in on other *nix servers without problems.

Is this a bug in sssd that disconnects it from LDAP?

Please help. Is there a way to dump the sssd process on the LDAP client? "kill -6 <pid>" didn't dump anything.

[root@dsapp0021 ~]# strace -p 6838
writev(13, [{"l\1\0\1\0\0\0\0000\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(15, [{"l\1\0\1\0\0\0\0002\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(14, [{"l\1\0\1\0\0\0\0000\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
epoll_wait(6, {{EPOLLIN, {u32=24196720, u64=24196720}}}, 1, 9989) = 1
read(15, "l\2\1\1\0\0\0\0002\205\7\0\10\0\0\0\5\1u\0002\205\7\0", 2048) = 24
read(15, 0x17162e0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24201456, u64=24201456}}}, 1, 9989) = 1
read(14, "l\2\1\1\0\0\0\0000\205\7\0\10\0\0\0\5\1u\0000\205\7\0", 2048) = 24
read(14, 0x17152c0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24189584, u64=24189584}}}, 1, 9989) = 1
read(13, "l\2\1\1\0\0\0\0000\205\7\0\10\0\0\0\5\1u\0000\205\7\0", 2048) = 24
read(13, 0x1712460, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {}, 1, 9989) = 0
writev(13, [{"l\1\0\1\0\0\0\0001\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(15, [{"l\1\0\1\0\0\0\0003\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(14, [{"l\1\0\1\0\0\0\0001\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
epoll_wait(6, {{EPOLLIN, {u32=24189584, u64=24189584}}}, 1, 9989) = 1
read(13, "l\2\1\1\0\0\0\0001\205\7\0\10\0\0\0\5\1u\0001\205\7\0", 2048) = 24
read(13, 0x1712460, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24196720, u64=24196720}}}, 1, 9988) = 1
read(15, "l\2\1\1\0\0\0\0003\205\7\0\10\0\0\0\5\1u\0003\205\7\0", 2048) = 24
read(15, 0x17162e0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24201456, u64=24201456}}}, 1, 9988) = 1
read(14, "l\2\1\1\0\0\0\0001\205\7\0\10\0\0\0\5\1u\0001\205\7\0", 2048) = 24
read(14, 0x17152c0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, ^C <unfinished ...>
Process 6838 detached

After restarting sssd (I can log in again with the new password), strace still shows "(Resource temporarily unavailable)", so I think this warning can be ignored.

strace -p 5393

read(14, 0x1dceb30, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {}, 1, 9989) = 0
writev(13, [{"l\1\0\1\0\0\0\0\21\0\0\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(15, [{"l\1\0\1\0\0\0\0\21\0\0\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(14, [{"l\1\0\1\0\0\0\0\21\0\0\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
epoll_wait(6, {{EPOLLIN, {u32=31239424, u64=31239424}}}, 1, 9989) = 1
read(13, "l\2\1\1\0\0\0\0\21\0\0\0\10\0\0\0\5\1u\0\21\0\0\0", 2048) = 24
read(13, 0x1dcb6d0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=31246272, u64=31246272}}}, 1, 9989) = 1
read(15, "l\2\1\1\0\0\0\0\21\0\0\0\10\0\0\0\5\1u\0\21\0\0\0", 2048) = 24
read(15, 0x1dcf5b0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=31250768, u64=31250768}}}, 1, 9989) = 1
read(14, "l\2\1\1\0\0\0\0\21\0\0\0\10\0\0\0\5\1u\0\21\0\0\0", 2048) = 24
read(14, 0x1dceb30, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, ^C <unfinished ...>
Process 5393 detached
[root@dsbapp0021 init.d]#

PS: My colleague and I changed our LDAP passwords several weeks ago.

jhrozek - do you need assistance from the 389 team to debug this issue?

I'm sorry I haven't replied to this ticket earlier, I forgot to CC myself.

I would start with SSSD debugging. In order to debug login issues, you should put debug_level=6 (or higher) into the pam and domain sections of the config file, then restart sssd. The sssd.log file you were inspecting does not contain debug messages from the worker processes, but rather from the "monitor" process that mostly just acts as a watchdog.

Does your sssd.conf use cache_credentials=True? If so, then my guess would be that sssd is actually offline and logins are happening against the cache. If you don't mind extra messages on login, you can also set pam_verbosity to 2; then you'd see "Authenticated with cached credentials" on offline logins.

Oh, and I forgot to say that pam_verbosity should be set in the pam section.
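
(Putting the two suggestions above together, the relevant sssd.conf fragments would look roughly like this; the section names follow the configuration already posted in this ticket:)

[domain/default]
# verbose backend logging, written to sssd_default.log
debug_level = 6

[pam]
# verbose PAM responder logging, plus extra messages shown to users at login
debug_level = 6
pam_verbosity = 2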

Thanks, the debug is now set on one server, the one we use most. The waiting time before the error occurs again can be months. cache_credentials is off.

[sssd]
config_file_version = 2
services = nss, pam

domains = default
debug_to_files = true
debug_level = 2 # this one writes to sssd.log, the watchdog process, as jhrozek said

[nss]
enum_cache_timeout = 30
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd

[domain/default]
auth_provider = ldap
debug_level = 6 # writes to sssd_default.log
ldap_id_use_start_tls = True
chpass_provider = ldap
ldap_search_base = dc=NNIT
id_provider = ldap
enumerate = True
cache_credentials = false
ldap_tls_cacertdir=/etc/openldap/cacerts
ldap_uri=ldap://mars,ldap://venus

[pam]
debug_level = 6
pam_verbosity = 2

Hi,

It happened again today; neither the new nor the old password helps this time (we tried with 3 different LDAP accounts).

NB: logging in with the same LDAP accounts on other servers works without any problems.

[root@arlnbu01 sssd]# ps -ef|grep sssd
root 10171 1 0 Feb13 ? 00:00:18 /usr/sbin/sssd -f -D
root 10172 10171 0 Feb13 ? 00:09:13 /usr/libexec/sssd/sssd_be --domain default --debug-to-files
root 10173 10171 0 Feb13 ? 00:00:26 /usr/libexec/sssd/sssd_nss --debug-to-files
root 10174 10171 0 Feb13 ? 00:00:07 /usr/libexec/sssd/sssd_pam --debug-to-files

The sssd log files can be found here: www.chezmoi.dk/div/sssd.tar.gz

Please help

Thank you, the logs have some more info. The first authentication request I see in the logs fails with "Invalid Credentials", which usually means a wrong password:
{{{
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [simple_bind_send] (0x0100): Executing simple bind as: cn=Tuan Nguyen,cn=unixtek,ou=Infrastructure,dc=nnit
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [simple_bind_done] (0x0400): Bind result: Invalid credentials(49), no errmsg set
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Backend returned: (0, 6, <NULL>) [Success]
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sending result [6][default]
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sent result [6][default]
}}}

Then after a couple of tries, it seems that the user hits some administrative limit:
{{{
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sdap_uri_callback] (0x0400): Constructed uri 'ldap://arlmgtdk02.global.centralorg.net'
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sss_ldap_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sdap_sys_connect_done] (0x0100): Executing START TLS
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sdap_connect_done] (0x0080): START TLS result: Success(0), Start TLS request accepted.Server willing to negotiate SSL.
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'arlmgtdk02.global.centralorg.net' as 'working'
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [set_server_common_status] (0x0100): Marking server 'arlmgtdk02.global.centralorg.net' as 'working'
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [simple_bind_send] (0x0100): Executing simple bind as: cn=Tuan Nguyen,cn=unixtek,ou=Infrastructure,dc=nnit
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [simple_bind_done] (0x0400): Bind result: Constraint violation(19), Exceed password retry limit. Please try later.
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Backend returned: (3, 4, <NULL>) [Internal Error (System error)]
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sending result [4][default]
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sent result [4][default]
}}}

At the very least we now know which server SSSD was talking to. Can you check if there is anything of interest in the server logs of "arlmgtdk02.global.centralorg.net"?

{{{
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [simple_bind_done] (0x0400): Bind result: Constraint violation(19), Exceed password retry limit. Please try later.
}}}

This means you tried to bind too many times with the wrong password and were locked out.

https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/Managing_the_Password_Policy-Configuring_the_Account_Lockout_Policy.html

Can you check if there is anything of interest in the server logs of "arlmgtdk02.global.centralorg.net" ?
Bind result: Constraint violation(19), Exceed password retry limit

Many thanks to jhrozek and rmeggins, you have given me some hints.
arlmgtdk02 is a slave (replica), arlmgtdk01 is the LDAP master (389-ds). sssd.conf lists arlmgtdk01 in front of arlmgtdk02.

That the client switches back and forth between the master and the slave is not an issue, but why did it fail on the slave LDAP server arlmgtdk02? This evening's test showed I could get in if I used my previous-previous password (note: previous-previous, not the last one but the one before that).

The slave is configured with "sync on the following days" rather than "always keep directories in sync" in the Replication Schedule. Maybe this is the issue?

I did an "Initialize Consumer" to re-init the slave, and then the login problem went away without my needing to restart the sssd daemon as on the other days :-). The slave arlmgtdk02 now has the right/correct data received from the master.

I have now enabled "always keep directories in sync" for replication (on all 3 of our LDAP environments); let's see if the problem goes away, otherwise I will need help from Rich :-). I will post the result in 4 weeks.

I shut down slapd on arlmgtdk02, and then sssd reconnected to arlmgtdk01 again; see the log below:
sssd_default.log:
(Mon Feb 24 21:38:32 2014) [sssd[be[default]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'arlmgtdk01.global.centralorg.net' as 'working'

So the conclusion is: there are no bugs in sssd, but the master-slave sync somehow gets "out of sync" when I use "sync on the following days". I am now testing the other option, "always keep directories in sync".

I have 2 screenshots of the previous-previous password test and the replication schedule:
www.chezmoi.dk/div/sssd2.zip

Again thanks very much for your help.
Best regards
Tuan

Hello again,

So the conclusion is: there are no bugs in sssd, but the master-slave sync somehow gets "out of sync" when I use "sync on the following days". I am now testing the other option, "always keep directories in sync".

Two weeks ago and again this week, the sssd issue occurred (I can log in but one of my colleagues can't) on two other RH6 servers. A manual "Initialize Consumer" on the LDAP master (described above) fixed the problem.

So somehow the master-slave sync didn't get all the data from the master even though I used "always keep directories in sync". In the replication status log there was no error; it always shows "Replica acquired successfully: Incremental update succeeded". In the errors and access logs I can't see anything wrong with the replication.

Please help

BR, Tuan

Replying to [comment:14 van12]:

Hello again,

So the conclusion is: there are no bugs in sssd, but the master-slave sync somehow gets "out of sync" when I use "sync on the following days". I am now testing the other option, "always keep directories in sync".

Two weeks ago and again this week, the sssd issue occurred (I can log in but one of my colleagues can't) on two other RH6 servers. A manual "Initialize Consumer" on the LDAP master (described above) fixed the problem.

I'm assuming you are using the console to trigger the "initialize consumer". The next time this occurs, see if selecting "Send updates now" also resolves the issue.

But prior to that, once the issue occurs, please perform an ldapsearch on that user entry on both replicas (requesting the userPassword attribute) so we can check for any differences. Then see if "Send updates now" helps.

Thanks!

So somehow the master-slave sync didn't get all the data from the master even though I used "always keep directories in sync". In the replication status log there was no error; it always shows "Replica acquired successfully: Incremental update succeeded". In the errors and access logs I can't see anything wrong with the replication.

Please help

BR, Tuan

I'm assuming you are using the console to trigger the "initialize consumer". The next time this occurs, see if selecting "Send updates now" also resolves the issue.
Didn't help.

But prior to that, once the issue occurs, please perform an ldapsearch on that user entry on both replicas (requesting the userPassword attribute) so we can check for any differences. Then see if "Send updates now" helps.

ldapsearch -xLLL -Z -b dc=nnit "(&(uid=tnng))" userPassword
This returned nothing, so I exported to an LDIF and got the information from there.
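
(userPassword is normally hidden from anonymous or unprivileged binds by the default ACIs, which would explain the empty result; a sketch of an authenticated search, assuming for illustration a bind as the directory manager seen in the dumps below:)

ldapsearch -xLLL -Z -D "cn=directory manager" -W -b dc=nnit "(uid=tnng)" userPassword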

Master:
....
nsUniqueId: 640af15e-b84c11e2-b4dec459-5c8956ea
uidNumber: 8078
passwordRetryCount: 0
retryCountResetTime: 20140423183539Z
passwordExpWarned: 0
passwordExpirationTime: 20140622102611Z
passwordGraceUserTime: 0
passwordAllowChangeTime: 20140414102611Z
modifyTimestamp: 20140413102611Z
modifiersName: cn=server,cn=plugins,cn=config
passwordHistory: 20130925110404Z{crypt}gy9YjfDKX//dk
passwordHistory: 20131202133021Z{crypt}piOppDJ7Rpie.
passwordHistory: 20140208134717Z{crypt}bMpK2.1wgu8MQ
passwordHistory: 20140413102611Z{crypt}4FRvw4sn5vdr2
userPassword:: e2NyeXB0fW9pajRheVBVWC5aUm8=

Slave:
...
nsUniqueId: 640af15e-b84c11e2-b4dec459-5c8956ea
createTimestamp: 20130509020152Z
creatorsName: cn=directory manager
uidNumber: 8078
uid: tnng
objectClass: top
objectClass: posixaccount
userPassword:: e2NyeXB0fTRGUnZ3NHNuNXZkcjI=
passwordHistory: 20130925110404Z{crypt}gy9YjfDKX//dk
passwordHistory: 20131202133021Z{crypt}piOppDJ7Rpie.
passwordHistory: 20140208134717Z{crypt}bMpK2.1wgu8MQ
passwordAllowChangeTime: 20140209134717Z
passwordGraceUserTime: 0
modifiersName: cn=directory manager
modifyTimestamp: 20140310072215Z
retryCountResetTime: 20140425120338Z
passwordRetryCount: 3
passwordExpirationTime: 20140419180658Z
passwordExpWarned: 1
accountUnlockTime: 20140425122914Z

Oops, the passwords are not the same.

Notice: I changed the password on the master on the 13th of April, but this change didn't get through to the SLAVE. If that is the case, the userPassword on the SLAVE (e2NyeXB0fTRGUnZ3NHNuNXZkcjI=) must be the same as this line from the MASTER: passwordHistory: 20140413102611Z{crypt}4FRvw4sn5vdr2

I will create a cronjob that syncs data (using an LDIF) from master -> slave as a workaround.

Thanks
Tuan

PS: The master imports an LDIF (from another LDAP server on another network/zone) every morning. Maybe this confuses the sync/update??
For the workaround I will use the same principle on the SLAVE:

16 3 * * * /sbin/service dirsrv stop;/usr/lib64/dirsrv/slapd-NNIT/ldif2db -s "dc=nnit" -i /tmp/tmp.ldf;/sbin/service dirsrv start;/sbin/service sssd restart

Forgot to mention: notice the accountUnlockTime on the SLAVE too; "Send updates now" doesn't remove this attribute.

NB: it is a MASTER-SLAVE setup, so if I try to reset accountUnlockTime directly in the SLAVE GUI, I get the error "LDAP is unwilling to perform: cannot update referral" (OK with me, no problem). I just wish "Send updates now" would do it.

Replying to [comment:16 van12]:

I'm assuming you are using the console to trigger the "initialize consumer". The next time this occurs, see if selecting "Send updates now" also resolves the issue.
Didn't help.

Any errors in the error log?

Could you give us your current status?

We wonder if the subject "sssd: can't login, need to use the previous password" is still an issue or not. If it is solved, can we close this ticket?

And if you still have problems making SSSD work, may we move the discussion to the mailing list or IRC?

Thanks.

The conclusion was written above 2 months ago:
the slave didn't get the same password hash as the master. It happens occasionally.

Thank you for your update. This is very interesting...

PS: The master imports an LDIF (from another LDAP server on another network/zone) every morning. Maybe this confuses the sync/update??

I have a couple of questions.
1) Is the LDIF file from the other server supposed to be in sync with the master's data? I.e., are the master and the other LDAP server configured to replicate with each other?
2) When you export an LDIF from the other LDAP server, what command line or GUI operation do you use?
3) What is the purpose of the export / import?

1) Is the LDIF file from the other server supposed to be in sync with the master's data? I.e., are the master and the other LDAP server configured to replicate with each other?
There are 3 separate locations (one central and 2 minor sites), each with a 389-ds master/slave setup. Neither port 389 nor 636 is open between locations.
Sync happens through an LDIF file coming from the central location; I scp the file every 10 minutes to the other two sites (only to the master server of each site) through a "jumper" server.

So there is ONLY one LDIF file, from the master at the central location.

2) When you export an LDIF from the other LDAP server, what command line or GUI operation do you use?
From a cronjob on the master at the central location:
/usr/lib64/dirsrv/slapd-NNIT/db2ldif -s "dc=nnit" -a /tmp/tmp.ldf

Import on the MASTER at the other 2 sub-sites.
NB: there is a replication agreement for the SLAVE at each location.
/sbin/service dirsrv stop;/usr/lib64/dirsrv/slapd-NNIT/ldif2db -s "dc=nnit" -i /tmp/tmp.ldf;/sbin/service dirsrv start
NB: the SLAVE gets synced from the MASTER through the replication agreement I set up from the GUI.

3) What is the purpose of the export / import?
Firewall (see 1 above).

So on the sub-sites we experience that some of the passwords on the master and slave don't match. No matter how I sync (see my posts above), I need to do an "initialize" to fix it. At the moment I have a script that does the initialization once a day.

Thanks
/Tuan

Thank you for your detailed description, Tuan.

If I read correctly, the central cluster and the 2 minor sites do not replicate with each other, and you export the DB on the central master and import it into the 2 minor masters every day.

Your scenario requires consumer initialization on the 2 minor sites once you import the LDIF from the central master.

Replication manages internal data, which includes RUV entries, tombstone entries, and a CSN in each updated attribute. Please see http://directory.fedoraproject.org/wiki/Architecture#Replication for more details. This internal data is shared only within a closed replication topology. That is, you cannot "copy" the internal data from one server to another if they are not configured to replicate with each other. Plus, if you export the DB with "db2ldif" without the "-r" option, the internal data is not exported to the ldif file. (This is not your case, but if you want to export and import among servers in the same replication topology, you could use "db2ldif -r ..." to generate an ldif file with the internal data and import it to the other servers. That does not require consumer initialization after the import.)
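
(For that same-topology case, a sketch of what the "-r" export might look like, reusing the instance path and suffix from the cronjob shown earlier in this ticket; the output filename is only an example:)

/usr/lib64/dirsrv/slapd-NNIT/db2ldif -r -s "dc=nnit" -a /tmp/repl.ldf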

In your use case, after this import, you have to initialize consumers.
/sbin/service dirsrv stop;/usr/lib64/dirsrv/slapd-NNIT/ldif2db -s"dc=nnit" -i /tmp/tmp.ldf;/sbin/service dirsrv start

The consumer initialization can be done by running ldapmodify on each replication agreement entry as follows:

dn: cn=<YOUR_AGREEMENT>,cn=replica,cn="<YOUR_SUFFIX>",cn=mapping tree,cn=config
changetype: modify
replace: nsds5beginreplicarefresh
nsds5beginreplicarefresh: start
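
(For illustration, assuming the LDIF above is saved as refresh.ldif, it could be applied on the supplier with something like the following; the bind DN is an assumption based on the entries shown earlier:)

ldapmodify -x -D "cn=directory manager" -W -f refresh.ldif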

Thanks, Noriko.
Thanks for the explanation about the replication; I will read the link and look at the "-r" option.
I use autoexpect to do a "consumer initialization" every morning in a cronjob.

By the way, how can I flush the cache on an LDAP client? Is there a command for this? (nscd is not running.)

SSSD provides the sss_cache command for cache cleanup.

You may just run it with:
-U, --users
Invalidate all user records. This option overrides invalidation of a specific user if it was also set.

But if there could be some other updates in the database, please use this option:
-E, --everything
Invalidate all cached entries except for sudo rules.
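
(A quick sketch of how these could be invoked on the affected client:)

# invalidate all cached user records
sss_cache -U
# or invalidate everything in the cache except sudo rules
sss_cache -E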

Thanks, but for an old client (RH5) which doesn't use sssd, is there a command to flush the cache?

Replying to [comment:26 van12]:

Thanks, but for an old client (RH5) which doesn't use sssd, is there a command to flush the cache?
Could you tell us which cache you want to flush?

Replying to [comment:27 nhosoi]:

Replying to [comment:26 van12]:

Thanks, but for an old client (RH5) which doesn't use sssd, is there a command to flush the cache?
Could you tell us which cache you want to flush?
If you are using nss_ldap on RHEL5, there is no cache to be flushed. Do you have any other "client" in mind?

Let's say I do an "id tng" on LDAP client A and get the 3 groups he is a member of. Then I add "tng" to a 4th group on the LDAP server and do "id tng" again... and the output still shows only 3 groups. If I wait long enough (say 5 min), "id tng" will show all 4 groups.

If I do "id tng" on another LDAP client B, it shows the correct result (4 groups) on the first try.

So how can I flush the LDAP cache on client A, so that "id tng" shows all 4 groups right away?

Br
Tuan
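
(One way to narrow this down would be to query the group membership directly on the LDAP server from client A, bypassing anything the client might cache; the filter below assumes a standard posixGroup schema, the dc=nnit base used earlier in this ticket, and "tng" as a placeholder uid:)

ldapsearch -xLLL -Z -b dc=nnit "(&(objectClass=posixGroup)(memberUid=tng))" cn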

When you write "LDAP client A" and "B", do you mean hosts that are replicas (or consumers) of the master server?

And is the use case that you update a group for "tng" on the server and the change is replicated to consumer host B immediately, but not to A? It takes some time (e.g., 5 min.) to be replicated to host A?

Let me clarify a couple of things...
1) This is not related to the export/import issue we talked about in comments 23 and 24, since the change is eventually replicated, is it?
2) I assume both replica agreements (to consumers A and B) are configured as
nsDS5ReplicaUpdateSchedule: 0000-2359 0123456
See also:
https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Core_Server_Configuration_Reference.html#Replication_Attributes_under_cnReplicationAgreementName_cnreplica_cnsuffixName_cnmapping_tree_cnconfig-nsDS5ReplicaUpdateSchedule
3) Replication from master to consumer is not guaranteed to happen immediately. The elapsed time for replication can vary depending on the host and network status. 5 min. sounds too long, but if the network, server, or host is busy, the particular update could be put into the backlog. Do you see such load or traffic?

Sorry, I have confused you; please forget what I asked about flushing the cache.
If you like, please close the case.

Replying to [comment:31 van12]:

Sorry, I have confused you; please forget what I asked about flushing the cache.
If you like, please close the case.

Don't be sorry... I'm just trying to understand your problem. Do you think it's a replication-speed issue between the server and consumer A? Could there be anything in particular that slows down the replication compared to consumer B?

My last question about flushing has nothing to do with master, slave/consumer, or replication. It is just about two regular servers using LDAP (which I call LDAP clients A & B).

Replying to [comment:33 van12]:

My last question about flushing has nothing to do with master, slave/consumer, or replication. It is just about two regular servers using LDAP (which I call LDAP clients A & B).

I see. Thanks. This may be a silly question, but do clients A and B point to the same server or to different ones, such as B pointing to a master while A points to a consumer?

If they point to the same server, and you run an ldapsearch for user "tng" on both client A and B, do you get different memberOf values?

Since you noted you are not running nscd, I cannot find any other caching mechanism in the picture...

Hello Tuan,

Any update on your system? If the original issue is solved by adding the consumer initialization after the import, it might be a good time to move to the mailing list.

If you agree with it, let us know. We are going to close this ticket.

Thank you so much for your help.
--noriko

Hello Noriko,

I surrender. I got another error today: all authentication through the master failed, and I needed to shut it down and let the SLAVE take over (which runs 389-ds-base-1.2.11.15-32.el6_5.x86_64).

I don't want to spend more time on those 2 new releases,
389-ds-base-1.2.11.29-2.el5 or 389-ds-base-1.2.11.28-1.el5.

I get nowhere, only trouble.

I will downgrade the master to 389-ds-base-1.2.10.14-2.el5; it is hard/impossible to find old EL5 packages, so any help will be appreciated.

Please close this case

thanks
Tuan

PS: I'm glad that we have different OS and 389* releases on the master and slave, so one bug can't take the whole setup down. I will remember this advantage in the future.

Tuan,

I'm sorry to hear that... Hopefully, you will have a chance to upgrade to a newer version without problems sometime soon.

Metadata Update from @van12:
- Issue set to the milestone: N/A

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1026

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Invalid)

3 years ago
