Since we started using sssd on Red Hat 6, we observe (at least once or twice every month) that users can't log in unless they use their previous password. The workaround is to restart sssd. It happens on different RH6 servers.
Why? I have tried disabling the cache, and nscd is not running, but the problem is still there.
Since it is quite unpredictable (I don't know on which server or when it will happen), I can't replicate the problem right away and send you the debug output. I tried "kill -6 <pid>" to get a core dump, but I can't find any core file.
NB: removing /var/lib/sss/db/cache_default.ldb and running "sss_cache -u <USER>" did not help either; only a "service sssd restart" solved the problem.
Please help
{{{
[root@arlnbu01 ~]# ps -ef|grep nscd
root 14784 13811 0 15:40 pts/1 00:00:00 grep nscd
[root@arlnbu01 ~]# ps -ef|grep sssd
root 13893     1 0 15:32 ? 00:00:00 /usr/sbin/sssd -f -D
root 13894 13893 0 15:32 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain default --debug-to-files
root 13895 13893 0 15:32 ? 00:00:00 /usr/libexec/sssd/sssd_nss --debug-to-files
root 13896 13893 0 15:32 ? 00:00:00 /usr/libexec/sssd/sssd_pam --debug-to-files
root 14788 13811 0 15:40 pts/1 00:00:00 grep sssd
[root@arlnbu01 ~]#
}}}
{{{
[root@anbu01 ~]# cat /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = default
debug_to_files = true

[domain/default]
auth_provider = ldap
ldap_id_use_start_tls = True
chpass_provider = ldap
ldap_search_base = dc=NNIT
id_provider = ldap
enumerate = True
offline_credentials_expiration = 3
ldap_tls_cacertdir = /etc/openldap/cacerts
ldap_uri = ldap://mars,ldap://venus
}}}
I wonder if this could be a replication issue between your servers (mars and venus).
But to debug sssd I would need you to set debug_level=4 in the domain section and restart it. Then you will see a file called /var/log/sssd/sssd_default.log (in general it's sssd_$domainname.log); please attach that file.
Do the server logs say anything about the issue?
Replying to [comment:1 jhrozek]:
That'd be a good place to start. To narrow down the problem, is it possible to run these command lines when the login failure occurs?
{{{
ldapsearch -h mars -p 389 -x -D "<login_failed_user>" -w "<new_password>" -b "" -s base
ldapsearch -h mars -p 389 -x -D "<login_failed_user>" -w "<old_password>" -b "" -s base
ldapsearch -h venus -p 389 -x -D "<login_failed_user>" -w "<new_password>" -b "" -s base
ldapsearch -h venus -p 389 -x -D "<login_failed_user>" -w "<old_password>" -b "" -s base
}}}
To allow generating a core, you may need to configure your system based upon this FAQ article: http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
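For reference, that FAQ boils down to two knobs; a minimal sketch (the values and the core path are illustrative, adjust them to your own policy):

```
# /etc/security/limits.conf -- allow processes to write core files at all
*    soft    core    unlimited

# /etc/sysctl.conf -- put cores in a known place with a recognizable name
# (apply with "sysctl -p"); %e = executable name, %p = pid
kernel.core_pattern = /var/tmp/core.%e.%p
```

With something like this in place, "kill -6 <pid>" (SIGABRT) should leave a core file under /var/tmp instead of disappearing silently.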
> But to debug sssd I would need you to set debug_level=4 in the domain section and restart it. Then you will see a file called /var/log/sssd/sssd_default.log (in general it's sssd_$domainname.log); please attach that file. Do the server logs say anything about the issue?
to jhrozek:
> I wonder if this could be a replication issue between your servers (mars and venus).

When there is a problem, we can log in to other servers (HP-UX, AIX, Linux) without any issue. I will try the ldapsearch next time the issue arises.
I chose one server and have enabled verbose logging at level 4. I will attach the log when it happens again (it can take weeks).
{{{
[root@arlnbu01 sssd]# ls -ltr
total 132
-rw------- 1 root root      0 Apr 11  2013 ldap_child.log
-rw------- 1 root root      0 Apr 11  2013 sssd_nss.log
-rw------- 1 root root      0 Nov  6 03:35 sssd_default.log
-rw------- 1 root root 126545 Feb  6 16:46 sssd.log
}}}
Nothing in sssd.log right now:
{{{
[root@arlnbu01 sssd]# tail sssd.log
(Thu Feb 6 16:46:33 2014) [sssd] [service_send_ping] (0x0100): Pinging pam
(Thu Feb 6 16:46:33 2014) [sssd] [ping_check] (0x0100): Service default replied to ping
(Thu Feb 6 16:46:33 2014) [sssd] [ping_check] (0x0100): Service pam replied to ping
(Thu Feb 6 16:46:33 2014) [sssd] [ping_check] (0x0100): Service nss replied to ping
(Thu Feb 6 16:46:43 2014) [sssd] [service_send_ping] (0x0100): Pinging default
(Thu Feb 6 16:46:43 2014) [sssd] [service_send_ping] (0x0100): Pinging nss
(Thu Feb 6 16:46:43 2014) [sssd] [service_send_ping] (0x0100): Pinging pam
(Thu Feb 6 16:46:43 2014) [sssd] [ping_check] (0x0100): Service default replied to ping
(Thu Feb 6 16:46:43 2014) [sssd] [ping_check] (0x0100): Service pam replied to ping
(Thu Feb 6 16:46:43 2014) [sssd] [ping_check] (0x0100): Service nss replied to ping
}}}
to nhosoi:
> To allow generating a core, you may need to configure your system based upon this FAQ article: http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
It is not the server that has a problem; I just want to make a dump of the "sssd" process so you can take a look at why it does not accept the actual password.
It happened again today on one RH6 server (as I wrote in my last mail, it occurs randomly). I can log in to this server with my old LDAP password. There is no sssd log (debug=4) on this server.
I tested the ldapsearch on both LDAP servers as suggested above; I can confirm both LDAP servers accept ONLY my new password. And yes, I can log in with my new password on other servers.
Other users have noticed the same issue.
The log shows nothing; strace keeps saying "-1 EAGAIN (Resource temporarily unavailable)". Resource? Does it mean the LDAP servers?
I have just telnetted to port 389 on both LDAP servers without any problem, and as written above, I/we can log in on other *nix servers without problem.
Is this a bug in sssd which disconnects it from LDAP?
Please help. Is there a way to dump the sssd process on the LDAP client? "kill -6 <pid>" didn't dump anything.
{{{
[root@dsapp0021 ~]# strace -p 6838
writev(13, [{"l\1\0\1\0\0\0\0000\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(15, [{"l\1\0\1\0\0\0\0002\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(14, [{"l\1\0\1\0\0\0\0000\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
epoll_wait(6, {{EPOLLIN, {u32=24196720, u64=24196720}}}, 1, 9989) = 1
read(15, "l\2\1\1\0\0\0\0002\205\7\0\10\0\0\0\5\1u\0002\205\7\0", 2048) = 24
read(15, 0x17162e0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24201456, u64=24201456}}}, 1, 9989) = 1
read(14, "l\2\1\1\0\0\0\0000\205\7\0\10\0\0\0\5\1u\0000\205\7\0", 2048) = 24
read(14, 0x17152c0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24189584, u64=24189584}}}, 1, 9989) = 1
read(13, "l\2\1\1\0\0\0\0000\205\7\0\10\0\0\0\5\1u\0000\205\7\0", 2048) = 24
read(13, 0x1712460, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {}, 1, 9989) = 0
writev(13, [{"l\1\0\1\0\0\0\0001\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(15, [{"l\1\0\1\0\0\0\0003\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(14, [{"l\1\0\1\0\0\0\0001\205\7\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
epoll_wait(6, {{EPOLLIN, {u32=24189584, u64=24189584}}}, 1, 9989) = 1
read(13, "l\2\1\1\0\0\0\0001\205\7\0\10\0\0\0\5\1u\0001\205\7\0", 2048) = 24
read(13, 0x1712460, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24196720, u64=24196720}}}, 1, 9988) = 1
read(15, "l\2\1\1\0\0\0\0003\205\7\0\10\0\0\0\5\1u\0003\205\7\0", 2048) = 24
read(15, 0x17162e0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=24201456, u64=24201456}}}, 1, 9988) = 1
read(14, "l\2\1\1\0\0\0\0001\205\7\0\10\0\0\0\5\1u\0001\205\7\0", 2048) = 24
read(14, 0x17152c0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, ^C <unfinished ...>
Process 6838 detached
}}}
After restarting sssd (I can again log in with the new password), strace still shows "(Resource temporarily unavailable)", so I think this warning can be ignored.
{{{
[root@dsbapp0021 init.d]# strace -p 5393
read(14, 0x1dceb30, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {}, 1, 9989) = 0
writev(13, [{"l\1\0\1\0\0\0\0\21\0\0\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(15, [{"l\1\0\1\0\0\0\0\21\0\0\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
writev(14, [{"l\1\0\1\0\0\0\0\21\0\0\0]\0\0\0\1\1o\0\35\0\0\0/org/fre"..., 112}, {"", 0}], 2) = 112
epoll_wait(6, {{EPOLLIN, {u32=31239424, u64=31239424}}}, 1, 9989) = 1
read(13, "l\2\1\1\0\0\0\0\21\0\0\0\10\0\0\0\5\1u\0\21\0\0\0", 2048) = 24
read(13, 0x1dcb6d0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=31246272, u64=31246272}}}, 1, 9989) = 1
read(15, "l\2\1\1\0\0\0\0\21\0\0\0\10\0\0\0\5\1u\0\21\0\0\0", 2048) = 24
read(15, 0x1dcf5b0, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, {{EPOLLIN, {u32=31250768, u64=31250768}}}, 1, 9989) = 1
read(14, "l\2\1\1\0\0\0\0\21\0\0\0\10\0\0\0\5\1u\0\21\0\0\0", 2048) = 24
read(14, 0x1dceb30, 2048) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, ^C <unfinished ...>
Process 5393 detached
}}}
PS: I and another colleague changed our LDAP passwords several weeks ago.
jhrozek - do you need assistance from the 389 team to debug this issue?
I'm sorry I haven't replied to this ticket earlier, I forgot to CC myself.
I would start with SSSD debugging. In order to debug login issues, you should put debug_level=6 (or higher) into the pam and domain sections of the config file, then restart sssd. The sssd.log file you were inspecting does not contain debug messages from the worker processes, but rather from the "monitor" process, which mostly just acts as a watchdog.
Does your sssd.conf use cache_credentials=True? If so, then my guess would be that sssd is actually offline and logins are happening against the cache. If you don't mind extra messages on login, you can also set pam_verbosity to 2, and then you'd see "Authenticated with cached credentials" when logging in offline.
Oh, and I forgot to say that pam_verbosity should be set in the pam section.
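Put together, the relevant sssd.conf fragment would look something like this (a sketch only; section names follow the sssd.conf shown earlier in this ticket, and whether you want cached credentials at all is your call):

```
[domain/default]
cache_credentials = True    # only if offline logins from the cache are desired

[pam]
pam_verbosity = 2           # users then see "Authenticated with cached credentials"
```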
Thanks, the debug is now set on one server, the one we use most. The waiting time before the error occurs again can be months. cache_credentials is off.
{{{
[sssd]
domains = default
debug_to_files = true
debug_level = 2   # this one writes to sssd.log, the watchdog, as jhrozek said

[nss]
enum_cache_timeout = 30
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd

[domain/default]
auth_provider = ldap
debug_level = 6   # writes to sssd_default.log
ldap_id_use_start_tls = True
chpass_provider = ldap
ldap_search_base = dc=NNIT
id_provider = ldap
enumerate = True
cache_credentials = false
ldap_tls_cacertdir = /etc/openldap/cacerts
ldap_uri = ldap://mars,ldap://venus

[pam]
debug_level = 6
pam_verbosity = 2
}}}
hi
It happened again today; neither the new nor the old password helped this time (we tried with 3 different LDAP accounts).
NB: logging in with the same LDAP accounts on other servers works without any problems.
{{{
[root@arlnbu01 sssd]# ps -ef|grep sssd
root 10171     1 0 Feb13 ? 00:00:18 /usr/sbin/sssd -f -D
root 10172 10171 0 Feb13 ? 00:09:13 /usr/libexec/sssd/sssd_be --domain default --debug-to-files
root 10173 10171 0 Feb13 ? 00:00:26 /usr/libexec/sssd/sssd_nss --debug-to-files
root 10174 10171 0 Feb13 ? 00:00:07 /usr/libexec/sssd/sssd_pam --debug-to-files
}}}
The sssd log files can be found here: www.chezmoi.dk/div/sssd.tar.gz
Thank you, the logs have some more info. The first authentication request I see in the logs fails with "Invalid credentials", which usually means a wrong password:
{{{
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [simple_bind_send] (0x0100): Executing simple bind as: cn=Tuan Nguyen,cn=unixtek,ou=Infrastructure,dc=nnit
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [simple_bind_done] (0x0400): Bind result: Invalid credentials(49), no errmsg set
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Backend returned: (0, 6, <NULL>) [Success]
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sending result [6][default]
(Mon Feb 24 14:02:03 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sent result [6][default]
}}}
Then after a couple of tries, it seems that the user hits some administrative limit:
{{{
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sdap_uri_callback] (0x0400): Constructed uri 'ldap://arlmgtdk02.global.centralorg.net'
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sss_ldap_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sdap_sys_connect_done] (0x0100): Executing START TLS
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [sdap_connect_done] (0x0080): START TLS result: Success(0), Start TLS request accepted.Server willing to negotiate SSL.
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'arlmgtdk02.global.centralorg.net' as 'working'
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [set_server_common_status] (0x0100): Marking server 'arlmgtdk02.global.centralorg.net' as 'working'
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [simple_bind_send] (0x0100): Executing simple bind as: cn=Tuan Nguyen,cn=unixtek,ou=Infrastructure,dc=nnit
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [simple_bind_done] (0x0400): Bind result: Constraint violation(19), Exceed password retry limit. Please try later.
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Backend returned: (3, 4, <NULL>) [Internal Error (System error)]
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sending result [4][default]
(Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [be_pam_handler_callback] (0x0100): Sent result [4][default]
}}}
At the very least we now know what server the SSSD was talking to. Can you check if there is anything of interest in the server logs of "arlmgtdk02.global.centralorg.net"?
{{{ (Mon Feb 24 14:04:50 2014) [sssd[be[default]]] [simple_bind_done] (0x0400): Bind result: Constraint violation(19), Exceed password retry limit. Please try later. }}}
This means you tried to bind too many times with the wrong password and were locked out.
https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/Managing_the_Password_Policy-Configuring_the_Account_Lockout_Policy.html
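For context, the lockout behavior behind "Constraint violation(19)" is driven by the server-side password policy described in the guide above; a sketch of the global policy attributes involved, as an ldapmodify LDIF (the values here are purely illustrative, not your actual settings):

```
dn: cn=config
changetype: modify
replace: passwordLockout
passwordLockout: on
-
replace: passwordMaxFailure
passwordMaxFailure: 3
-
replace: passwordLockoutDuration
passwordLockoutDuration: 3600
```

Each failed bind increments passwordRetryCount on the entry; once it exceeds passwordMaxFailure, binds are rejected with constraint violation until the lockout duration passes or the account is unlocked.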
> Can you check if there is anything of interest in the server logs of "arlmgtdk02.global.centralorg.net"?
> Bind result: Constraint violation(19), Exceed password retry limit
Many thanks to jhrozek and rmeggins; you have given me some hints. arlmgtdk02 is a slave (replica), arlmgtdk01 is the LDAP master (389-ds). sssd.conf lists arlmgtdk01 in front of arlmgtdk02.
The client switching forth and back between the master/slave is not an issue, but why did it fail on the slave LDAP server arlmgtdk02? This evening's test showed I could get in if I used my previous-previous password (notice: previous-previous, not the last one but the one before that).
The slave is configured with "sync on the following days", not "always keep directories in sync", in the Replication Schedule. Maybe this is the issue?
I did an "Initialize Consumer" to re-initialize the slave, and then the login problem went away without my needing to restart the sssd daemon as on the other days :-). The slave arlmgtdk02 now has the right/correct data received from the master.
I have now enabled "always keep directories in sync" for the replication (on all 3 of our LDAP environments); let's see if the problem goes away, otherwise I need help from Rich :-). I will post the result in 4 weeks.
I shut down slapd on arlmgtdk02, and then sssd reconnected to arlmgtdk01 again; see the log below from sssd_default.log:
{{{
(Mon Feb 24 21:38:32 2014) [sssd[be[default]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'arlmgtdk01.global.centralorg.net' as 'working'
}}}
So the conclusion is: there are no bugs in sssd, but the master-slave sync somehow gets "out of sync" when I use "sync on the following days". I am now testing the other option, "always keep directories in sync".
I have 2 screendumps of the previous-previous password test and the replication schedule: www.chezmoi.dk/div/sssd2.zip
Again, thanks very much for your help. Best regards, Tuan
hello again
So the conclusion was: there are no bugs in sssd, but the master-slave sync somehow gets "out of sync" when I use "sync on the following days". I have been testing the other option, "always keep directories in sync".
Two weeks ago, and again this week, the sssd issue occurred again (I can log in but one of my colleagues can't) on two other RH6 servers. A manual "Initialize Consumer" on the LDAP master (described above) fixed the problem.
So somehow the slave didn't get all the data from the master, even though I used "always keep directories in sync". In the replication status log there was no error; it always shows "Replica acquired successfully: Incremental update succeeded". In the error and access logs I can't see anything wrong with the replication.
BR, Tuan
Replying to [comment:14 van12]:
> Hello again. So the conclusion was: there are no bugs in sssd, but the master-slave sync somehow gets "out of sync" when I use "sync on the following days". I have been testing the other option, "always keep directories in sync". Two weeks ago, and again this week, the sssd issue occurred again (I can log in but one of my colleagues can't) on two other RH6 servers. A manual "Initialize Consumer" on the LDAP master (described above) fixed the problem.
I'm assuming you are using the console to trigger the "initialize consumer". The next time this occurs, see if selecting "Send updates now" also resolves the issue.
But prior to that, once the issue occurs, please perform an ldapsearch on that user entry on both replicas (requesting the userPassword attribute), so we can check for any differences. Then see if "Send updates now" helps.
Thanks!
> I'm assuming you are using the console to trigger the "initialize consumer". The next time this occurs, see if selecting "Send updates now" also resolves the issue.

Didn't help.
> But, prior to that, once the issue occurs please perform a ldapsearch on that user entry on both replicas (requesting the userpassword attribute) - so we can check for any differences. Then see if "send updates now" helps.
ldapsearch -xLLL -Z -b dc=nnit "(&(uid=tnng))" userPassword returns nothing, so I exported to an LDIF and got the info from there.
Master:
{{{
....
nsUniqueId: 640af15e-b84c11e2-b4dec459-5c8956ea
uidNumber: 8078
passwordRetryCount: 0
retryCountResetTime: 20140423183539Z
passwordExpWarned: 0
passwordExpirationTime: 20140622102611Z
passwordGraceUserTime: 0
passwordAllowChangeTime: 20140414102611Z
modifyTimestamp: 20140413102611Z
modifiersName: cn=server,cn=plugins,cn=config
passwordHistory: 20130925110404Z{crypt}gy9YjfDKX//dk
passwordHistory: 20131202133021Z{crypt}piOppDJ7Rpie.
passwordHistory: 20140208134717Z{crypt}bMpK2.1wgu8MQ
passwordHistory: 20140413102611Z{crypt}4FRvw4sn5vdr2
userPassword:: e2NyeXB0fW9pajRheVBVWC5aUm8=
}}}
Slave:
{{{
...
nsUniqueId: 640af15e-b84c11e2-b4dec459-5c8956ea
createTimestamp: 20130509020152Z
creatorsName: cn=directory manager
uidNumber: 8078
uid: tnng
objectClass: top
objectClass: posixaccount
userPassword:: e2NyeXB0fTRGUnZ3NHNuNXZkcjI=
passwordHistory: 20130925110404Z{crypt}gy9YjfDKX//dk
passwordHistory: 20131202133021Z{crypt}piOppDJ7Rpie.
passwordHistory: 20140208134717Z{crypt}bMpK2.1wgu8MQ
passwordAllowChangeTime: 20140209134717Z
passwordGraceUserTime: 0
modifiersName: cn=directory manager
modifyTimestamp: 20140310072215Z
retryCountResetTime: 20140425120338Z
passwordRetryCount: 3
passwordExpirationTime: 20140419180658Z
passwordExpWarned: 1
accountUnlockTime: 20140425122914Z
}}}
Oops, the passwords are not the same.
Notice: I changed the password on the master on the 13th of April, but this didn't get through to the SLAVE. Consistent with that, the userPassword on the SLAVE (e2NyeXB0fTRGUnZ3NHNuNXZkcjI=) is the same as this line from the MASTER's history: passwordHistory: 20140413102611Z{crypt}4FRvw4sn5vdr2
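The userPassword values in an LDIF export are base64-encoded, so the mismatch can be confirmed directly by decoding them (hashes taken from the two entries above):

```shell
# Decode the LDIF-exported userPassword values to compare the crypt hashes.
master=$(echo 'e2NyeXB0fW9pajRheVBVWC5aUm8=' | base64 -d)   # master's current hash
slave=$(echo 'e2NyeXB0fTRGUnZ3NHNuNXZkcjI=' | base64 -d)    # slave's current hash
echo "master: $master"    # -> {crypt}oij4ayPUX.ZRo
echo "slave:  $slave"     # -> {crypt}4FRvw4sn5vdr2
```

The slave's decoded value equals the master's passwordHistory entry dated 20140413102611Z, i.e. the slave still holds the previous password, which matches the observed "old password works" symptom.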
I will create a cron job which syncs the data (using an LDIF) from master to slave as a workaround.
Thanks Tuan
PS: The master imports an LDIF (from another LDAP server on another network/zone) every morning. Maybe this confuses the sync/update? For the workaround I will use the same principle on the SLAVE.
{{{
16 3 * * * /sbin/service dirsrv stop;/usr/lib64/dirsrv/slapd-NNIT/ldif2db -s "dc=nnit" -i /tmp/tmp.ldf;/sbin/service dirsrv start;/sbin/service sssd restart
}}}
I forgot to mention: notice the accountUnlockTime on the SLAVE too; "Send updates now" doesn't remove this entry.
NB: it is a MASTER-SLAVE setup, so if I try to reset accountUnlockTime directly in the SLAVE GUI, I get the error "LDAP is unwilling to perform: cannot update referral" (OK with me, no problem). I just wish "Send updates now" would do it.
Replying to [comment:16 van12]:
Any errors in the error log?
Could you give us your current status?
We wonder if the subject "sssd: can't login, need to use the previous password" is still an issue or not. If it is solved, can we close this ticket?
And if you still have some problems making SSSD work, may we move the discussion to the mailing list or IRC?
Thanks.
The conclusion was written above 2 months ago: the slave didn't get the same password hash as the master. It happens occasionally.
Thank you for your update. This is very interesting...
> PS: The master imports an LDIF (from another LDAP server on another network/zone) every morning. Maybe this confuses the sync/update?
I have a couple of questions.
1) Is the ldif file from another server supposed to be in sync with the master's ldif? I.e., are the master and the other ldap server configured to replicate each other?
2) When you export an ldif from another ldap server, what command line or GUI operation do you use?
3) What is the purpose of the export / import?
> 1) Is the ldif file from another server supposed to be in sync with the master's ldif? I.e., are the master and the other ldap server configured to replicate each other?

There are 3 separate locations (one central and 2 minor sites), each with a 389-ds master/slave setup. Neither port 389 nor 636 is open between the locations. Sync happens through an LDIF file coming from the central location; I scp the file every 10 minutes to the other two sites (only to the master server of each site) through a "jumper" server.
So there is ONLY one LDIF file, from the master at the central location.
> 2) When you export an ldif from another ldap server, what command line or GUI operation do you use?

From a cron job on the master at the central location:
{{{
/usr/lib64/dirsrv/slapd-NNIT/db2ldif -s "dc=nnit" -a /tmp/tmp.ldf
}}}
The import is done on the MASTER at each of the other 2 sub-sites. NB: there is a replication agreement for the SLAVE at each location.
{{{
/sbin/service dirsrv stop;/usr/lib64/dirsrv/slapd-NNIT/ldif2db -s "dc=nnit" -i /tmp/tmp.ldf;/sbin/service dirsrv start
}}}
NB: the SLAVE gets its "sync" from the MASTER through the replication agreement I set up from the GUI.
> 3) What is the purpose of the export / import?

Firewall (see 1) above).
So on the sub-sites we experience that some of the passwords on the master and slave don't match. No matter how I sync (see my previous posts above), I need to do an "Initialize Consumer" to fix it. At the moment I have a script that does the initialization once a day.
Thanks, /Tuan
Thank you for your detailed description, Tuan.
If I read correctly, the central cluster and the 2 minor sites do not replicate to each other, and you export the DB on the central master and import it on the 2 minor masters every day.
Your scenario requires consumer initialization on the 2 minor sites once you import the ldif from central master.
Replication manages internal data, which includes RUV entries, tombstone entries, and a CSN in each updated attribute. Please see http://directory.fedoraproject.org/wiki/Architecture#Replication for more details. This internal data is shared only within a closed replication topology; that is, you cannot "copy" the internal data from one server to another if they are not configured to replicate with each other. Plus, if you export the DB with "db2ldif" without the "-r" option, the internal data is not exported to the LDIF file. (This is not your case, but if you want to export and import among servers in the same replication topology, you could use "db2ldif -r ..." to generate an LDIF file with the internal data and import it to the other servers. That does not require consumer initialization after the import.)
In your use case, after this import you have to initialize consumers:
{{{
/sbin/service dirsrv stop;/usr/lib64/dirsrv/slapd-NNIT/ldif2db -s "dc=nnit" -i /tmp/tmp.ldf;/sbin/service dirsrv start
}}}
The consumer initialization can be done by running ldapmodify on each replication agreement entry as follows:
{{{
dn: cn=<YOUR_AGREEMENT>,cn=replica,cn="<YOUR_SUFFIX>",cn=mapping tree,cn=config
changetype: modify
replace: nsds5beginreplicarefresh
nsds5beginreplicarefresh: start
}}}
Thanks, Noriko, for the explanation about replication. I will read the link and look at the "-r" option. I use autoexpect to do a "consumer initialization" every morning from a cron job.
By the way, how can I flush the cache on an LDAP client? Is there a command for this? (nscd is not running.)
SSSD provides the sss_cache command for cache cleanup.
You may just run it with:
{{{
-U, --users    Invalidate all user records. This option overrides invalidation of specific user if it was also set.
}}}
But if there could be some other updates in the database, please use this option:
{{{
-E, --everything    Invalidate all cached entries except for sudo rules.
}}}
Thanks, but for an old client (RHEL5) which doesn't use sssd, is there a command to flush the cache?
Replying to [comment:26 van12]:
> thanks, but for an old client (RHEL5) which doesn't use sssd, is there a command to flush the cache?

Could you tell us which cache you want to flush?
Replying to [comment:27 nhosoi]:
> Replying to [comment:26 van12]:
> > thanks, but for an old client (RHEL5) which doesn't use sssd, is there a command to flush the cache?
> Could you tell us which cache you want to flush?

If you are using nss_ldap on RHEL5, there is no cache to be flushed. Do you have any other "client" in mind?
Let's say I do an "id tng" on LDAP client A; I get the 3 groups he is a member of. Then I add "tng" to a 4th group on the LDAP server and do "id tng" again... and the output still shows only 3 groups. If I wait long enough (say 5 minutes), "id tng" will show all 4 groups.
If I do "id tng" on another LDAP client B, it shows correctly on the first try (4 groups).
So how can I flush the LDAP cache on client A, so that "id tng" shows all 4 groups right away?
Br Tuan
When you write "LDAP client A" and "B", do you mean hosts which are replicas (or consumers) of the master server?
And is this use case that you update a group of "tng" on the server, and the change is replicated to consumer host B immediately, but not to A? It takes some time (e.g., 5 min.) to be replicated to host A?
Let me clarify a couple of things...
1) This is not related to the export/import issue we discussed in comments 23 and 24, since the change is eventually replicated, is it?
2) I assume both replica agreements (to consumers A and B) are configured as nsDS5ReplicaUpdateSchedule: 0000-2359 0123456. See also: https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Core_Server_Configuration_Reference.html#Replication_Attributes_under_cnReplicationAgreementName_cnreplica_cnsuffixName_cnmapping_tree_cnconfig-nsDS5ReplicaUpdateSchedule
3) Replication from master to consumer is not guaranteed to happen at once. The elapsed time could vary depending upon the host and network status. 5 min. sounds too long, but if the network, server, or host is busy, the particular update could be put into the backlog. Do you see such load or traffic?
Sorry, I confused you; please forget what I asked about flushing the cache. If you like, please close the case.
Replying to [comment:31 van12]:
Don't be sorry... I'm just trying to understand your problem. Do you think it's a replication speed issue between the server and consumer A? Could there be anything in particular which slows down replication to consumer A compared to consumer B?
My last question about flushing has nothing to do with the master, slave/consumer, or replication. They are just two regular servers using LDAP (I call them LDAP clients A & B).
Replying to [comment:33 van12]:
I see, thanks. It may be a silly question, but do clients A and B point to the same server, or to different ones, such as B pointing to a master while A points to a consumer?
If they point to the same server, and you run an ldapsearch for user "tng" on clients A and B, do you get different memberOf values?
Since you noted you are not running NSCD, I cannot find any other caching mechanism in the picture...
Hello Tuan,
Any update on your system? If the original issue is solved by adding the consumer initialization after the import, it might be a good time to move to the mailing list.
If you agree with it, let us know. We are going to close this ticket.
Thank you so much for your help. --noriko
hello Noriko
I surrender. I got another error today: all authentication through the master failed, and I needed to shut it down and let the SLAVE take over (which runs 389-ds-base-1.2.11.15-32.el6_5.x86_64).
I don't want to spend more time on those 2 new releases, 389-ds-base-1.2.11.29-2.el5 and 389-ds-base-1.2.11.28-1.el5.
I get nowhere, only trouble.
I will downgrade the master to 389-ds-base-1.2.10.14-2.el5. It is hard/impossible to find old EL5 packages; any help would be appreciated.
Please close this case
thanks Tuan
PS: I'm glad that we have different OS and 389* releases on the master and slave, so the bugs can't take the whole setup down. I will remember this advantage in the future.
Tuan,
I'm sorry to hear that... Hopefully, you have a chance to upgrade to the newer version with no problem some time soon.
Metadata Update from @van12: - Issue set to the milestone: N/A
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1026
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Invalid)