Learn more about these different git repos.
Other Git URLs
What does not work as expected?[[BR]] After a while (about once in a week) the bind daemon is in the state hang/zombie. The bind daemon seems to be present and accept requests from the clients, but is not answering any dns requests. Only killing the process with kill -9 can stop the daemon. After starting bind again, it works fine, until the problem occurs again.[[BR]]
How can we reproduce the problem?[[BR]] About once in a week at 03:xx am.
Do you see any error messages or suspicious information in logs?[[BR]]
/var/log/messages[[BR]]
Example Day1:[[BR]]
Mar 7 01:06:39 ipa named[5434]: LDAP query timed out. Try to adjust "timeout" parameter[[BR]]
Mar 7 01:06:43 ipa named[5434]: LDAP query timed out. Try to adjust "timeout" parameter[[BR]]
Mar 7 01:06:49 ipa named[5434]: LDAP query timed out. Try to adjust "timeout" parameter[[BR]]
Mar 7 01:06:53 ipa named[5434]: LDAP query timed out. Try to adjust "timeout" parameter[[BR]]
Mar 7 01:06:53 ipa named[5434]: LDAP query timed out. Try to adjust "timeout" parameter[[BR]] [[BR]]
Mar 7 03:21:02 ipa named[5434]: received control channel command 'reload'[[BR]]
Mar 7 03:21:02 ipa named[5434]: loading configuration from '/etc/named.conf'[[BR]]
Mar 7 03:21:02 ipa named[5434]: using default UDP/IPv4 port range: [1024, 65535][[BR]]
Mar 7 03:21:02 ipa named[5434]: using default UDP/IPv6 port range: [1024, 65535][[BR]]
Mar 7 03:21:02 ipa named[5434]: sizing zone task pool based on 6 zones[[BR]]
Mar 7 03:21:02 ipa named[5434]: bind to LDAP server failed: Local error[[BR]]
Mar 7 03:21:02 ipa named[5434]: reloading configuration failed: failure[[BR]]
Mar 7 03:21:02 ipa named[5434]: received SIGHUP signal to reload zones[[BR]]
Mar 7 03:22:23 ipa named[5434]: shutting down[[BR]]
Mar 7 03:22:23 ipa named[5434]: stopping command channel on 127.0.0.1#953[[BR]]
Mar 7 03:22:23 ipa named[5434]: stopping command channel on ::1#953[[BR]]
Mar 7 03:22:23 ipa named[5434]: no longer listening on ::#53[[BR]]
Mar 7 03:22:23 ipa named[5434]: no longer listening on 127.0.0.1#53[[BR]]
Mar 7 03:22:23 ipa named[5434]: no longer listening on xxx#53[[BR]] [[BR]]
Example Day2:[[BR]]
Mar 18 03:33:01 ipa named[24746]: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Ticket expired)[[BR]]
Mar 18 03:33:01 ipa named[24746]: bind to LDAP server failed: Local error[[BR]]
Mar 18 03:33:01 ipa named[24746]: reloading configuration failed: failure[[BR]]
Mar 18 03:33:27 ipa named[24746]: clients-per-query increased to 15[[BR]]
[[BR]] versions in use[[BR]]
OS:[[BR]] centos-release-6-5.el6.centos.11.2.x86_64
Bind:[[BR]] bind-9.8.2-0.17.rc1.el6_4.6.x86_64 bind-dyndb-ldap-2.3-5.el6.x86_64 bind-libs-9.8.2-0.17.rc1.el6_4.6.x86_64 bind-utils-9.8.2-0.17.rc1.el6_4.6.x86_64
IPA:[[BR]]
ipa-server-3.0.0-37.el6.x86_64 ipa-client-3.0.0-37.el6.x86_64[[BR]][[BR]]
Configurations[[BR]]
named.conf:*[[BR]]
dynamic-db "ipa" { library "ldap.so"; arg "uri ldapi://%2fvar%2frun%2fslapd-xxx.socket"; arg "base cn=dns, dc=xxx"; arg "fake_mname xxx."; arg "auth_method sasl"; arg "sasl_mech GSSAPI"; arg "sasl_user DNS/xxx"; arg "cache_ttl 600"; arg "zone_refresh 0"; arg "psearch yes"; arg "serial_autoincrement yes"; arg "connections 4"; };
Hello!
I would like to get details about this problem.
When it happens again, please run following commands on the affected server:
$ rndc trace 20 $ dig @127.0.0.1 some.name.in.your.ldap.zone.example.
It is possible that BIND will not log anything. In any case, please capture coredump from the runnning process:
$ gdb `which named` `pgrep named` (gdb) gcore
Thank you very much!
I have received some coredumps:
(gdb) thread apply all bt Thread 4 (Thread 0x7f7820ba07c0 (LWP 3665)): #0 0x00007f781e40dc94 in do_sigsuspend (set=<value optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63 #1 __sigsuspend (set=<value optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:78 #2 0x00007f781f5b05a4 in isc__app_ctxrun (ctx0=0x7f781f7d5880) at app.c:680 #3 0x00007f7820bde2f5 in main (argc=<value optimized out>, argv=0x7fff903b87a8) at ./main.c:1085 Thread 3 (Thread 0x7f781ba61700 (LWP 3668)): #0 0x00007f781e4c3f43 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82 #1 0x00007f781f5bf8d4 in watcher (uap=0x7f7820b67010) at socket.c:3763 #2 0x00007f781ef61851 in start_thread (arg=0x7f781ba61700) at pthread_create.c:301 #3 0x00007f781e4c394d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 2 (Thread 0x7f781c462700 (LWP 3667)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216 #1 0x00007f781f5c188e in isc_condition_waituntil (c=0x7f7820b65078, m=0x7f7820b65028, t=0x7f7820b6506c) at condition.c:59 #2 0x00007f781f5aeb0d in run (uap=0x7f7820b65010) at timer.c:825 #3 0x00007f781ef61851 in start_thread (arg=0x7f781c462700) at pthread_create.c:301 #4 0x00007f781e4c394d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 1 (Thread 0x7f781ce63700 (LWP 3666)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f781f5ac1c1 in dispatch (uap=0x7f7820b64010) at task.c:960 #2 run (uap=0x7f7820b64010) at task.c:1157 #3 0x00007f781ef61851 in start_thread (arg=0x7f781ce63700) at pthread_create.c:301 #4 0x00007f781e4c394d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
My best guess for the moment is that internal reference counting doesn't work correctly in some corner cases, probably in some error handler. As a result:
I will dig into it a bit more but this will be very hard to debug without a reproducer. Does it happen every time? Could you provide complete logs to me? (You can sent them privately to pspacek@redhat.com if you don't want to expose logs in the ticket.)
Thanks!
Guys, can you provide me some more information about this problem? I can't move without further information. A reproducer would be the best thing!
Guys, I don't have enough information to fix this so I'm closing this ticket with resolution worksforme. Feel free to reopen it if you get more information about the problem.
worksforme
Have a nice day!
It seems that I have reproduced it by chance. My log now says:
Aug 9 01:01:01 vm-108 named[1430]: GSSAPI Error: The referenced context has expired (Success) Aug 9 01:01:01 vm-108 named[1430]: bind to LDAP server failed: Local error Aug 9 01:01:01 vm-108 named[1430]: reloading configuration failed: failure Aug 9 01:01:01 vm-108 named[1430]: received SIGHUP signal to reload zones
Version: bind-dyndb-ldap-2.3-5.el6.x86_64
bind-dyndb-ldap-2.3-5.el6.x86_64
It seems that logic dealing with creds.times.endtime in krb5_helper.c is broken. Let's fix that.
creds.times.endtime
krb5_helper.c
This problem was fixed in all maintained branches:
Metadata Update from @pspacek: - Issue assigned to pspacek - Issue set to the milestone: Fedora 21
Login to comment on this ticket.