#8681 krb5kdc dumped core
Closed: fixed 2 years ago by frenaud. Opened 3 years ago by ysoubeyrand.

Issue

Process 3094 (krb5kdc) of user 0 dumped core.

Stack trace of thread 3094:
#0  0x00007f3ac842b7ff raise (libc.so.6)
#1  0x00007f3ac8415c35 abort (libc.so.6)
#2  0x00007f3ac8415b09 __assert_fail_base.cold.0 (libc.so.6)
#3  0x00007f3ac8423de6 __assert_fail (libc.so.6)
#4  0x00007f3ab9befb52 ldap_get_values_len (libldap_r-2.4.so.2)
#5  0x00007f3abab73273 ipadb_parse_ldap_entry (ipadb.so)
#6  0x00007f3abab7418d dbget_princ (ipadb.so)
#7  0x00007f3abab742a3 ipadb_get_principal (ipadb.so)
#8  0x00007f3aca480ef8 krb5_db_get_principal (libkdb5.so.10)
#9  0x000055abf52fc8c8 process_as_req (krb5kdc)
#10 0x000055abf52fb5c8 dispatch (krb5kdc)
#11 0x000055abf5311125 process_packet (krb5kdc)
#12 0x00007f3ac87ba5ec verto_fire (libverto.so.1)
#13 0x00007f3ab2006ff1 event_process_active_single_queue (libevent-2.1.so.6)
#14 0x00007f3ab2007787 event_base_loop (libevent-2.1.so.6)
#15 0x000055abf52fa2f9 main (krb5kdc)
#16 0x00007f3ac84177b3 __libc_start_main (libc.so.6)
#17 0x000055abf52fa68e _start (krb5kdc)

Steps to Reproduce

I don't know, this happened on one of our servers, without doing something special.

Version/Release/Distribution

Distribution is Red Hat Enterprise Linux release 8.3 (Ootpa).

$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.8.7-13.module+el8.3.0+8376+0bba7131.x86_64
ipa-client-4.8.7-13.module+el8.3.0+8376+0bba7131.x86_64
389-ds-base-1.4.3.8-6.module+el8.3.0+8995+c08169ba.x86_64
pki-ca-10.9.4-1.module+el8.3.0+8058+d5cd4219.noarch
krb5-server-1.18.2-5.el8.x86_64

Additional info:

Given the backtrace below and the code, I don't get how we can end up with lcontext != NULL and ipactx->lcontext == NULL in frame 7.

#0  0x00007f3ac842b7ff in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f3ac8415c35 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f3ac8415b09 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f3ac8423de6 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f3ab9befb52 in ldap_get_values_len (ld=<optimized out>, entry=entry@entry=0x55abf8bad780, target=target@entry=0x7f3abab7e525 "ipatokenRadiusConfigLink") at getvalues.c:97
        ber = {ber_opts = {lbo_valid = 5, lbo_options = 0, lbo_debug = 0}, ber_tag = 0, ber_len = 140729034795568, ber_usertag = 5066068250598200064, ber_buf = 0x55abf5911e10 "\247~\002\f", ber_ptr = 0x7ffe0820ea68 "", 
          ber_end = 0x2 <error: Cannot access memory at address 0x2>, ber_sos_ptr = 0x7f3abab6f991 <ipadb_need_retry+49> "\205\300\017\224\300H\203\304\b\303\017\037D", ber_rwptr = 0x7ffe0820ea68 "", 
          ber_memctx = 0x7f3abab6fb7e <ipadb_simple_search+142>}
        attr = 0x3bad83140 <error: Cannot access memory at address 0x3bad83140>
        vals = 0x0
        __PRETTY_FUNCTION__ = "ldap_get_values_len"
#5  0x00007f3abab73273 in ipadb_validate_radius (ipactx=0x55abf5911e10, ua=0x7ffe0820ea64, lentry=0x55abf8bad780) at ipa_kdb_principals.c:315
        vals = <optimized out>
        vals = <optimized out>
#6  ipadb_get_user_auth (lentry=0x55abf8bad780, ipactx=0x55abf5911e10) at ipa_kdb_principals.c:352
        gua = <optimized out>
        ua = (IPADB_USER_AUTH_PASSWORD | IPADB_USER_AUTH_RADIUS | IPADB_USER_AUTH_OTP)
        gcfg = <optimized out>
        gua = <optimized out>
        ua = <optimized out>
        gcfg = <optimized out>
#7  ipadb_parse_ldap_entry (kcontext=kcontext@entry=0x55abf58f74f0, principal=0x55abf8bb0920 "<redacted>", lentry=0x55abf8bad780, kentry=kentry@entry=0x55abf8bbf728, 
    polmask=polmask@entry=0x7ffe0820ec0c) at ipa_kdb_principals.c:628
        rad_string = "otp\000[{\"indicators\": [\"radius\"]}]"
        otp_string = "otp\000[{\"indicators\": [\"otp\"]}]"
        ipactx = 0x55abf5911e10
        ua = <optimized out>
        lcontext = 0x55abf8ba75a0
        entry = 0x55abf8b3d710
        ied = <optimized out>
        kerr = <optimized out>
        res_tl_data = <optimized out>
        res_key_data = 0x0
        mkvno = 0
        restrlist = 0x55abf596f3f0
        restring = 0x600a3cbf <error: Cannot access memory at address 0x600a3cbf>
        authz_data_list = 0x2f0000001b
        restime = 32766
        resbool = 8
        result = 136374896
        ret = <optimized out>
#8  0x00007f3abab7418d in dbget_princ (kcontext=kcontext@entry=0x55abf58f74f0, ipactx=ipactx@entry=0x55abf5911e10, search_for=search_for@entry=0x55abf8bbf0e0, flags=flags@entry=96, entry=entry@entry=0x55abf8bbf728)
    at ipa_kdb_principals.c:1336
        kerr = 0
        principal = 0x55abf8bb0920 "<redacted>"
        res = 0x55abf8bad780
        lentry = 0x55abf8bad780
        pol = 0
--Type <RET> for more, q to quit, c to continue without paging--
#9  0x00007f3abab742a3 in ipadb_get_principal (kcontext=0x55abf58f74f0, search_for=0x55abf8bbf0e0, flags=96, entry=0x55abf8bbf728) at ipa_kdb_principals.c:1522
        ipactx = 0x55abf5911e10
        kerr = <optimized out>
#10 0x00007f3aca480ef8 in krb5_db_get_principal () from /lib64/libkdb5.so.10
No symbol table info available.
#11 0x000055abf52fc8c8 in lookup_client (req=0x55abf8bb2990, req=0x55abf8bb2990, entry_out=0x55abf8bbf728, flags=96, context=0x55abf58f74f0) at do_as_req.c:148
        pa = <optimized out>
        cert = {magic = 64, length = 32, data = 0x7f3a0000000a <error: Cannot access memory at address 0x7f3a0000000a>}
        pa = <optimized out>
        cert = <optimized out>
#12 process_as_req (request=0x55abf8bb2990, req_pkt=<optimized out>, local_addr=0x55abf8bd08b8, remote_addr=<optimized out>, kdc_active_realm=0x55abf59120d0, vctx=0x55abf58ab430, respond=0x55abf52fb2b0 <finish_dispatch_cache>, 
    arg=0x55abf8b6f910) at do_as_req.c:606
        errcode = <optimized out>
        s_flags = 0
        encoded_req_body = {magic = 205, length = 173, data = 0x55abf8bd0a0b "0\201\252\240\a\003\005"}
        useenctype = <optimized out>
        state = 0x55abf8bbf5d0
        au_state = 0x55abf8b97e10
#13 0x000055abf52fb5c8 in dispatch (cb=0x55abf5519140 <shandle>, local_addr=local_addr@entry=0x55abf8bd08b8, remote_addr=remote_addr@entry=0x55abf8bd0890, pkt=pkt@entry=0x55abf8bd09d8, is_tcp=is_tcp@entry=0, 
    vctx=vctx@entry=0x55abf58ab430, respond=0x55abf530e9a0 <process_packet_response>, arg=0x55abf8bd0860) at dispatch.c:201
        retval = 0
        req = 0x55abf8bb2990
        response = 0x0
        state = 0x55abf8b6f910
        handle = <optimized out>
        kdc_err_context = 0x55abf58f74f0
#14 0x000055abf5311125 in process_packet (ctx=0x55abf58ab430, ev=<optimized out>) at net-server.c:1096
        cc = <optimized out>
        conn = <optimized out>
        state = 0x55abf8bd0860
        __PRETTY_FUNCTION__ = "process_packet"
#15 0x00007f3ac87ba5ec in verto_fire () from /lib64/libverto.so.1
No symbol table info available.
#16 0x00007f3ab2006ff1 in event_process_active_single_queue () from /lib64/libevent-2.1.so.6
No symbol table info available.
#17 0x00007f3ab2007787 in event_base_loop () from /lib64/libevent-2.1.so.6
No symbol table info available.
#18 0x000055abf52fa2f9 in main (argc=5, argv=0x7ffe0820f258) at main.c:1064
        retval = 0
        kcontext = 0x55abf5891610
        realm = <optimized out>
        ctx = 0x55abf58ab430
        tcp_listen_backlog = 5
        i = <optimized out>

Thanks, this is a known issue (https://bugzilla.redhat.com/show_bug.cgi?id=1835741, most comments are private due to customer details). We have not so far been able to figure out why lcontext suddenly becomes NULL.

Thanks for the bugzilla link (I didn't came across it when googling for my bug) and sorry for the noise.

Any ideas on where this might come from are welcomed. ;) It is really a puzzling crash.

I don't have any idea yet (this issue happened only once on one of our 3 servers). Also, I don't get how we can end up in this situation when I read the code. Are there multiple threads acting on the ipactx variable?

I don't think there are multiple threads at all, KDC is single threaded multi-process. @rharwood is that correct?

OK, then I think I'm misunderstanding something. I interpreted from GDB output that lcontext != NULL and ipactx->lcontext == NULL in frame 7. So, my understanding of the code is that we get from https://pagure.io/freeipa/blob/ipa-4-8/f/daemons/ipa-kdb/ipa_kdb_principals.c#_610 straight to https://pagure.io/freeipa/blob/ipa-4-8/f/daemons/ipa-kdb/ipa_kdb_principals.c#_623 (because lcontext != NULL). https://pagure.io/freeipa/blob/ipa-4-8/f/daemons/ipa-kdb/ipa_kdb_principals.c#_623 and https://pagure.io/freeipa/blob/ipa-4-8/f/daemons/ipa-kdb/ipa_kdb_principals.c#_624 do not modify ipactx so how could we end up in https://pagure.io/freeipa/blob/ipa-4-8/f/daemons/ipa-kdb/ipa_kdb_principals.c#_627 with ipactx->lcontext == NULL?

If I was interpreting everything correctly, I would say there is a compiler bug but I really think the bug is in my reasoning instead ;-)

On the other hand, if lcontext == NULL, we'd get to ret = ipadb_get_connection(ipactx); on line 614 and then lcontext would be NULL in line 631. This is not the case for the stacktrace but it would be a bug too.

One thing that might help here is to see the directory server's access logs for the connections from KDB driver around this crash. Say, if we've got down to ipadb_validate_radius(), we must have passed through ipadb_get_global_config() before that -- if that caused to change ipactx->lcontext, we'd see a search for ipaConfigString.

I fear we don't have these logs any moreā€¦

I don't think there are multiple threads at all, KDC is single threaded multi-process. @rharwood is that correct?

The KDC is multiprocess, not multithreaded, yes. But that doesn't mean that there's no concurrency: we use an event library (through libverto) for descheduling.

master:

  • 47d49aa ipa-kdb: do not use OpenLDAP functions with NULL LDAP context
  • 1d2a14a ipa-kdb: fix compiler warnings
  • f996c21 ipa-kdb: add missing prototypes
  • 3037de2 ipa-kdb: reformat ipa_kdb_certauth
  • 3fc75b1 ipa-kdb: mark test functions as static

ipa-4-9:

  • 2832810 ipa-kdb: do not use OpenLDAP functions with NULL LDAP context
  • 0da9de4 ipa-kdb: fix compiler warnings
  • c7ce801 ipa-kdb: add missing prototypes
  • f340baa ipa-kdb: reformat ipa_kdb_certauth
  • 2968609 ipa-kdb: mark test functions as static

Metadata Update from @frenaud:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

3 years ago

Metadata Update from @frenaud:
- Custom field affects_doc adjusted to on
- Custom field knownissue adjusted to on
- Issue status updated to: Open (was: Closed)

3 years ago

re-opening as we are still missing the backports to ipa-4-6 and ipa-4-8 branches

ipa-4-8:

  • 701d0fb ipa-kdb: fix compiler warnings
  • d454ca8 ipa-kdb: add missing prototypes
  • da98a6f ipa-kdb: reformat ipa_kdb_certauth
  • aa7f99c ipa-kdb: mark test functions as static
  • 79baa09 ipa-kdb: do not use OpenLDAP functions with NULL LDAP context

ipa-4-6:

  • 972a486 ipa-kdb: fix compiler warnings
  • a244b4f ipa-kdb: add missing prototypes
  • b37797f ipa-kdb: reformat ipa_kdb_certauth
  • ee5c04f ipa-kdb: mark test functions as static
  • 95822c7 ipa-kdb: do not use OpenLDAP functions with NULL LDAP context

Metadata Update from @frenaud:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata