#924 IPA provider segfaults is IPA user does not have a password and no CA cert is available
Closed: Fixed None Opened 7 years ago by sbose.

A freshly created user in IPA typically does not have a password. If you try to log in as this user with a dummy password the IPA provider segfaults.

The IPA provider first try to do a Kerberos authentication which fails, because there is no long term Kerberos password. In this case the IPA provider tries to check if the password migration mode is active on the server by opening a new LDAP connection to read the corresponding attribute. Since this connection is also used to authenticate the user if the migration mode is on, StartTLS is called for this connection, too. It looks that for some reasons ldap_install_tls() does not return immediately with an error code if the CA cert is not available but return LDAP_SUCCESS and let the IPA provider start the kinit operation. During the kinit some kind of unexpected message is received from the LDAP server ("[sdap_process_result] (4): ldap_result gave -1, something bad happend!") and the connection to the LDAP server is closed, but the ongoing SASL authentication does not know about it and tries to access an undefined LDAP handle.

There are a few isssue which needs to be solved in the follwoing way:

  • the migration flag should be read via the LDAP connection of the ID provider, similar as the access provider does it
  • if the migration mode is on a new connection to the LDAP server should be opened only the StartTLS and not with GSSAPI/SASL because GSSAPI/SASL will already bind this connection and another bind with the user creadentiuals might fails.
  • currently TLS is enabled by calling "dp_opt_set_bool(sdap_auth_ctx->opts->basic, SDAP_ID_TLS, true)" which will modify a global option. A better way should be found to enable TLS and disable SASL/GSSAPI for the auth connection.

This issue is also tracked by https://bugzilla.redhat.com/show_bug.cgi?id=721289 a backtrace of the core dump can be found there.

The following log shows what happens before the crash

(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [be_pam_handler] (4): Got request with the following data
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): command: PAM_AUTHENTICATE
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): domain: testrelm
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): user: sbosetest
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): service: sshd
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): tty: ssh
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): ruser: 
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): rhost: localhost
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): authtok type: 1
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): authtok size: 10
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): newauthtok type: 0
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): newauthtok size: 0
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): priv: 0
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [pam_print_data] (4): cli_pid: 26157
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [ldb] (9): tevent: Added timed event "ltdb_callback": 0x18045a0

(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [ldb] (9): tevent: Added timed event "ltdb_timeout": 0x17e1a60

(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [ldb] (9): tevent: Destroying timer event 0x17e1a60 "ltdb_timeout"

(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [ldb] (9): tevent: Ending timer event 0x18045a0 "ltdb_callback"
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [check_if_ccache_file_is_used] (5): User [742600011] is not active
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [check_for_valid_tgt] (1): krb5_cc_retrieve_cred failed.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [krb5_auth_send] (9): Ccache_file is [FILE:/tmp/krb5cc_742600010_grC9Ke] and is not active and TGT is not valid.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_server_status] (7): Status of server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_port_status] (7): Port status of port 0 for server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_server_status] (7): Status of server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [be_resolve_server_done] (4): Found address for server qe-blade-04.testrelm: [10.16.76.35] TTL 7200
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [krb5_find_ccache_step] (9): Recreating  ccache file.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [child_handler_setup] (8): Setting up signal handler up for pid [26160]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [child_handler_setup] (8): Signal handler set up for pid [26160]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [write_pipe_handler] (6): All data has been sent!
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [read_pipe_handler] (6): EOF received, client finished
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [krb5_child_done] (9): child response [17][1][25].
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_server_status] (7): Status of server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_port_status] (7): Port status of port 0 for server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_server_status] (7): Status of server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [child_sig_handler] (7): Waiting for child [26160].
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [child_sig_handler] (4): child [26160] finished successfully.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [be_resolve_server_done] (4): Found address for server qe-blade-04.testrelm: [10.16.76.35] TTL 7200
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sss_ldap_init_send] (9): Using file descriptor [28] for LDAP connection.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_ldap_connect_callback_add] (9): New LDAP connection to [ldap://qe-blade-04.testrelm:389/??base] with fd [28].
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_sys_connect_done] (4): Executing START TLS
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_process_result] (8): Trace: sh[0x17fdcf0], connected[1], ops[0x17f9240], ldap[0x17f8880]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_connect_done] (3): START TLS result: Success(0), Start TLS request accepted.Server willing to negotiate SSL.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_kinit_send] (6): Attempting kinit ((null), host/qe-blade-04.testrelm, TESTRELM, 86400)
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_kinit_next_kdc] (7): Resolving next KDC for service IPA
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_server_status] (7): Status of server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_port_status] (7): Port status of port 0 for server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [get_server_status] (7): Status of server 'qe-blade-04.testrelm' is 'working'
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [be_resolve_server_done] (4): Found address for server qe-blade-04.testrelm: [10.16.76.35] TTL 7200
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_kinit_kdc_resolved] (7): KDC resolved, attempting to get TGT...
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [create_tgt_req_send_buffer] (7): buffer size: 49
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [child_handler_setup] (8): Setting up signal handler up for pid [26161]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [child_handler_setup] (8): Signal handler set up for pid [26161]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [set_tgt_child_timeout] (6): Setting 6 seconds timeout for tgt child
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_process_result] (8): Trace: sh[0x17fdcf0], connected[1], ops[(nil)], ldap[0x17f8880]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_ldap_connect_callback_del] (9): Closing LDAP connection with fd [28].
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_process_result] (4): ldap_result gave -1, something bad happend!
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [sdap_handle_release] (8): Trace: sh[0x17fdcf0], connected[1], ops[(nil)], ldap[0x17f8880], destructor_lock[0], release_memory[0]
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [remove_connection_callback] (9): Successfully removed connection callback.
(Wed Jul 13 14:24:41 2011) [sssd[be[testrelm]]] [write_pipe_handler] (6): All data has been sent!
(Wed Jul 13 14:24:42 2011) [sssd[be[testrelm]]] [read_pipe_handler] (6): EOF received, client finished
(Wed Jul 13 14:24:42 2011) [sssd[be[testrelm]]] [sdap_get_tgt_recv] (6): Child responded: 0 [FILE:/var/lib/sss/db/ccache_TESTRELM], expired on [1310667881]
(Wed Jul 13 14:24:42 2011) [sssd[be[testrelm]]] [sasl_bind_send] (4): Executing sasl bind mech: GSSAPI, user: host/qe-blade-04.testrelm

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.5.12

Fields changed

owner: somebody => jhrozek
priority: major => critical

Fields changed

status: new => assigned

Neither me nor QE were able to reproduce the issue -- we tried latest RHEL, latest Fedora and exactly the same tree QE originally found the issue with.

However, we still think we should proceed with the changes Sumit proposed. Since we are no longer seeing a crash, we are rescheduling this ticket to 1.6

milestone: SSSD 1.5.12 => SSSD 1.6.0
rhbz: =>

Pushing out to 1.7. We plan to release 1.6 soon.

milestone: SSSD 1.6.0 => SSSD 1.7.0

Fields changed

patch: 0 => 1

Fixed in 6d99c0f

resolution: => fixed
status: assigned => closed

Metadata Update from @sbose:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.7.0

2 years ago

Login to comment on this ticket.

Metadata