#9041 4.9.6 upgrade failure on Oracle Linux 8.5 (BIND error: PKCS#11 initialization failed)
Opened 2 years ago by marin. Modified 2 years ago

Issue

We run two independent deployments of FreeIPA, each one comprising 2 masters with the same roles (DNS + PKI), for a total of 4 nodes. These servers are running Oracle Linux 8 and have been working flawlessly for 8 to 12 months. Regular updates were performed (IPA upgrade to 4.9.2 ran successfully on 2021-11-12).

This morning, all 4 nodes upgraded to Oracle Linux 8.5, which also implied to upgrade IPA server from 4.9.2 to 4.9.6. Unfortunately, the IPA upgrade script failed in the middle of the migration after BIND refused to come back to life, leaving both deployments in an intermediate state, and unable to perform name resolution anymore.

Since then, trying to start BIND immediately returns with the following terminating error:

initializing DST: PKCS#11 initialization failed

Zone signing was never enabled on any of those nodes. The named user is able to query the HSM and list PKCS#11 tokens:

bash-4.4$ export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf                                                                                                
bash-4.4$ export SOFTHSM2_PIN="$(cat /var/lib/ipa/dnssec/softhsm_pin)"
bash-4.4$ softhsm2-util --show-slots                                                                                                                        
Available slots:
Slot 337811534
    Slot info:
        Description:      SoftHSM slot ID 0x1422984e                                                                                                        
        Manufacturer ID:  SoftHSM project
        Hardware version: 2.6
        Firmware version: 2.6
        Token present:    yes
    Token info:
        Manufacturer ID:  SoftHSM project
        Model:            SoftHSM v2
        Hardware version: 2.6
        Firmware version: 2.6
        Serial number:    d4e1209e1422984e
        Initialized:      yes
        User PIN init.:   yes
        Label:            ipaDNSSEC
Slot 1
    Slot info:
        Description:      SoftHSM slot ID 0x1                                                                                                               
        Manufacturer ID:  SoftHSM project
        Hardware version: 2.6
        Firmware version: 2.6
        Token present:    yes
    Token info:
        Manufacturer ID:  SoftHSM project
        Model:            SoftHSM v2
        Hardware version: 2.6
        Firmware version: 2.6
        Serial number:
        Initialized:      no
        User PIN init.:   no
        Label:
bash-4.4$ pkcs11-list -p "${SOFTHSM2_PIN}" -s "337811534"                                                                                                   
slot 337811534
object[0]: handle 2 class 2 label[45] 'dnssec-replica:ipa-00.core.home.olivarim.com.' id[16] 0xad9083607bf95c06...                                          
object[1]: handle 3 class 3 label[45] 'dnssec-replica:ipa-00.core.home.olivarim.com.' id[16] 0xad9083607bf95c06... E:never

bash-4.4$ pkcs11-tokens 
DEFAULTS
        rand_token=0x55589110f800
        best_rsa_token=0x55589110f800
        best_dsa_token=0x55589110f800
        best_dh_token=0x55589110f800
        digest_token=0x55589110f800
        best_ec_token=0x55589110f800
        best_gost_token=(nil)
        aes_token=0x55589110f800

TOKEN
        address=0x55589110f800
        slotID=337811534
        label=ipaDNSSEC                       
        manufacturerID=SoftHSM project                 
        model=SoftHSM v2      
        serialNumber=d4e1209e1422984e
        supported operations=0x17e (RANDRSA,DSA,DH,DIGEST,EC,AES)

TOKEN
        address=0x55589110f9a0
        slotID=1
        label=                                
        manufacturerID=SoftHSM project                 
        model=SoftHSM v2      
        serialNumber=                
        supported operations=0x17e (RANDRSA,DSA,DH,DIGEST,EC,AES)

However, listing PKCS#11 keys by label does not work; is it expected ?

bash-4.4$ pkcs11-list -p "${SOFTHSM2_PIN}" -l "ipaDNSSEC"
label ipaDNSSEC
Unrecoverable error initializing PKCS#11: not found
Unrecoverable error initializing PKCS#11: not found

Steps to Reproduce

Upgrade IPA server to the latest version (4.9.6) on Oracle Linux 8.

Actual behavior

  • Running ipa-server-upgrade fails at step Add missing CA DNS records
  • The dirsrv@<INSTANCE>, httpd and pki-tomcatd services are running
  • The krb5kdc, kadmin, ipa-custodia and ipa-dnskeysyncd services are disabled (but work if started manually)
  • The named-pkcs11 service fails to start

Expected behavior

  • ipa-server-upgrade should succeed.
  • All services should be enabled and running

Version/Release/Distribution

$ cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="8.5"
<snip>

# uname -r
5.4.17-2136.300.7.el8uek.x86_64

$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
ipa-server-4.9.6-6.0.1.module+el8.5.0+20418+88e16a2c.x86_64
ipa-client-4.9.6-6.0.1.module+el8.5.0+20418+88e16a2c.x86_64
ipa-server-4.9.6-6.0.1.module+el8.5.0+20418+88e16a2c.x86_64
ipa-client-4.9.6-6.0.1.module+el8.5.0+20418+88e16a2c.x86_64
389-ds-base-1.4.3.23-10.module+el8.5.0+20376+fc99395b.x86_64
pki-ca-10.11.2-2.0.1.module+el8.5.0+20411+8eb38ccc.noarch
krb5-server-1.18.2-14.el8.x86_64

# rpm -q bind bind-dyndb-ldap bind-pkcs11
bind-9.11.26-6.el8.x86_64
bind-dyndb-ldap-11.6-2.module+el8.4.0+20088+3d202164.x86_64
bind-pkcs11-9.11.26-6.el8.x86_64

Additional info:

I can provide logs if they are needed. Here are the final lines of /var/log/ipaupgrade.log:

2021-11-17T05:28:31Z DEBUG stderr=
2021-11-17T05:28:31Z DEBUG Start of httpd.service complete
2021-11-17T05:28:31Z INFO [Removing self-signed CA]
2021-11-17T05:28:31Z DEBUG Self-signed CA is not installed
2021-11-17T05:28:31Z INFO [Removing Dogtag 9 CA]
2021-11-17T05:28:31Z DEBUG Dogtag is version 10 or above
2021-11-17T05:28:31Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:31Z DEBUG Loading Index file from '/var/lib/ipa/sysrestore/sysrestore.index'
2021-11-17T05:28:31Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:31Z INFO [Checking for deprecated KDC configuration files]
2021-11-17T05:28:31Z INFO [Checking for deprecated backups of Samba configuration files]
2021-11-17T05:28:31Z DEBUG raw: ca_is_enabled(version='2.242')
2021-11-17T05:28:31Z DEBUG ca_is_enabled(version='2.242')
2021-11-17T05:28:31Z DEBUG flushing ldapi://%2Frun%2Fslapd-HOME-OLIVARIM-COM.socket from SchemaCache
2021-11-17T05:28:31Z DEBUG retrieving schema for SchemaCache url=ldapi://%2Frun%2Fslapd-HOME-OLIVARIM-COM.socket conn=<ldap.ldapobject.SimpleLDAPObject object at 0x7fd7b32c3320>
2021-11-17T05:28:32Z DEBUG raw: kra_is_enabled(version='2.242')
2021-11-17T05:28:32Z DEBUG kra_is_enabled(version='2.242')
2021-11-17T05:28:32Z DEBUG Cleaning up after pkispawn for the CA subsystem
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:32Z DEBUG Loading Index file from '/var/lib/ipa/sysrestore/sysrestore.index'
2021-11-17T05:28:32Z DEBUG Removing /root/.dogtag/pki-tomcat/ca
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:32Z INFO dnssec-validation yes
2021-11-17T05:28:32Z INFO [Add missing CA DNS records]
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z INFO IPA CA DNS records already processed
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Loading StateFile from '/var/lib/ipa/sysupgrade/sysupgrade.state'
2021-11-17T05:28:32Z DEBUG Starting external process
2021-11-17T05:28:32Z DEBUG args=['/bin/systemctl', 'is-active', 'named-pkcs11.service']
2021-11-17T05:28:32Z DEBUG Process finished, return code=3
2021-11-17T05:28:32Z DEBUG stdout=failed

2021-11-17T05:28:32Z DEBUG stderr=
2021-11-17T05:28:32Z DEBUG Starting external process
2021-11-17T05:28:32Z DEBUG args=['/bin/systemctl', 'start', 'named-pkcs11.service']
2021-11-17T05:28:33Z DEBUG Process finished, return code=1
2021-11-17T05:28:33Z DEBUG stdout=
2021-11-17T05:28:33Z DEBUG stderr=Job for named-pkcs11.service failed because the control process exited with error code.
See "systemctl status named-pkcs11.service" and "journalctl -xe" for details.

2021-11-17T05:28:33Z ERROR IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
2021-11-17T05:28:33Z DEBUG   File "/usr/lib/python3.6/site-packages/ipapython/admintool.py", line 180, in execute
    return_value = self.run()
  File "/usr/lib/python3.6/site-packages/ipaserver/install/ipa_server_upgrade.py", line 54, in run
    server.upgrade()
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py", line 1961, in upgrade
    upgrade_configuration()
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py", line 1791, in upgrade_configuration
    upgrade_bind(fstore)
  File "/usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py", line 1453, in upgrade_bind
    bind.start()
  File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 524, in start
    self.service.start(instance_name, capture_output=capture_output, wait=wait)
  File "/usr/lib/python3.6/site-packages/ipaplatform/base/services.py", line 311, in start
    skip_output=not capture_output)
  File "/usr/lib/python3.6/site-packages/ipapython/ipautil.py", line 599, in run
    p.returncode, arg_string, output_log, error_log

2021-11-17T05:28:33Z DEBUG The ipa-server-upgrade command failed, exception: CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'named-pkcs11.service'] returned non-zero exit status 1: 'Job for named-pkcs11.service failed because the control process exited with error code.\nSee "systemctl status named-pkcs11.service" and "journalctl -xe" for details.\n')
2021-11-17T05:28:33Z ERROR Unexpected error - see /var/log/ipaupgrade.log for details:
CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'named-pkcs11.service'] returned non-zero exit status 1: 'Job for named-pkcs11.service failed because the control process exited with error code.\nSee "systemctl status named-pkcs11.service" and "journalctl -xe" for details.\n')
2021-11-17T05:28:33Z ERROR The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

Hi,

I was able to fix BIND startup by downgrading the bind and bind-* packages from version 9.11.26-6 to 9.11.26-4. Since OL package versions are derivated from RHEL, I suppose this regression may affect RHEL and CentOS. Should it be reported upstream ?

Here's the diff from the SRPM.

@pemensik could you please assess the issue in this ticket given the last comment above?

Thanks for notification.

There were very little changes in RHEL 8.5 to bind package. The only one remotely connected might be bind-9.11-rh1980757.patch, which switches from random() function to proper entropy when doing nsupdate -g. Unless there were drained all entropy, it should not have any difference. I would recommend reverting to just 9.11.26-5 if possible, it contains security fixes which should be unrelated. Not sure if that is possible. There were no change directly touching anything pkcs11 related.

I doubt it could be done by any of that changes when I rechecked what they did. @marin can you please retry upgrade to the 9.11.26-6 and verify just upgrade to this version is responsible and just downgrade of this package fixes it?

pkcs11-list is a bit limited tool. It expects slots start from 0, which is not true on RHEL. If you give it slot id manually, it should list contents correctly. I find p11tool from gnutls-utils much more pleasant to work with. These pkcs11 utilities would not be supported by RHEL9 package.

pkcs11-list -p "${SOFTHSM2_PIN}" -l "ipaDNSSEC" -s 337811534

I would recommend reverting to just 9.11.26-5 if possible

Unfortunately not, as version 9.11.26-5 never landed in the appstream repository.

@marin can you please retry upgrade to the 9.11.26-6 and verify just upgrade to this version is responsible and just downgrade of this package fixes it?

Sure. Here's a transcript of what I have done:

$ sudo dnf list installed | grep ^bind
bind.x86_64                            32:9.11.26-4.el8_4                            @ol8_appstream
bind-dyndb-ldap.x86_64                 11.6-2.module+el8.4.0+20088+3d202164          @ol8_appstream
bind-export-libs.x86_64                32:9.11.26-6.el8                              @ol8_baseos_latest
bind-libs.x86_64                       32:9.11.26-4.el8_4                            @ol8_appstream
bind-libs-lite.x86_64                  32:9.11.26-4.el8_4                            @ol8_appstream
bind-license.noarch                    32:9.11.26-4.el8_4                            @ol8_appstream
bind-pkcs11.x86_64                     32:9.11.26-4.el8_4                            @ol8_appstream
bind-pkcs11-libs.x86_64                32:9.11.26-4.el8_4                            @ol8_appstream
bind-pkcs11-utils.x86_64               32:9.11.26-4.el8_4                            @ol8_appstream
bind-utils.x86_64                      32:9.11.26-4.el8_4                            @ol8_appstream

$ sudo systemctl --no-pager status named-pkcs11 | head -n 3
● named-pkcs11.service - Berkeley Internet Name Domain (DNS) with native PKCS#11
   Loaded: loaded (/usr/lib/systemd/system/named-pkcs11.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-11-18 13:19:32 CET; 5min ago

$ sudo systemctl stop named-pkcs11

$ sudo dnf -yq update bind

$ sudo dnf list installed | grep ^bind
bind.x86_64                            32:9.11.26-6.el8                              @ol8_appstream
bind-dyndb-ldap.x86_64                 11.6-2.module+el8.4.0+20088+3d202164          @ol8_appstream
bind-export-libs.x86_64                32:9.11.26-6.el8                              @ol8_baseos_latest
bind-libs.x86_64                       32:9.11.26-6.el8                              @ol8_appstream
bind-libs-lite.x86_64                  32:9.11.26-6.el8                              @ol8_appstream
bind-license.noarch                    32:9.11.26-6.el8                              @ol8_appstream
bind-pkcs11.x86_64                     32:9.11.26-6.el8                              @ol8_appstream
bind-pkcs11-libs.x86_64                32:9.11.26-6.el8                              @ol8_appstream
bind-pkcs11-utils.x86_64               32:9.11.26-6.el8                              @ol8_appstream
bind-utils.x86_64                      32:9.11.26-6.el8                              @ol8_appstream

$ sudo systemctl start named-pkcs11
Job for named-pkcs11.service failed because the control process exited with error code.
See "systemctl status named-pkcs11.service" and "journalctl -xe" for details.

$ sudo journalctl --no-pager -eu named-pkcs11 | tail -n 27
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: starting BIND 9.11.26-RedHat-9.11.26-6.el8 (Extended Support Version) <id:3ff8620>
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: running on Linux x86_64 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: running as: named-pkcs11 -u named -c /etc/named.conf
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-3.0.1)
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: compiled with libxml2 version: 2.9.7
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: linked to libxml2 version: 20907
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: compiled with libjson-c version: 0.13.1
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: linked to libjson-c version: 0.13.1
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: compiled with zlib version: 1.2.11
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: linked to zlib version: 1.2.11
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: threads support is enabled
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: ----------------------------------------------------
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: BIND 9 is maintained by Internet Systems Consortium,
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: corporation.  Support and training for BIND 9 are
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: available at https://www.isc.org/support
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: ----------------------------------------------------
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: adjusted limit on open files from 262144 to 1048576
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: found 4 CPUs, using 4 worker threads
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: using 3 UDP listeners per interface
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: using up to 21000 sockets
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: initializing DST: PKCS#11 initialization failed
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr named-pkcs11[4683]: exiting (due to fatal error)
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr systemd[1]: named-pkcs11.service: Control process exited, code=exited status=1
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr systemd[1]: named-pkcs11.service: Failed with result 'exit-code'.
nov. 18 13:28:09 ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr systemd[1]: Failed to start Berkeley Internet Name Domain (DNS) with native PKCS#11.

$ sudo dnf -yq downgrade bind

Rétrogradé:
  bind-32:9.11.26-4.el8_4.x86_64              bind-libs-32:9.11.26-4.el8_4.x86_64          bind-libs-lite-32:9.11.26-4.el8_4.x86_64  bind-license-32:9.11.26-4.el8_4.noarch  bind-pkcs11-32:9.11.26-4.el8_4.x86_64
  bind-pkcs11-libs-32:9.11.26-4.el8_4.x86_64  bind-pkcs11-utils-32:9.11.26-4.el8_4.x86_64  bind-utils-32:9.11.26-4.el8_4.x86_64      python3-bind-32:9.11.26-4.el8_4.noarch

$ sudo systemctl start named-pkcs11

$ sudo systemctl --no-pager status named-pkcs11 | head -n 3
● named-pkcs11.service - Berkeley Internet Name Domain (DNS) with native PKCS#11
   Loaded: loaded (/usr/lib/systemd/system/named-pkcs11.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-11-18 13:30:51 CET; 28s ago

pkcs11-list is a bit limited tool. It expects slots start from 0, which is not true on RHEL. If you give it slot id manually, it should list contents correctly.

Yes, it does. AFAIK, slot numbers were made unpredictable in purpose by softHSM developers since version 2, and should not be relied upon. That's why I tried to select the key by label instead.

I find p11tool from gnutls-utils much more pleasant to work with.

Yes, with p11tool, I'm able to select objects by label:

# cat /etc/pkcs11/modules/softhsm2.module
# created by IPA installer
module: /usr/lib64/pkcs11/libsofthsm2.so
disable-in: p11-kit-proxy

# export GNUTLS_PIN="$(cat /var/lib/ipa/dnssec/softhsm_pin)"

# p11tool --list-tokens
Token 0:
        URL: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=System%20Trust
        Label: System Trust
        Type: Trust module
        Flags: uPIN uninitialized
        Manufacturer: PKCS#11 Kit
        Model: p11-kit-trust
        Serial: 1
        Module: p11-kit-trust.so


Token 1:
        URL: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=Default%20Trust
        Label: Default Trust
        Type: Trust module
        Flags: uPIN uninitialized
        Manufacturer: PKCS#11 Kit
        Model: p11-kit-trust
        Serial: 1
        Module: p11-kit-trust.so


Token 2:
        URL: pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=226b9cd63c0d4989;token=ipaDNSSEC
        Label: ipaDNSSEC
        Type: Generic token
        Flags: RNG, Requires login
        Manufacturer: SoftHSM project
        Model: SoftHSM v2
        Serial: 226b9cd63c0d4989
        Module: /usr/lib64/pkcs11/libsofthsm2.so

# p11tool --login --list-all "pkcs11:token=ipaDNSSEC"
Object 0:
        URL: pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=226b9cd63c0d4989;token=ipaDNSSEC;id=%B1%FA%B5%7B%62%3B%F8%46%C2%A8%3A%20%58%18%61%33;object=dnssec-replica%3Aros-ass-ipa-02.core.ass.ros.ipa.pep06.fr.;type=private
        Type: Private key (RSA-2048)
        Label: dnssec-replica:ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr.
        Flags: CKA_WRAP/UNWRAP; CKA_PRIVATE; CKA_NEVER_EXTRACTABLE; CKA_SENSITIVE;
        ID: b1:fa:b5:7b:62:3b:f8:46:c2:a8:3a:20:58:18:61:33

Object 1:
        URL: pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=226b9cd63c0d4989;token=ipaDNSSEC;id=%B1%FA%B5%7B%62%3B%F8%46%C2%A8%3A%20%58%18%61%33;object=dnssec-replica%3Aros-ass-ipa-02.core.ass.ros.ipa.pep06.fr.;type=public
        Type: Public key (RSA-2048)
        Label: dnssec-replica:ros-ass-ipa-02.core.ass.ros.ipa.pep06.fr.
        Flags: CKA_WRAP/UNWRAP; CKA_PRIVATE;
        ID: b1:fa:b5:7b:62:3b:f8:46:c2:a8:3a:20:58:18:61:33

After reading BIND PKCS#11 wiki page, I suppose the named-pkcs11 binary is compiled with native PKCS#11 support, and relies on the softHSM library to interact with the HSM.

Since BIND 9.11.26-6 is a minor upgrade, maybe the regression comes from another interacting component, like softHSM ? During the upgrade to OL 8.5, the softhsm package was upgraded from 2.4.0-4 to 2.6.0-3, then 2.6.0-5. Downgrading softhsm to version 2.6.0-3 does not help (same issue). Unfortunately, I cannot downgrade further to version 2.4.0-4, because it would also imply to downgrade IPA packages. Maybe something weird was introduced between 2.4 and 2.6 ?

Unfortunately debugging of pkcs11 plugin issues is somehow difficult in named. Can you please check data token are writeable to file storage?

Can you please check all files of token remain writeable by named? Some obscure errors may happen if some files are readable but not writeable by the named process. It should be writeable by named user or group. Files on my installation are writeable by named group, but I am not sure if I changed that during some testing on my installation.

ls -lRZ /var/lib/ipa/dnssec/tokens

Yes, named-pkcs11 uses native pkcs11 plugins only for storing key data. It saves all used dns keys inside the storage and softhsm is used as default plugin. Private keys then do not contain private data themselves, but just reference to pkcs11 storage. It is required for opendnssec compatibility. Problem is with debugging when something fails. softhsm plugin just reports failure, but no verbose description what went wrong. Only generic pkcs11 interface is used to talk with the plugin, it does not have to be softhsm always. named in turn logs just generic failure message, not really helping to find what exactly is wrong. I can only see there is an error, but not a cause from journal because of that.

I would try sudo -u named /bin/bash, then try pkcs11 commands again, whether it might change. Easiest way to break pkcs11 is to do any change as root, which named has no way to recover from. If @abbra knows any good way to get more detailed error message from the plugin, I would like to know. I usually ended stepping it via debugger, because messages logged were less than useful.

softhsm version change would be more likely to create some changes, which might break something. But then just downgrade of just bind-pkcs11 package should not fix anything.

Consider using strace to get more details, what is going on systemctl edit named:

[Service]
ExecStart=
ExecStart=strace -o /var/named/data/named.strace /usr/sbin/named -u named -g $OPTIONS

Then with some luck, it might reveal reason for failure of failure during start. I am afraid common debug levels in named would not work. Certainly sudo -u named pkcs11-tokens -v may reveal some problem, but I doubt it. It is the only thing that prints more details during pkcs11 initialization.

I don't know of a better debugging methods. @pemensik is right that file permissions would be the most probably candidate for issues, I've seen that myself.

Going from 2.4 to 2.6 should not cause database issues for softhsm.

The core database changes in OpenDNSSEC were with 1.x to 2.x migration. We migrated to 2.1 branch in RHEL 8.3, so there should be no difference now between 8.4 and 8.5.

Thank you to both of you for your answers!

Can you please check all files of token remain writeable by named? Some obscure errors may happen if some files are readable but not writeable by the named process. It should be writeable by named user or group.

$ sudo ls -lRZ /var/lib/ipa/dnssec/tokens
/var/lib/ipa/dnssec/tokens:
total 0
drwxrws---. 2 ods named unconfined_u:object_r:ipa_var_lib_t:s0 262 26 déc.   2020 56c4804b-7e58-1118-d4e1-209e1422984e

/var/lib/ipa/dnssec/tokens/56c4804b-7e58-1118-d4e1-209e1422984e:
total 16
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0    0 26 déc.   2020 26f8cafe-be8e-31bd-a041-1131c091a96d.lock
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0  980 26 déc.   2020 26f8cafe-be8e-31bd-a041-1131c091a96d.object
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0    0 26 déc.   2020 91a25802-cd0d-e796-fb76-f79be1bf60f3.lock
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0 2273 26 déc.   2020 91a25802-cd0d-e796-fb76-f79be1bf60f3.object
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0    8 18 nov.  18:12 generation
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0    0 18 nov.  18:12 token.lock
-rwxrwx---. 1 ods named unconfined_u:object_r:ipa_var_lib_t:s0  320 18 nov.  18:12 token.object

Honestly, I doubt the issue has to do with file permissions. First because it happened on 4 distinct nodes, which were not deployed at the same time. All nodes were pristine OL8 instances, dedicated to IPA, and we followed official deployment guidelines as closely as possible: I'm 100% sure no file permissions were ever modified. Of course, I could have tweaked permissions by accident on one node, but not on four of them. But even if permissions were involved, I don't understand how they could prevent BIND 9.11.26-6 to start, but not 9.11.26-4.

I would try sudo -u named /bin/bash, then try pkcs11 commands again, whether it might change.

Yes, I had the same idea. I checked this before opening this issue (the transcript is in the first post, with two pkcs11-* commands run as user named).

Certainly sudo -u named pkcs11-tokens -v may reveal some problem, but I doubt it.

Indeed, nothing special here:

$ sudo -u named bash
Mot de passe :
bash-4.4$ export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf
bash-4.4$ export SOFTHSM2_PIN="$(cat /var/lib/ipa/dnssec/softhsm_pin)"
bash-4.4$ pkcs11-tokens -v
slotCount=2
slot#0=0x1422984e
CKM_GOSTR3411: 0x70
CKM_GOSTR3410_KEY_PAIR_GEN: 0x70
CKM_GOSTR3410_WITH_GOSTR3411: 0x70
CKM_EDDSA_KEY_PAIR_GEN: 0x70
slot#1=0x1
CKM_GOSTR3411: 0x70
CKM_GOSTR3410_KEY_PAIR_GEN: 0x70
CKM_GOSTR3410_WITH_GOSTR3411: 0x70
CKM_EDDSA_KEY_PAIR_GEN: 0x70
DEFAULTS
        rand_token=0x559309806800
        best_rsa_token=0x559309806800
        best_dsa_token=0x559309806800
        best_dh_token=0x559309806800
        digest_token=0x559309806800
        best_ec_token=0x559309806800
        best_gost_token=(nil)
        aes_token=0x559309806800

TOKEN
        address=0x559309806800
        slotID=337811534
        label=ipaDNSSEC
        manufacturerID=SoftHSM project
        model=SoftHSM v2
        serialNumber=d4e1209e1422984e
        supported operations=0x17e (RANDRSA,DSA,DH,DIGEST,EC,AES)

TOKEN
        address=0x5593098069a0
        slotID=1
        label=
        manufacturerID=SoftHSM project
        model=SoftHSM v2
        serialNumber=
        supported operations=0x17e (RANDRSA,DSA,DH,DIGEST,EC,AES)

Consider using strace to get more details, what is going on.

I first ran strace without the -f flag. I did not find anything interesting in the output, as the crash seems to happen in the context of a child process whose syscalls were not captured by strace:

named.strace

I then tried to run strace again with the -f flag to monitor child processes and... BIND started normally. I don't really understand what happens. Could it be a race condition occurring in a child process, which would be avoided when strace is running, because its monitoring would be slowing down process execution ? The output of strace -f follows:

named.children.strace

Many thanks, again, for your help.

Hi again,

There are 2 things which I do not understand. I'm very novice with strace, so please forgive my naive thinking.

1. Missing write syscalls

There is something that I don't understand. Should it fail or not, whenever BIND starts up, it logs the following lines to the journal (timestamps removed):

starting BIND 9.11.26-RedHat-9.11.26-6.el8 (Extended Support Version) <id:3ff8620>
running on Linux x86_64 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021
built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
running as: named-pkcs11 -u named
compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-3.0.1)
compiled with libxml2 version: 2.9.7
linked to libxml2 version: 20907
compiled with libjson-c version: 0.13.1
linked to libjson-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
----------------------------------------------------
BIND 9 is maintained by Internet Systems Consortium,
Inc. (ISC), a non-profit 501(c)(3) public-benefit
corporation.  Support and training for BIND 9 are
available at https://www.isc.org/support
----------------------------------------------------
adjusted limit on open files from 262144 to 1048576
found 2 CPUs, using 2 worker threads
using 1 UDP listener per interface
using up to 21000 sockets

The output of strace -f (which makes BIND start normally) shows the write syscalls corresponding to those lines:

write(2, "18-Nov-2021 22:26:24.703 running"..., 116) = 116
write(2, "18-Nov-2021 22:26:24.703 built w"..., 1746) = 1746
write(2, "18-Nov-2021 22:26:24.703 running"..., 74) = 74
write(2, "18-Nov-2021 22:26:24.703 compile"..., 78) = 78
write(2, "18-Nov-2021 22:26:24.703 compile"..., 89) = 89
write(2, "18-Nov-2021 22:26:24.703 linked "..., 85) = 85
write(2, "18-Nov-2021 22:26:24.703 compile"..., 62) = 62
futex(0x7fc1b769fe88, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "18-Nov-2021 22:26:24.703 linked "..., 58) = 58
write(2, "18-Nov-2021 22:26:24.704 compile"..., 65) = 65
write(2, "18-Nov-2021 22:26:24.704 linked "..., 61) = 61
write(2, "18-Nov-2021 22:26:24.704 compile"..., 60) = 60
write(2, "18-Nov-2021 22:26:24.704 linked "..., 56) = 56
write(2, "18-Nov-2021 22:26:24.704 threads"..., 52) = 52
write(2, "18-Nov-2021 22:26:24.704 -------"..., 78) = 78
write(2, "18-Nov-2021 22:26:24.704 BIND 9 "..., 78) = 78
write(2, "18-Nov-2021 22:26:24.704 Inc. (I"..., 76) = 76
write(2, "18-Nov-2021 22:26:24.704 corpora"..., 76) = 76
write(2, "18-Nov-2021 22:26:24.704 availab"..., 66) = 66
write(2, "18-Nov-2021 22:26:24.705 -------"..., 78) = 78
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
prlimit64(0, RLIMIT_DATA, NULL, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
prlimit64(0, RLIMIT_CORE, NULL, {rlim_cur=0, rlim_max=RLIM64_INFINITY}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Opération non permise)
prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
write(2, "18-Nov-2021 22:26:24.705 adjuste"..., 77) = 77
write(2, "18-Nov-2021 22:26:24.705 found 2"..., 62) = 62
write(2, "18-Nov-2021 22:26:24.705 using 1"..., 60) = 60

They appear before the main process is cloned/forked (the first clone syscall only appears 3 lines after).

As a confirmation, running strace -ff produces 6 per-process output files named in ascending PID order:

named.strace.5872  named.strace.5873  named.strace.5874  named.strace.5875  named.strace.5876  named.strace.5879

Since the write syscalls were logged to the first file (named.strace.5872), I suppose it confirms they were run in the context of the main/parent process.

Now, when invoking strace without the -f flag (which makes BIND crash as expected), the same informative lines are logged to the system journal, plus 2 more error lines. Yet, there is not a single write syscall in strace output. According to strace, the process is terminated before those lines are ever printed - and yet they are.

2. read on closed FD

In the named.strace transcript, the crash happens immediately after the process tried to read file descriptor 7, which was _close_d a few lines up:

openat(AT_FDCWD, "/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 7
read(7, "0-1\n", 8192)                  = 4
close(7)                                = 0
prctl(PR_SET_KEEPCAPS, 1)               = 0
getuid()                                = 0
setgid(25)                              = 0
setuid(25)                              = 0
prctl(PR_SET_DUMPABLE, SUID_DUMP_USER)  = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_CHOWN|1<<CAP_DAC_READ_SEARCH|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_NET_BIND_SERVICE|1<<CAP_SYS_CHROOT|1<<CAP_SYS_RESOURCE, inheritable=0}) = 0
getuid()                                = 25
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_BIND_SERVICE|1<<CAP_SYS_RESOURCE, permitted=1<<CAP_NET_BIND_SERVICE|1<<CAP_SYS_RESOURCE, inheritable=0}) = 0
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3aa45c000
pipe([7, 8])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc3aa49df50) = 4162
close(8)                                = 0
read(7, "", 1)                          = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4162, si_uid=25, si_status=1, si_utime=0, si_stime=0} ---
exit_group(1)                           = ?
+++ exited with 1 +++

I suppose this is the cause of the crash, but do not understand why it does not happen when strace is run in multi-process mode.

Could you advise on what to do next ? Should I report this to Oracle or the BIND developers ?

Thanks!

It night help adding -f or better -g to named options. It would not fork and daemonize, which would write all logs to stderr. journalctl would contain them anyway, even when not written to syslog socket.

It might be easier to trace it that way.

it called pipe([7, 8]) before. I think that means new pipe were opened with handles 7 and 8. It is no longer plain file, it was closed. Reporting to Oracle is a good idea, since you are using their distribution. Because they seem to use my commits from RHEL, I doubt they would be able to fix it.

Upstream would fix issues in BIND 9.11 only if they were critical. I am not sure this would be considered that way. But by reporting to them, the worst can happen they would close it.

But It might be just RHEL specific issue. When it passes under strace, that is quite strange. AFAIK it should be done from main thread when other threads do not yet work on queries.

There is a neat utility pkcs11-spy.so, which could help us here. If it is pkcs11 related, details might help. Can you please follow this one:

# do as root, sudo -i for example
dnf install -y opensc
# emulate running under systemd
export KRB5_KTNAME=/etc/named.keytab NAMEDCONF=/etc/named.conf
export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf
cd /var/named
PKCS11_PROVIDER=/usr/lib64/pkcs11/pkcs11-spy.so \
PKCS11SPY=/usr/lib64/pkcs11/libsofthsm2.so \
/usr/sbin/named-pkcs11 -u named -g 2>&1 | tee /tmp/named-pkcs11.log

Would this report something interesting related to pkcs11 failure? My testing instance is broken right now, but passes this part just fine.

Hi Petr,

Apparently, BIND has an issue with finding the PKCS#11 provider library. I did as you suggested, and I got zero debug output from pkcs11-spy. BIND just fails as usual, which might mean that it is in fact unable to load the provider in the first place.

$ sudo -u named bash
bash-4.4$ ls -laZ /usr/lib64/pkcs11/pkcs11-spy.so /usr/lib64/pkcs11/libsofthsm2.so /usr/lib64/pkcs11-spy.so 
-rwxr-xr-x. 1 root root system_u:object_r:lib_t:s0 934152 19 mai    2021 /usr/lib64/pkcs11/libsofthsm2.so
lrwxrwxrwx. 1 root root system_u:object_r:lib_t:s0     16  7 avril  2021 /usr/lib64/pkcs11/pkcs11-spy.so -> ../pkcs11-spy.so
-rwxr-xr-x. 1 root root system_u:object_r:lib_t:s0 106976  7 avril  2021 /usr/lib64/pkcs11-spy.so
bash-4.4$ export KRB5_KTNAME=/etc/named.keytab NAMEDCONF=/etc/named.conf
bash-4.4$ export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf
bash-4.4$ cd /var/named
bash-4.4$ PKCS11_PROVIDER=/usr/lib64/pkcs11/pkcs11-spy.so \
> PKCS11SPY=/usr/lib64/pkcs11/libsofthsm2.so \
> /usr/sbin/named-pkcs11 -u named -g 2>&1 | tee /tmp/named-pkcs11.log
20-Nov-2021 17:46:12.745 starting BIND 9.11.26-RedHat-9.11.26-6.el8 (Extended Support Version) <id:3ff8620>
20-Nov-2021 17:46:12.745 running on Linux x86_64 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021
20-Nov-2021 17:46:12.745 built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
20-Nov-2021 17:46:12.745 running as: named-pkcs11 -u named -g
20-Nov-2021 17:46:12.745 compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-3.0.1)
20-Nov-2021 17:46:12.745 compiled with libxml2 version: 2.9.7
20-Nov-2021 17:46:12.745 linked to libxml2 version: 20907
20-Nov-2021 17:46:12.745 compiled with libjson-c version: 0.13.1
20-Nov-2021 17:46:12.745 linked to libjson-c version: 0.13.1
20-Nov-2021 17:46:12.745 compiled with zlib version: 1.2.11
20-Nov-2021 17:46:12.745 linked to zlib version: 1.2.11
20-Nov-2021 17:46:12.745 threads support is enabled
20-Nov-2021 17:46:12.745 ----------------------------------------------------
20-Nov-2021 17:46:12.745 BIND 9 is maintained by Internet Systems Consortium,
20-Nov-2021 17:46:12.745 Inc. (ISC), a non-profit 501(c)(3) public-benefit 
20-Nov-2021 17:46:12.745 corporation.  Support and training for BIND 9 are 
20-Nov-2021 17:46:12.745 available at https://www.isc.org/support
20-Nov-2021 17:46:12.745 ----------------------------------------------------
20-Nov-2021 17:46:12.745 found 2 CPUs, using 2 worker threads
20-Nov-2021 17:46:12.745 using 1 UDP listener per interface
20-Nov-2021 17:46:12.746 using up to 21000 sockets
20-Nov-2021 17:46:12.746 initializing DST: PKCS#11 initialization failed
20-Nov-2021 17:46:12.746 exiting (due to fatal error)

While reluctant to it, I also ran BIND as root, but got exactly the same result.

Consider using export for those variables too. I used them on single line without backslashes. It seems it ignores PKCS11_PROVIDER variable at all. Parameter -E can be used instead it, but PKCS11_PROVIDER environment variable should have higher priority. It has to work.

Running as root with -u named is okay. It starts as root, but changes to named user while keeping just limited set of rights. For example for listening on port 53, which would not be possible when running under named user without this capability added.

$ export KRB5_KTNAME=/etc/named.keytab NAMEDCONF=/etc/named.conf
$ export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf
$ export PKCS11SPY=/usr/lib64/pkcs11/libsofthsm2.so
$ /usr/sbin/named-pkcs11 -u named -g -E /usr/lib64/pkcs11-spy.so 2>&1 | tee /tmp/named-pkcs11.log

If there is something wrong with dlopen interface for loading plugins runtime, I am not sure which part might be responsible. named-pkcs11 cannot work without it. Is it possible some selinux policy or rule blocks it on Oracle Linux? I would recommend asking Oracle for help in this case.

I could recommend only installing gdb and breaking on pk11_initialize function. set args -u named -g, then run. dnf debuginfo-install bind bind-pkcs11 bind-pkcs11-libs bind-libs bind-libs-lite should help you to have debug symbols installed. There seems to be something quite wrong. But because you do not use even CentOS, I am not sure how to fix something similar or whether debug packages are even provided.

Have you tried setting setenforce 0 before testing it? That might even answer why it works under strace, but did not start directly. It has different selinux contexts assigned. But when started outside systemd it should not matter.

Have you tried setting setenforce 0 before testing it?

Yes, I've tried that before opening this issue and It did not help.

Consider using export for those variables too.

I've tried too; still no change, even with SELinux in permissive mode:

$ sudo -u root bash
# cd /var/named/data/
# setenforce 0
# export KRB5_KTNAME=/etc/named.keytab NAMEDCONF=/etc/named.conf                                                                          
# export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf                                                                                      
# export PKCS11SPY=/usr/lib64/pkcs11/libsofthsm2.so                                                                                       
# export PKCS11_PROVIDER=/usr/lib64/pkcs11-spy.so
# /usr/sbin/named-pkcs11 -u named -g 2>&1 | tee /tmp/named-pkcs11.log                                                                     
20-Nov-2021 18:27:07.566 starting BIND 9.11.26-RedHat-9.11.26-6.el8 (Extended Support Version) <id:3ff8620>                                                 
20-Nov-2021 18:27:07.566 running on Linux x86_64 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021                                         
20-Nov-2021 18:27:07.566 built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
20-Nov-2021 18:27:07.566 running as: named-pkcs11 -u named -g
20-Nov-2021 18:27:07.566 compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-3.0.1)
20-Nov-2021 18:27:07.566 compiled with libxml2 version: 2.9.7
20-Nov-2021 18:27:07.566 linked to libxml2 version: 20907
20-Nov-2021 18:27:07.566 compiled with libjson-c version: 0.13.1
20-Nov-2021 18:27:07.566 linked to libjson-c version: 0.13.1
20-Nov-2021 18:27:07.566 compiled with zlib version: 1.2.11
20-Nov-2021 18:27:07.567 linked to zlib version: 1.2.11
20-Nov-2021 18:27:07.567 threads support is enabled
20-Nov-2021 18:27:07.567 ----------------------------------------------------
20-Nov-2021 18:27:07.567 BIND 9 is maintained by Internet Systems Consortium,
20-Nov-2021 18:27:07.567 Inc. (ISC), a non-profit 501(c)(3) public-benefit
20-Nov-2021 18:27:07.567 corporation.  Support and training for BIND 9 are
20-Nov-2021 18:27:07.567 available at https://www.isc.org/support
20-Nov-2021 18:27:07.567 ----------------------------------------------------
20-Nov-2021 18:27:07.567 adjusted limit on open files from 262144 to 1048576
20-Nov-2021 18:27:07.567 found 2 CPUs, using 2 worker threads
20-Nov-2021 18:27:07.567 using 1 UDP listener per interface
20-Nov-2021 18:27:07.567 using up to 21000 sockets
20-Nov-2021 18:27:07.567 initializing DST: PKCS#11 initialization failed
20-Nov-2021 18:27:07.567 exiting (due to fatal error)

It seems it ignores PKCS11_PROVIDER variable at all. Parameter -E can be used instead it, but PKCS11_PROVIDER environment variable should have higher priority. It has to work.

I tested it with both BIND versions, and the PKCS11_PROVIDER environment variable seems to be ignored by both:

  • BIND 9.11.26-4 starts correctly without PKCS11_PROVIDER nor -E
  • BIND 9.11.26-6 only starts when -E is specified

Here is BIND 9.11.26-6 starting correctly with SELinux in enforcing mode and an explicit path:

$ sudo -u root bash
# cd /var/named/data/
# setenforce 1
# export KRB5_KTNAME=/etc/named.keytab NAMEDCONF=/etc/named.conf                                                                          
# export SOFTHSM2_CONF=/etc/ipa/dnssec/softhsm2.conf                                                                                      
# /usr/sbin/named-pkcs11 -u named -g -E /usr/lib64/pkcs11/libsofthsm2.so 2>&1 | tee /tmp/named-pkcs11.log    

I tested the same commands with BIND 9.11.26-4, and it just starts fine with PKCS11_PROVIDER not set, and -E unspecified. Since I see no mention of PKCS11_PROVIDER and -E in the systemd unit file, and BIND -4 is able to start without them, I suppose the named-pkcs11 binary includes a default, literal path to the library, and is expected to rely on it.

The patches included in 9.11.26-6 seem unrelated to this issue. Yes something must have changed somewhere!

According to the named(8) man page:

-E engine-name

When applicable, specifies the hardware to use for cryptographic operations, such as a secure key store used for signing. When BIND is built with OpenSSL PKCS#11 support, this defaults to the string "pkcs11", which identifies an OpenSSL engine that can drive a cryptographic accelerator or hardware service module. When BIND is built with native PKCS#11 cryptography (--enable-native-pkcs11), it defaults to the path of the PKCS#11 provider library specified via "--with-pkcs11".

BIND's build flags (printed at startup) are exactly the same for both versions and do specify the correct path to the library:

'--build=x86_64-redhat-linux-gnu'
'--host=x86_64-redhat-linux-gnu'
'--program-prefix='
'--disable-dependency-tracking'
'--prefix=/usr'
'--exec-prefix=/usr'
'--bindir=/usr/bin'
'--sbindir=/usr/sbin'
'--sysconfdir=/etc'
'--datadir=/usr/share'
'--includedir=/usr/include'
'--libdir=/usr/lib64'
'--libexecdir=/usr/libexec'
'--sharedstatedir=/var/lib'
'--mandir=/usr/share/man'
'--infodir=/usr/share/info'
'--with-python=/usr/libexec/platform-python'
'--with-libtool'
'--localstatedir=/var'
'--enable-threads'
'--enable-ipv6'
'--enable-filter-aaaa'
'--with-pic'
'--disable-static'
'--includedir=/usr/include/bind9'
'--with-tuning=large'
'--with-libidn2'
'--enable-openssl-hash'
'--with-geoip2'
-----------------------------------------------------
'--enable-native-pkcs11'
'--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so'
-----------------------------------------------------
'--with-dlopen=yes'
'--with-dlz-ldap=yes'
'--with-dlz-postgres=yes'
'--with-dlz-mysql=yes'
'--with-dlz-filesystem=yes'
'--with-dlz-bdb=yes'
'--with-gssapi=yes'
'--disable-isc-spnego'
'--with-lmdb=no'
'--with-libjson'
'--enable-dnstap'
'--with-cmocka'
'--enable-fixed-rrset'
'--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets'
'--enable-full-report'
'build_alias=x86_64-redhat-linux-gnu'
'host_alias=x86_64-redhat-linux-gnu'
'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld'
'CPPFLAGS= -DDIG_SIGCHASE'
'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'

Also, specifying only -E libsofthsm2.so (without the full path) will also make BIND start correctly.

For a moment I thought the Oracle UEK (based on 5.4.14) could be a potential culprit. I can now confirm that it is not, since the issue persists after booting the standard RHEL kernel (4.18.0-348).

I ran named-pkcs11 with gdb, as you suggested. I can confirm that BIND 9.11.26-6 uses pkcs11 as the default PKCS#11 engine even when the -E argument is not set. You can see this by breaking on the dst_lib_init2() function and examining its third argument (const char *engine):

(gdb) run                                                                                                                                                   
Starting program: /usr/sbin/named-pkcs11 -u named -g
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
21-Nov-2021 14:41:18.178 starting BIND 9.11.26-RedHat-9.11.26-6.el8 (Extended Support Version) <id:3ff8620>                                                 
21-Nov-2021 14:41:18.178 running on Linux x86_64 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021                                         
21-Nov-2021 14:41:18.178 built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
21-Nov-2021 14:41:18.178 running as: named-pkcs11 -u named -g
21-Nov-2021 14:41:18.178 compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-3.0.1)
21-Nov-2021 14:41:18.178 compiled with libxml2 version: 2.9.7
21-Nov-2021 14:41:18.178 linked to libxml2 version: 20907
21-Nov-2021 14:41:18.178 compiled with libjson-c version: 0.13.1
21-Nov-2021 14:41:18.178 linked to libjson-c version: 0.13.1
21-Nov-2021 14:41:18.178 compiled with zlib version: 1.2.11
21-Nov-2021 14:41:18.178 linked to zlib version: 1.2.11
21-Nov-2021 14:41:18.178 threads support is enabled
21-Nov-2021 14:41:18.178 ----------------------------------------------------
21-Nov-2021 14:41:18.178 BIND 9 is maintained by Internet Systems Consortium,
21-Nov-2021 14:41:18.178 Inc. (ISC), a non-profit 501(c)(3) public-benefit
21-Nov-2021 14:41:18.178 corporation.  Support and training for BIND 9 are
21-Nov-2021 14:41:18.178 available at https://www.isc.org/support
21-Nov-2021 14:41:18.178 ----------------------------------------------------
21-Nov-2021 14:41:18.178 adjusted limit on open files from 262144 to 1048576
21-Nov-2021 14:41:18.178 found 2 CPUs, using 2 worker threads
21-Nov-2021 14:41:18.178 using 1 UDP listener per interface
[New Thread 0x7ffff0bb7700 (LWP 3976)]
[New Thread 0x7ffff03b6700 (LWP 3977)]
[New Thread 0x7fffefbb5700 (LWP 3978)]
[New Thread 0x7fffef3b4700 (LWP 3979)]
21-Nov-2021 14:41:18.180 using up to 21000 sockets

Thread 1 "named-pkcs11" hit Breakpoint 2, dst_lib_init2 (mctx=0x5555558297d0, ectx=0x7ffff7fa8010, engine=0x5555555f9492 "pkcs11", eflags=eflags@entry=1)   
    at ../../../lib/dns-pkcs11/dst_api.c:155
155     ../../../lib/dns-pkcs11/dst_api.c: Aucun fichier ou dossier de ce type.            

(gdb) backtrace
#0  dst_lib_init2 (mctx=0x5555558297d0, ectx=0x7ffff7fa8010, engine=0x5555555f9492 "pkcs11", eflags=eflags@entry=1) at ../../../lib/dns-pkcs11/dst_api.c:155
#1  0x00005555555b1f54 in ns_server_create (mctx=0x5555558297d0, serverp=0x555555823238 <ns_g_server>) at ../../../bin/named-pkcs11/server.c:9157           
#2  0x0000555555578b8e in setup () at ../../../bin/named-pkcs11/main.c:1337
#3  main (argc=4, argv=0x7fffffffe4c8) at ../../../bin/named-pkcs11/main.c:1556             

(gdb) continue
Continuing.

Thread 1 "named-pkcs11" hit Breakpoint 1, pk11_initialize (mctx=mctx@entry=0x5555558297d0, engine=engine@entry=0x5555555f9492 "pkcs11")
    at ../../../lib/isc-pkcs11/pk11.c:187
187     ../../../lib/isc-pkcs11/pk11.c: Aucun fichier ou dossier de ce type.

(gdb) backtrace
#0  pk11_initialize (mctx=mctx@entry=0x5555558297d0, engine=engine@entry=0x5555555f9492 "pkcs11") at ../../../lib/isc-pkcs11/pk11.c:187
#1  0x00007ffff792f5b2 in dst_lib_init2 (mctx=0x5555558297d0, ectx=0x7ffff7fa8010, engine=0x5555555f9492 "pkcs11", eflags=eflags@entry=1)
    at ../../../lib/dns-pkcs11/dst_api.c:200
#2  0x00005555555b1f54 in ns_server_create (mctx=0x5555558297d0, serverp=0x555555823238 <ns_g_server>) at ../../../bin/named-pkcs11/server.c:9157
#3  0x0000555555578b8e in setup () at ../../../bin/named-pkcs11/main.c:1337
#4  main (argc=4, argv=0x7fffffffe4c8) at ../../../bin/named-pkcs11/main.c:1556

(gdb) continue
Continuing.
21-Nov-2021 14:53:15.346 initializing DST: PKCS#11 initialization failed
21-Nov-2021 14:53:15.346 exiting (due to fatal error)
[Thread 0x7fffefbb5700 (LWP 3978) exited]
[Thread 0x7ffff03b6700 (LWP 3977) exited]
[Thread 0x7ffff0bb7700 (LWP 3976) exited]
[Thread 0x7ffff7fe1c80 (LWP 3975) exited]
[Inferior 1 (process 3975) exited with code 01]

Yet, with BIND 9.11.26-4, the very same argument defaults to null:

(gdb) run
Starting program: /usr/sbin/named-pkcs11 -u named -g
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-164.0.1.el8.x86_64                                                                       
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
21-Nov-2021 15:32:09.928 starting BIND 9.11.26-RedHat-9.11.26-4.el8_4 (Extended Support Version) <id:3ff8620>                                               
21-Nov-2021 15:32:09.928 running on Linux x86_64 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021                                         
21-Nov-2021 15:32:09.928 built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
21-Nov-2021 15:32:09.928 running as: named-pkcs11 -u named -g
21-Nov-2021 15:32:09.928 compiled by GCC 8.4.1 20200928 (Red Hat 8.4.1-1.0.1)
21-Nov-2021 15:32:09.928 compiled with libxml2 version: 2.9.7
21-Nov-2021 15:32:09.928 linked to libxml2 version: 20907
21-Nov-2021 15:32:09.928 compiled with libjson-c version: 0.13.1
21-Nov-2021 15:32:09.928 linked to libjson-c version: 0.13.1
21-Nov-2021 15:32:09.928 compiled with zlib version: 1.2.11
21-Nov-2021 15:32:09.928 linked to zlib version: 1.2.11
21-Nov-2021 15:32:09.928 threads support is enabled
21-Nov-2021 15:32:09.928 ----------------------------------------------------
21-Nov-2021 15:32:09.928 BIND 9 is maintained by Internet Systems Consortium,
21-Nov-2021 15:32:09.928 Inc. (ISC), a non-profit 501(c)(3) public-benefit
21-Nov-2021 15:32:09.928 corporation.  Support and training for BIND 9 are
21-Nov-2021 15:32:09.928 available at https://www.isc.org/support
21-Nov-2021 15:32:09.928 ----------------------------------------------------
21-Nov-2021 15:32:09.928 adjusted limit on open files from 262144 to 1048576
21-Nov-2021 15:32:09.928 found 2 CPUs, using 2 worker threads
21-Nov-2021 15:32:09.928 using 1 UDP listener per interface
[New Thread 0x7ffff0bb7700 (LWP 4871)]
[New Thread 0x7ffff03b6700 (LWP 4872)]
[New Thread 0x7fffefbb5700 (LWP 4873)]
[New Thread 0x7fffef3b4700 (LWP 4874)]
21-Nov-2021 15:32:09.930 using up to 21000 sockets

Thread 1 "named-pkcs11" hit Breakpoint 1, dst_lib_init2 (mctx=0x5555558297d0, ectx=0x7ffff7fa8010, engine=0x0, eflags=eflags@entry=1)

dst_lib_init2() is called by ns_server_create() to initialize the DST. The * engine parameter is set to a reference to const char * ns_g_engine, which is defined earlier in the code.

The value of ns_g_engine may have 2 origins:

In the configure script, PKCS11_ENGINE may only be assigned two values: pkcs11 or NULL. There is no such value as libshofthsm2.so or anything else.
pkcs11 seems to be the engine to use when the native PKCS#11 interface is enabled. But if it so, I don't understand why the previous version of BIND (and probably older ones) used the null engine instead.

The PKCS11_PROVIDER environment variable, which has no effect at runtime, is present in the configure script to help selecting a PKCS#11 flavour. But I have no idea about how it affects the compilation in the end.

I'm afraid this issue is far too complex to deal with for someone like me, who is not a developer and do not know much about BIND intricacies. I think I'm just going to override the unit files and pass an explicit reference to the PKCS#11 engine to use. If you think this should be reported somewhere, I'd be glad to do it. Otherwise, since I'm apparently the only one to experiment those startup failures, I think we may close the ticket.

Hello,

I've got the same problem.

After an update from Oracle Linux 8.4 to Oracle Linux 8.5 and updating from ipa-server 4.9.2 to 4.9.6, the ipa upgrade won't work.

The problem was the same as Marin described, the named package won't start.
I'm also sure that i did not modify any file permission during the IPA deployment or after that.

I tried several things, including disable SELinux or modifying /etc/named.conf configuration, but I did not manage to start the named-pkcs11 service. I agree with Marin, the issue seems to be complex.

Marin, did you finally manage to start the service without downgrading to the 9.11.26-4 version?

Regards.

I've got the same problem.

Thanks for reporting!

Marin, did you finally manage to start the service without downgrading to the 9.11.26-4 version?

Yes. You need to pass the name of the PKCS#11 provider library (libsofthsm2.so) to BIND explicitly with a command-line argument. Just override the named-pkcs11 systemd unit file:

$ sudo systemctl edit named-pkcs11

Then paste the following lines, which add the -E argument to the command responsible for starting the service :

[Service]
ExecStart=
ExecStart=/usr/sbin/named-pkcs11 -u named -E libsofthsm2.so -c ${NAMEDCONF} $OPTIONS

Then update BIND to the latest version and restart the service; it should be OK.

Strange, ns_g_engine should have defaults to NULL on native bind-pkcs11 build. It should be set to pkcs11 only when openssl should be used. Which is not the case on RHEL8. Because this value overrides default lib_name variable on line pk11.c:206, after PKCS11_PROVIDER is applied. It overrides anything set by environment. But -E parameter would change this default, if you set it to -E libsofthsm2.so, it should receive correct value.

I still hadn't found how it got changed since 8.4, because I could not find any relevant change yet. I can reproduce that issue on my Fedora 34 local build. But it seems RHEL8 build behaves differently, it has ns_g_engine set to NULL on start. It might be because different gcc compiler or slightly different settings is used on Oracle Linux.

I think this diff would fix it:

diff --git a/bind.spec b/bind.spec
index e19492f..c2bfeac 100644
--- a/bind.spec
+++ b/bind.spec
@@ -565,13 +565,13 @@ find bin lib/lwres/man -name '*.docbook' -exec \
       -i '{}' ';'

 %if %{with PKCS11}
+%patch150 -p1 -b .engine-pkcs11
 cp -r bin/named{,-pkcs11}
 cp -r bin/dnssec{,-pkcs11}
 cp -r lib/isc{,-pkcs11}
 cp -r lib/dns{,-pkcs11}
 %patch136 -p1 -b .dist_pkcs11
 %patch149 -p1 -b .kyua-pkcs11
-%patch150 -p1 -b .engine-pkcs11
 %endif

 %if %{with SDB}

For some reason it works well on RHEL8 build with the same source.

# rpm -q bind-pkcs11
bind-pkcs11-9.11.26-6.el8.x86_64
# gdb /usr/sbin/named-pkcs11
# set args -u named -g
# break pk11_initialize
# run
Thread 1 "named-pkcs11" hit Breakpoint 1, pk11_initialize (mctx=mctx@entry=0x5555558297d0, engine=engine@entry=0x0)
    at ../../../lib/isc-pkcs11/pk11.c:187
187 pk11_initialize(isc_mem_t *mctx, const char *engine) {
(gdb) bt      
#0  pk11_initialize (mctx=mctx@entry=0x5555558297d0, engine=engine@entry=0x0) at ../../../lib/isc-pkcs11/pk11.c:187
#1  0x00007ffff792f5b2 in dst_lib_init2 (mctx=0x5555558297d0, ectx=0x7ffff7fa8010, engine=0x0, eflags=eflags@entry=1)
    at ../../../lib/dns-pkcs11/dst_api.c:200
#2  0x00005555555b1f34 in ns_server_create (mctx=0x5555558297d0, serverp=0x555555823240 <ns_g_server>)
    at ../../../bin/named-pkcs11/server.c:9157
#3  0x0000555555578b6e in setup () at ../../../bin/named-pkcs11/main.c:1337
#4  main (argc=4, argv=0x7fffffffe2d8) at ../../../bin/named-pkcs11/main.c:1556

Workaround would be adding OPTIONS="-E libsofthsm2.so" to /etc/sysconfig/named

Thank you marin, pemensik.

After recovering with VMWare snapshots both of my IPA nodes to the previous Oracle Linux 8.4 version, I was able to update the machine and get it working with Oracle Linux 8.5, bind 9.11.26-6 and ipa-server 4.9.6-6.0.1.

Steps:

1 - Edit the /etc/sysconfig/named file, add a line with: OPTIONS="-E libsofthsm2.so"
2 - Perform the update (I did a yum update, updating all the server packages). When ipa-server package got update, it was able to automatically do the IPA upgrade and properly start the named-pkcs11.service.
3 - Check ipa service with ipactl status and ipa-healthcheck. I just had some warning about log permissions and the ipa-ca DNS record (somehow, the ipa-ca DNS registry only had the second node IP). I just added the IP of my first node in the DNS registry.

Regards.

After this issue one of my IPA servers failed dogtag. I've updated without being aware of the bug, things got messy, downgraded, issued ipa-server-upgrade manually and ended with broken dogtag.

After the workaround presented by @marin I was able to sucessfullty update everything but the damage has been done with dogtag.

ipactl status shows everything as running, but ipa-healthcheck complains:

The output of ipa-healthcheck is available here: https://pastebin.com/fiqsGESL

Regards.

EDIT: The fix is available here: https://www.mail-archive.com/freeipa-users@lists.fedorahosted.org/msg12583.html

It's the requiredSecret issue.

Login to comment on this ticket.

Metadata
Attachments 3
Attached 2 years ago View Comment
Attached 2 years ago View Comment
Attached 2 years ago View Comment