The first server (master) installs without any problems in a privileged container, and everything works as expected.
Adding a replica from a second server fails while waiting for replication of the HTTP service account.
[10/21]: setting up httpd keytab
[error] NotFound: wait_for_entry timeout on ldap://dc01.ipa.gorill.site:389 for krbprincipalname=HTTP/dc02.ipa.gorill.site@IPA.GORILL.SITE,cn=services,cn=accounts,dc=ipa,dc=gorill,dc=site
Watching "journalctl -xf" on both servers, I can see the following output (target/server to be replaced accordingly):
Dec 09 14:26:41 dc01.ipa.gorill.site ns-slapd[6276]: cn: repl keep alive 4
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:44.804764269 +0100] - INFO - NSMMReplicationPlugin - repl5_tot_run - Finished total update of replica "agmt="cn=meTodc02.ipa.gorill.site" (dc02:389)". Sent 522 entries.
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 2
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.532297770 +0100] - ERR - libdb - BDB0623 DB_MULTIPLE/DB_MULTIPLE_KEY buffers must be aligned, at least page size and multiples of 1KB
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.554181268 +0100] - ERR - bdb_map_error - bdb_public_cursor_bulkop failed with db error 22 : Invalid argument
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.564749425 +0100] - ERR - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389) - clcache_load_buffer - Can't locate CSN 61b2121f000200040000 in the changelog (DB rc=-12793). If replication stops, the consumer may need to be reinitialized.
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.573908283 +0100] - ERR - NSMMReplicationPlugin - changelog program - repl_plugin_name_cl - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): Failed to retrieve change with CSN 61b2121f000200040000; db error - -12793 Database operation error: Unhandled Database operation error. See details in previous error messages.
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.583083498 +0100] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): A changelog database error was encountered
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.599619698 +0100] - ERR - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): Incremental update failed and requires administrator action
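For anyone trying to confirm they are hitting the same failure, the messages above can be matched mechanically. The following is only a rough sketch, not part of FreeIPA or 389-ds: the function name scan_journal and the signature labels are made up for illustration, and the search strings are simply copied from the log lines quoted in this ticket.

```python
import re

# Substrings copied from the ns-slapd messages quoted in this ticket.
# The dictionary keys are illustrative labels, not official error names.
SIGNATURES = {
    "bdb_alignment": "BDB0623",
    "bulkop_failed": "bdb_public_cursor_bulkop failed",
    "csn_missing": "Can't locate CSN",
    "incremental_failed": "Incremental update failed",
}

# Captures the CSN reported by clcache_load_buffer.
CSN_RE = re.compile(r"Can't locate CSN ([0-9a-f]+)")


def scan_journal(lines):
    """Count known failure signatures and collect the CSNs reported missing."""
    hits = {name: 0 for name in SIGNATURES}
    csns = set()
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                hits[name] += 1
        match = CSN_RE.search(line)
        if match:
            csns.add(match.group(1))
    return hits, sorted(csns)
```

One possible use is to save the journal output to a file and pass the open file object to scan_journal; a non-zero "bdb_alignment" count together with "incremental_failed" would match the pattern reported here.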
Initial replication is successful, but the subsequent incremental replications fail.
working replication
$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
freeipa-server-4.9.8-1.fc35.x86_64
freeipa-client-4.9.8-1.fc35.x86_64
package ipa-server is not installed
package ipa-client is not installed
389-ds-base-2.0.11-1.fc35.x86_64
package pki-ca is not installed
krb5-server-1.19.2-2.fc35.x86_64
Hi @nascire As the issue happens in ns-slapd (the LDAP server), could you open a ticket at https://github.com/389ds/389-ds-base/issues ? Thanks
Hi @frenaud done as requested: https://github.com/389ds/389-ds-base/issues/5050 Can we leave this here open too in the meantime, in case they say it's not their fault or something? :)
thanks
Sure, I will add the "tracker" label to mark the dependency
Metadata Update from @frenaud: - Issue tagged with: tracker
Soooo ... in short, they don't see a problem/can't reproduce it, ...
Meanwhile, someone managed to reproduce it - the problem seems to be container-related. (Just an update so that no bot or something closes this here - for the ticket on the 389ds GitHub, see above)
As I needed to make progress, I tried with AlmaLinux ... works like a charm ... the error seems to be specific to Fedora
Unfortunately I am unable to go back to AlmaLinux or Rocky Linux for the container image, as they run 4.9.6 and my Fedora 35 master is 4.9.8. Rebuilding the domain is not something I'd like to do again... Hopefully someone will track the issue down, or the version for Rocky or Alma can be bumped to 4.9.8.
FWIW - This issue persists with the rawhide container image as well.
I was able to create the replica successfully when the ZFS filesystem used as the backing store had a record size of 4k. See details here: https://github.com/389ds/389-ds-base/issues/5050
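To compare a setup against the one described above, the record size of the backing dataset can be read with "zfs get". The sketch below wraps that call in Python; the helper names and the dataset name in the usage note are hypothetical, and it assumes the zfs command-line tool is available on the container host.

```python
import subprocess


def recordsize_command(dataset):
    """Build the zfs command line that prints only the recordsize value.

    -H drops the header, -p prints the exact numeric value in bytes,
    -o value restricts the output to the value column.
    """
    return ["zfs", "get", "-H", "-p", "-o", "value", "recordsize", dataset]


def parse_recordsize(output):
    """Convert the single-line output of the command above to bytes."""
    return int(output.strip())


def get_recordsize(dataset):
    """Run zfs and return the dataset's recordsize in bytes."""
    result = subprocess.run(
        recordsize_command(dataset),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_recordsize(result.stdout)
```

For example, get_recordsize("tank/containers/dc02") (dataset name invented here) would return 4096 on a dataset configured like the one in the comment above. Note that changing recordsize on an existing dataset only affects files written afterwards.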
I'm unable to test, but this should be fixed with: https://github.com/389ds/389-ds-base/pull/5150
Metadata Update from @nascire: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)