#9058 Unable to create replica between 2 Fedora 35 LXC containers
Closed: fixed 2 years ago by nascire. Opened 2 years ago by nascire.

Issue

The first server (master) can be installed without any problems in a privileged container, and everything works as expected.

Adding a replica from a second server fails while waiting for the HTTP service principal to be replicated.

[10/21]: setting up httpd keytab
  [error] NotFound: wait_for_entry timeout on ldap://dc01.ipa.gorill.site:389 for krbprincipalname=HTTP/dc02.ipa.gorill.site@IPA.GORILL.SITE,cn=services,cn=accounts,dc=ipa,dc=gorill,dc=site

Watching "journalctl -xf" on both servers, I can see the following output (target/server to be replaced accordingly):

Dec 09 14:26:41 dc01.ipa.gorill.site ns-slapd[6276]: cn: repl keep alive 4
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:44.804764269 +0100] - INFO - NSMMReplicationPlugin - repl5_tot_run - Finished total update of replica "agmt="cn=meTodc02.ipa.gorill.site" (dc02:389)". Sent 522 entries.
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 2
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.532297770 +0100] - ERR - libdb - BDB0623 DB_MULTIPLE/DB_MULTIPLE_KEY buffers must be aligned, at least page size and multiples of 1KB
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.554181268 +0100] - ERR - bdb_map_error - bdb_public_cursor_bulkop failed with db error 22 : Invalid argument
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.564749425 +0100] - ERR - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389) - clcache_load_buffer - Can't locate CSN 61b2121f000200040000 in the changelog (DB rc=-12793). If replication stops, the consumer may need to be reinitialized.
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.573908283 +0100] - ERR - NSMMReplicationPlugin - changelog program - repl_plugin_name_cl - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): Failed to retrieve change with CSN 61b2121f000200040000; db error - -12793 Database operation error:  Unhandled Database operation error. See details in previous error messages.
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.583083498 +0100] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): A changelog database error was encountered
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.599619698 +0100] - ERR - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): Incremental update failed and requires administrator action
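To pull only the error-level entries out of a journal excerpt like the one above, something along these lines may help. It is a minimal sketch that assumes the 389-ds part of each line has the shape "[timestamp] - LEVEL - message", as in the excerpt; the names ENTRY, error_entries and SAMPLE are made up for illustration and are not part of any FreeIPA or 389-ds tooling.

```python
import re

# Assumed shape of the 389-ds portion of a journal line:
#   [09/Dec/2021:15:26:45.554181268 +0100] - ERR - message text
ENTRY = re.compile(r"\[(?P<ts>[^\]]+)\] - (?P<level>[A-Z]+) - (?P<msg>.*)$")


def error_entries(lines):
    """Return (timestamp, message) pairs for entries logged at ERR level."""
    found = []
    for line in lines:
        match = ENTRY.search(line)
        if match and match.group("level") == "ERR":
            found.append((match.group("ts"), match.group("msg")))
    return found


# Three lines copied from the journal excerpt in this ticket.
SAMPLE = [
    'Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 2',
    'Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:44.804764269 +0100] - INFO - NSMMReplicationPlugin - repl5_tot_run - Finished total update of replica "agmt="cn=meTodc02.ipa.gorill.site" (dc02:389)". Sent 522 entries.',
    'Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.554181268 +0100] - ERR - bdb_map_error - bdb_public_cursor_bulkop failed with db error 22 : Invalid argument',
]

if __name__ == "__main__":
    for timestamp, message in error_entries(SAMPLE):
        print(timestamp, message)
```

Lines without a bracketed 389-ds timestamp (such as the GSSAPI steps) are skipped, so the result shows the libdb and changelog errors in the order they were logged.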

Steps to Reproduce

  1. Create a fresh Fedora 35 LXC container
  2. ipa-server-install, ....
  3. Create a second Fedora 35 LXC container
  4. ipa-replica-install, ....

Actual behavior

Initial replication is successful, but subsequent incremental replication does not work

Expected behavior

working replication

Version/Release/Distribution

$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
freeipa-server-4.9.8-1.fc35.x86_64
freeipa-client-4.9.8-1.fc35.x86_64
package ipa-server is not installed
package ipa-client is not installed
389-ds-base-2.0.11-1.fc35.x86_64
package pki-ca is not installed
krb5-server-1.19.2-2.fc35.x86_64

Additional info:

Log file locations: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/config-files-logs.html
Troubleshooting guide: https://www.freeipa.org/page/Troubleshooting


Hi @nascire
As the issue happens in ns-slapd (the LDAP server), could you open a ticket at https://github.com/389ds/389-ds-base/issues ? Thanks

Hi @frenaud
done as requested: https://github.com/389ds/389-ds-base/issues/5050
Can we leave this open here too in the meantime, in case they say it's not their fault or something? :)

thanks

Sure, I will add the "tracker" label to mark the dependency

Metadata Update from @frenaud:
- Issue tagged with: tracker

2 years ago

Soooo ... in short, they don't see a problem/can't reproduce it, ...

Meanwhile, someone managed to reproduce it. The problem seems to be container-related.
(Just an update, so that no bot or the like closes this ticket. For the 389ds GitHub ticket, see above.)

As I needed to make progress, I tried with AlmaLinux ... works like a charm ... the error seems to be specific to Fedora

Unfortunately I am unable to go back to AlmaLinux or Rocky Linux for the container image, as they run 4.9.6 and my Fedora 35 master is 4.9.8. Rebuilding the domain is not something I'd like to do again... Hopefully someone will track the issue down, or the version for Rocky or Alma can be bumped to 4.9.8.

FWIW - This issue persists with the rawhide container image as well.

I was able to create the replica successfully once the ZFS backing store filesystem had a record size of 4k. See details here: https://github.com/389ds/389-ds-base/issues/5050
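For anyone comparing setups, the block size a filesystem reports for a given directory can be read with a few lines of Python. This is only a sketch: the function name reported_block_size is made up, and the idea that this reported value is what matters here is an assumption based on the record size workaround, not something confirmed in this ticket. As far as I know, Berkeley DB derives its default page size from this value when none is configured, but treat that as unverified for this case.

```python
import os
import sys


def reported_block_size(path):
    """Return the preferred I/O block size (in bytes) reported for path."""
    return os.stat(path).st_blksize


if __name__ == "__main__":
    # Pass the directory to inspect as the first argument, for example the
    # directory holding the ns-slapd database files; defaults to the
    # current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {reported_block_size(target)} bytes")
```

Running it inside a working container (for example AlmaLinux) and a failing one (Fedora 35) on the same host would show whether the two see different values for the same backing store.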

Metadata Update from @nascire:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago
