Issue #9058: Unable to create replica between 2 Fedora 35 LXC containers - freeipa

Closed: fixed 2 years ago by nascire. Opened 2 years ago by nascire.

Issue

The first server (master) installs without any problems in a privileged container, and everything works as expected.

When I try to add a replica from a second server, the installation fails while waiting for replication of the HTTP service account.

[10/21]: setting up httpd keytab
  [error] NotFound: wait_for_entry timeout on ldap://dc01.ipa.gorill.site:389 for krbprincipalname=HTTP/dc02.ipa.gorill.site@IPA.GORILL.SITE,cn=services,cn=accounts,dc=ipa,dc=gorill,dc=site

Watching "journalctl -xf" on both servers, I see the following output (target and server names to be swapped accordingly):

Dec 09 14:26:41 dc01.ipa.gorill.site ns-slapd[6276]: cn: repl keep alive 4
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:44.804764269 +0100] - INFO - NSMMReplicationPlugin - repl5_tot_run - Finished total update of replica "agmt="cn=meTodc02.ipa.gorill.site" (dc02:389)". Sent 522 entries.
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1
Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 2
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.532297770 +0100] - ERR - libdb - BDB0623 DB_MULTIPLE/DB_MULTIPLE_KEY buffers must be aligned, at least page size and multiples of 1KB
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.554181268 +0100] - ERR - bdb_map_error - bdb_public_cursor_bulkop failed with db error 22 : Invalid argument
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.564749425 +0100] - ERR - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389) - clcache_load_buffer - Can't locate CSN 61b2121f000200040000 in the changelog (DB rc=-12793). If replication stops, the consumer may need to be reinitialized.
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.573908283 +0100] - ERR - NSMMReplicationPlugin - changelog program - repl_plugin_name_cl - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): Failed to retrieve change with CSN 61b2121f000200040000; db error - -12793 Database operation error:  Unhandled Database operation error. See details in previous error messages.
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.583083498 +0100] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): A changelog database error was encountered
Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: [09/Dec/2021:15:26:45.599619698 +0100] - ERR - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meTodc02.ipa.gorill.site" (dc02:389): Incremental update failed and requires administrator action
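
To pick the first failure out of a longer journal, the ns-slapd lines can be filtered by severity. The following is only a sketch: the regular expression is derived from the line layout visible in the excerpt above, not from any 389-ds specification, and the function names `slapd_errors` and `first_error` are made up for illustration.

```python
import re

# Layout assumed from the excerpt above:
#   "... ns-slapd[PID]: [timestamp] - LEVEL - message"
# Lines without the bracketed timestamp (e.g. "GSSAPI client step 1") do not match.
SLAPD_RE = re.compile(
    r"ns-slapd\[\d+\]: \[(?P<ts>[^\]]+)\] - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def slapd_errors(lines):
    """Return (timestamp, message) pairs for ERR-level ns-slapd lines, in order."""
    found = []
    for line in lines:
        match = SLAPD_RE.search(line)
        if match and match.group("level") == "ERR":
            found.append((match.group("ts"), match.group("msg")))
    return found


def first_error(lines):
    """Return the earliest ERR entry, or None if there is none."""
    errors = slapd_errors(lines)
    return errors[0] if errors else None


# Three lines copied verbatim from the journal output above.
SAMPLE = [
    "Dec 09 14:26:44 dc01.ipa.gorill.site ns-slapd[6276]: GSSAPI client step 1",
    "Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: "
    "[09/Dec/2021:15:26:45.532297770 +0100] - ERR - libdb - BDB0623 "
    "DB_MULTIPLE/DB_MULTIPLE_KEY buffers must be aligned, at least page size "
    "and multiples of 1KB",
    "Dec 09 14:26:45 dc01.ipa.gorill.site ns-slapd[6276]: "
    "[09/Dec/2021:15:26:45.554181268 +0100] - ERR - bdb_map_error - "
    "bdb_public_cursor_bulkop failed with db error 22 : Invalid argument",
]

if first_error(SAMPLE) is not None:
    print(first_error(SAMPLE)[1])
```

Applied to the excerpt, the earliest ERR line is the libdb BDB0623 message; the changelog and replication errors all follow it.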

Steps to Reproduce

Create a fresh Fedora 35 LXC container
ipa-server-install, ....
Create a second Fedora 35 LXC container
ipa-replica-install, ....

Actual behavior

Initial replication is successful, but the subsequent incremental replications do not work.

Expected behavior

working replication

Version/Release/Distribution

$ rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
freeipa-server-4.9.8-1.fc35.x86_64
freeipa-client-4.9.8-1.fc35.x86_64
package ipa-server is not installed
package ipa-client is not installed
389-ds-base-2.0.11-1.fc35.x86_64
package pki-ca is not installed
krb5-server-1.19.2-2.fc35.x86_64

Additional info:

Log file locations: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/config-files-logs.html
Troubleshooting guide: https://www.freeipa.org/page/Troubleshooting

frenaud commented 2 years ago

Hi @nascire
As the issue happens in ns-slapd (the LDAP server), could you open a ticket at https://github.com/389ds/389-ds-base/issues ? Thanks

nascire commented 2 years ago

Hi @frenaud
Done as requested: https://github.com/389ds/389-ds-base/issues/5050
Can we leave this one open here too in the meantime, in case they say it's not their fault or something? :)

thanks

frenaud commented 2 years ago

Sure, I will add the "tracker" label to mark the dependency

Metadata Update from @frenaud:
- Issue tagged with: tracker

2 years ago

nascire commented 2 years ago

Soooo ... in short, they don't see a problem and can't reproduce it.

Meanwhile, someone managed to reproduce it; the problem seems to be container-related.
(Just an update so that no bot closes this issue; for the 389ds GitHub ticket, see above.)

nascire commented 2 years ago

As I needed to make progress, I tried AlmaLinux, and it works like a charm. The error seems to be specific to Fedora.

croadfeldt commented 2 years ago

Unfortunately, I am unable to go back to AlmaLinux or Rocky Linux for the container image, as they run 4.9.6 and my Fedora 35 master is 4.9.8. Rebuilding the domain is not something I'd like to do again... Hopefully someone will track the issue down, or the version for Rocky or Alma can be bumped to 4.9.8.

croadfeldt commented 2 years ago

FWIW, this issue persists with the Rawhide container image as well.

croadfeldt commented 2 years ago

I was able to create the replica successfully when the ZFS filesystem used as the backing store had a record size of 4k. See details here: https://github.com/389ds/389-ds-base/issues/5050
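
For anyone comparing setups, here is a rough diagnostic sketch. It reports the preferred I/O block size the OS gives for a directory and checks the size rule quoted in the BDB0623 message ("at least page size and multiples of 1KB"). It assumes the block size of the backing store is what matters here, which the 4k observation suggests but does not prove. The helper names are invented for illustration, and the memory-alignment part of the message is not checked.

```python
import os


def preferred_block_size(path: str) -> int:
    """Return the preferred I/O block size (st_blksize) the OS reports for path."""
    return os.stat(path).st_blksize


def bulk_buffer_size_ok(buffer_size: int, page_size: int) -> bool:
    """Check only the size rule from the BDB0623 message:
    the buffer must be at least one page and a multiple of 1 KB.
    Memory alignment, also mentioned in the message, is not checked here."""
    return buffer_size >= page_size and buffer_size % 1024 == 0


# Report the block size for the current directory. On a directory server you
# would point this at the database directory instead; the exact path depends
# on the instance name and is not given in this ticket.
print("preferred block size:", preferred_block_size("."))

# Illustrative numbers only, not values taken from 389-ds:
print(bulk_buffer_size_ok(8192, 4096))    # buffer covers one 4k page
print(bulk_buffer_size_ok(8192, 16384))   # buffer smaller than the page
```

If the block size reported inside the container differs between a working and a failing host, that is worth adding to the 389-ds ticket.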

nascire commented 2 years ago

I'm unable to test, but this should be fixed with: https://github.com/389ds/389-ds-base/pull/5150

Metadata Update from @nascire:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago
