#9196 [Tracker] Random nightly failure in ipa-replica-install: Failed to start replication
Opened 2 years ago by frenaud. Modified 2 years ago

Issue

FreeIPA nightly tests randomly fail while setting up replication. See for instance this test report and its logs.

Package Version and Platform:

  • Platform: Fedora 36
  • Package and version: 389-ds-base-2.1.1-2
    The full package list is available here.

Steps to Reproduce

Steps to reproduce the behavior:
1. on the master, install the IPA server with ipa-server-install --domain ipa.test --realm IPA.TEST -a Secret123 -p Secret123 --setup-dns --auto-forwarders --auto-reverse -U
2. on the replica, install an IPA client with ipa-client-install --domain ipa.test --realm IPA.TEST -p admin -w Secret123 --server server.ipa.test -U
3. on the replica, promote the machine to a replica with kinit admin; ipa-replica-install -U
The replica installation fails randomly.
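
Because the failure is intermittent, one way to hunt for it outside the CI is to repeat the promotion in a loop on the replica until it trips. This is only a sketch: it reuses the commands from the steps above, the admin password Secret123 from those steps, and the uninstall command the installer itself suggests on failure.

# on the replica, after steps 1 and 2 above
while :; do
    echo Secret123 | kinit admin
    if ! ipa-replica-install -U; then
        break   # the failing run is captured in /var/log/ipareplica-install.log
    fi
    # installation succeeded: tear it down and re-enroll the client for another try
    ipa-server-install --uninstall -U
    ipa-client-install --domain ipa.test --realm IPA.TEST -p admin -w Secret123 \
        --server server.ipa.test -U
done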

Expected behavior

Replica installation should succeed.

Initial investigation

The replica installation fails in the step setting up initial replication, with the following error:

...
  [27/42]: creating DS keytab
  [28/42]: ignore time skew for initial replication
  [29/42]: setting up initial replication
Starting replication, please wait until this has completed.

Update in progress, 1 seconds elapsed
Update in progress, 2 seconds elapsed
Update in progress, 3 seconds elapsed
Update in progress, 4 seconds elapsed
Update in progress, 5 seconds elapsed
Update in progress, 6 seconds elapsed
Update in progress, 7 seconds elapsed
Update in progress, 8 seconds elapsed
Update in progress, 9 seconds elapsed
Update in progress, 10 seconds elapsed
Update in progress, 11 seconds elapsed
Update in progress, 12 seconds elapsed
Update in progress, 13 seconds elapsed
Update in progress, 14 seconds elapsed
Update in progress, 15 seconds elapsed
[ldap://master.ipa.test:389] reports: Update failed! Status: [Error (49) - LDAP error: Invalid credentials - no response received]

  [error] RuntimeError: Failed to start replication
Failed to start replication
The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

The replica installer performs the following steps:
- create a connection to the master, bind as fqdn=replica0.ipa.test,cn=computers,cn=accounts,dc=ipa,dc=test
- fetch nsDS5ReplicaId from the master
- increment and update the value on the master
- add replica config on the replica in cn=replica,cn=dc\3Dipa\2Cdc\3Dtest,cn=mapping tree,cn=config
- set changelog maxage to 30d on the replica
- on the replica, create a special user entry cn=ldap/master.ipa.test@IPA.TEST,cn=config so that the SASL mapping can find a valid user during the first replication session, and add this user to nsDS5ReplicaBindDN
- on the replica, create a SASL mapping cn=Peer Master,cn=mapping,cn=sasl,cn=config:

dn: cn=Peer Master,cn=mapping,cn=sasl,cn=config
objectClass: top
objectClass: nsSaslMapping
cn: Peer Master
nsSaslMapRegexString: ^[^:@]+$
nsSaslMapBaseDNTemplate: cn=config
nsSaslMapFilterTemplate: (cn=&@IPA.TEST)
nsSaslMapPriority: 1

This maps the Kerberos principal ldap/master.ipa.test@IPA.TEST to the entry cn=ldap/master.ipa.test@IPA.TEST,cn=config.
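
A quick way to confirm that both pieces (the bind target entry and the Peer Master mapping) are actually in place on the replica is to read them back. This is a sketch only: the hostnames are the ones used in this ticket, and it assumes binding as cn=Directory Manager (password prompted for with -W).

# run against the replica
ldapsearch -LLL -H ldap://replica0.ipa.test:389 -D "cn=Directory Manager" -W \
    -b "cn=ldap/master.ipa.test@IPA.TEST,cn=config" -s base

ldapsearch -LLL -H ldap://replica0.ipa.test:389 -D "cn=Directory Manager" -W \
    -b "cn=Peer Master,cn=mapping,cn=sasl,cn=config" -s base

Each search should return exactly one entry; if either is missing, the GSSAPI bind coming from the master cannot be mapped to a valid identity.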

- add replica config on the master in cn=replica,cn=dc\3Dipa\2Cdc\3Dtest,cn=mapping tree,cn=config
- set changelog maxage to 30d on the master
- on the master, make sure the group cn=replication managers,cn=sysaccounts,cn=etc,dc=ipa,dc=test exists and contains the principals of both the master and the replica
- create the replication agreement on the master: cn=meToreplica0.ipa.test,cn=replica,cn=dc\=ipa\,dc\=test,cn=mapping tree,cn=config
- create the replication agreement on the replica: cn=meTomaster.ipa.test,cn=replica,cn=dc\=ipa\,dc\=test,cn=mapping tree,cn=config
- start replication by setting nsds5BeginReplicaRefresh=start on the master (entry cn=meToreplica0.ipa.test,...)
- read the entry back and check whether the replication has started. This is the step that fails.
  The audit logs show the MOD operation happened at 20220704162708. After that, a search is run every second to check the replication status, but the replication never starts.
  The master's error log shows the following error:
ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToreplica0.ipa.test" (replica0:389) - Replication bind with GSSAPI auth failed: LDAP error 49 (Invalid credentials) ()

and the replica access log shows the master trying to connect but failing:

[04/Jul/2022:16:27:09.084070988 +0000] conn=4 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
[04/Jul/2022:16:27:09.093387701 +0000] conn=4 op=0 RESULT err=49 tag=97 nentries=0 wtime=0.000044549 optime=0.009317877 etime=0.009359701 - SASL(-13): authentication failure: GSSAPI Failure: gss_accept_sec_context
[04/Jul/2022:16:27:09.097120258 +0000] conn=4 op=1 UNBIND

The connection should be authenticated as cn=ldap/master.ipa.test@IPA.TEST,cn=config, since the master binds with its Kerberos principal.
It seems that the SASL mapping is not working.
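
The failing bind can be reproduced by hand from the master, outside the installer, which makes it easier to correlate with the replica's access log. A sketch under two assumptions: the DS keytab is at /etc/dirsrv/ds.keytab (the usual location on an IPA master), and a scratch credential cache path of our choosing is used so other tickets are not touched.

# run on the master
export KRB5CCNAME=FILE:/tmp/repl-debug-ccache    # scratch ccache
kinit -kt /etc/dirsrv/ds.keytab ldap/master.ipa.test@IPA.TEST
ldapwhoami -Y GSSAPI -H ldap://replica0.ipa.test:389

# when the mapping works, this prints the mapped identity,
#   dn: cn=ldap/master.ipa.test@IPA.TEST,cn=config
# when the problem reproduces, the bind fails with the same err=49 /
# "Invalid credentials" seen in the replica access log above

# the state of the initialization can also be polled directly on the master:
ldapsearch -LLL -H ldap://master.ipa.test:389 -D "cn=Directory Manager" -W \
    -b "cn=meToreplica0.ipa.test,cn=replica,cn=dc\=ipa\,dc\=test,cn=mapping tree,cn=config" \
    -s base nsds5BeginReplicaRefresh nsds5ReplicaLastInitStatus nsds5ReplicaLastInitEnd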

Companion issue opened against 389-ds: https://github.com/389ds/389-ds-base/issues/5361


Also seen in PR #1842 with test_caless_TestServerReplicaCALessToCAFull and test_cert (Fedora 36 with 389-ds-base-2.1.1-2.fc36.x86_64)

Reproduced in [testing_master_latest] PR #1942
test_replication_layouts_TestLineTopologyWithoutCA: report, logs

Reproduced in [testing_ipa-4.10_latest], PR #1957
test_backup_and_restore_TestReplicaInstallAfterRestore: report, logs

Reproducible in testing_master_latest PR 1965 Report

Reproducible in testing_master_latest PR 2014 Report

Reproducible in testing_master_latest PR 2043 Report

Is it possible that some of the test runs un-install and then re-install the same replica?
(If so, the existing server may still hold Kerberos tickets for the previous incarnation of the replica in memory until they expire, and a restart of its LDAP service may be needed.)
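
If that were the cause, restarting the directory server on the existing master before re-running the promotion should be enough, since a restart drops whatever Kerberos credentials and GSSAPI security contexts the running ns-slapd still holds. A sketch; the instance name IPA-TEST is assumed from the FreeIPA convention of deriving it from the realm IPA.TEST.

# on the master
systemctl restart dirsrv@IPA-TEST.service

# then retry the promotion on the replica
kinit admin
ipa-replica-install -U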

The CI uses a common Vagrant image but creates a new VM instance for each test and installs the server, replica, and client from scratch, so the failure cannot be caused by something left over from a previous test.
The issue is also seen in tests that do not uninstall/reinstall, so it is probably not linked to a stale Kerberos ticket.
