#4756 IPA Replicate creation fails with error "Update failed! Status: [10 Total update abortedLDAP error: Referral]"
Closed: Invalid None Opened 8 years ago by jcholast.

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 7): Bug 1166265

Please note that this Bug is private and may not be accessible as it contains confidential Red Hat customer information.

Description of problem:

IPA replica creation is failing in RHEL 7 with error "Update failed! Status:
[10 Total update abortedLDAP error: Referral]"

The replica issue is observed only if the replica server is a VM.

MASTER          REPLICA        Result
=======         =======        =======
Physical        Physical       Working
Physical        Virtual        Not working


1) In Master - nsds5replicaLastUpdateStatus for replica

-------------------------------------------------------------------------------
nsds5replicaLastInitStatus: 10 Total update abortedLDAP error: Referral
-------------------------------------------------------------------------------

2) In Replica - nsds5replicaLastUpdateStatus for master

-------------------------------------------------------------------------------
nsds5replicaLastUpdateStatus: 402 Replication error acquiring replica: unknown error - Replica has different database generation ID, remote replica may need to be initialized
-------------------------------------------------------------------------------
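Both status values above start with a numeric code followed by free text. As a minimal sketch for anyone scripting a check of these attributes, the value can be split as shown below. The names `ReplicaStatus` and `parse_status` are hypothetical helpers, not part of IPA or 389 DS, and the input is assumed to be the attribute value only, already on a single line.

```python
import re
from typing import NamedTuple, Optional


class ReplicaStatus(NamedTuple):
    code: int
    message: str


def parse_status(value: str) -> Optional[ReplicaStatus]:
    """Split 'NNN message' into (code, message).

    Returns None when the value does not start with a numeric code.
    """
    match = re.match(r"\s*(-?\d+)\s+(.*)", value, re.DOTALL)
    if match is None:
        return None
    return ReplicaStatus(int(match.group(1)), match.group(2).strip())


if __name__ == "__main__":
    status = parse_status("10 Total update abortedLDAP error: Referral")
    print(status.code, "|", status.message)
```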

The issue is observed when data already exists on the IPA master and the number
of user/group/netgroup records is above 1000 (tested with 1500).

Replication works successfully when there are fewer records.


Version-Release number of selected component (if applicable):

ipa-python-3.3.3-28.el7.x86_64
ipa-client-3.3.3-28.el7.x86_64
ipa-admintools-3.3.3-28.el7.x86_64
ipa-server-3.3.3-28.el7.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install the IPA master on a physical server with 1000+ records.
2. Initiate the replica creation process on a VM.

Actual results:

The replica creation process fails with the following error:

----------------------------------------------------------------------------
Update in progress, 128 seconds elapsed
Update in progress yet not in progress

[ipaserver.example.com] reports: Update failed! Status: [10 Total update
abortedLDAP error: Referral]
----------------------------------------------------------------------------

Expected results:

Replica creation should succeed.

Additional info:

Thierry's assessment:

In IPA, several approaches are possible, and they can be combined.

  • Increase nsds5replicaTimeout, although we can never know how long to wait before declaring that there is a problem.
  • IPA could test (in ipa-replica-prepare?) whether there are big entries and adapt the timeout accordingly.
  • ipa-replica-install could test the init status. If it fails because of a timeout, it could retry with a larger timeout, up to a maximum limit (e.g. half an hour) after which it reports that initialisation of the replica is not possible. This kind of fix requires the fix in DS.
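The retry idea in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: `init_with_retry` and the `run_init` callback are hypothetical names (nothing here is an ipa-replica-install API), the doubling factor is an arbitrary choice, and the 120 s start and 1800 s cap are taken from the values mentioned in this ticket.

```python
from typing import Callable, Optional


def init_with_retry(
    run_init: Callable[[int], bool],
    first_timeout: int = 120,
    max_timeout: int = 1800,
    factor: int = 2,
) -> Optional[int]:
    """Retry a total init with growing timeouts.

    run_init(timeout) is a caller-supplied (hypothetical) function that
    starts the initialisation with the given timeout in seconds and
    returns True on success, False if it failed because of a timeout.

    Returns the timeout that succeeded, or None once the next timeout
    would exceed max_timeout.
    """
    timeout = first_timeout
    while timeout <= max_timeout:
        if run_init(timeout):
            return timeout
        timeout *= factor
    return None


if __name__ == "__main__":
    # Simulated init that only succeeds with at least 400 seconds.
    print(init_with_retry(lambda t: t >= 400))
```

Returning None rather than raising leaves it to the caller to report that initialisation of the replica is not possible.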

As a quick and likely good-enough fix, nsds5replicaTimeout could be set to 600 (10 min) instead of 120.
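For reference, a change like this is normally expressed as an LDIF modify record on the replication agreement entry. The sketch below only builds that text. `timeout_ldif` is a hypothetical helper, and the DN in the usage example is a placeholder rather than a real agreement DN, which depends on the deployment.

```python
def timeout_ldif(agreement_dn: str, seconds: int = 600) -> str:
    """Build an LDIF modify record that replaces nsds5replicaTimeout.

    agreement_dn must be the DN of the replication agreement entry;
    it is supplied by the caller and not guessed here.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return (
        f"dn: {agreement_dn}\n"
        "changetype: modify\n"
        "replace: nsds5replicaTimeout\n"
        f"nsds5replicaTimeout: {seconds}\n"
    )


if __name__ == "__main__":
    # Placeholder DN for illustration only.
    print(timeout_ldif("cn=AGREEMENT,cn=replica,cn=SUFFIX,cn=mapping tree,cn=config"))
```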

The possible fixes described in https://fedorahosted.org/freeipa/ticket/4756#comment:1 are valid when there is a timeout issue during a full initialisation.

But after investigating https://bugzilla.redhat.com/show_bug.cgi?id=1166265 further, I think there is a bug in DS. The bug is not systematic (it was actually only seen on a VM), and slowing down the master (by adding breakpoints) makes the full init succeed. During a full update, the replica agreement tests (polls) the connection before sending the next entry. When this bug is hit, the poll returns on timeout, which triggers the initialisation abort.
So changing nsds5replicaTimeout is NOT a workaround or a fix for the bug.
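To make the poll-before-send pattern concrete, here is a generic sketch. It is not the 389 DS code, and `send_entries` is a hypothetical name. It polls a socket for writability before each send and aborts when the poll times out. In the usage example, the timeout is provoked simply by a peer that never reads, which is only one possible way for such a poll to expire.

```python
import select
import socket
from typing import Iterable


def send_entries(sock: socket.socket, entries: Iterable[bytes],
                 poll_timeout: float) -> int:
    """Send entries one by one, polling for writability before each send.

    Returns the number of entries sent. Raises TimeoutError when the
    poll expires, which corresponds to aborting the whole update.
    """
    sent = 0
    for entry in entries:
        # Wait until the socket can accept more data, or give up.
        _, writable, _ = select.select([], [sock], [], poll_timeout)
        if not writable:
            raise TimeoutError(f"poll timed out after {sent} entries")
        sock.sendall(entry)
        sent += 1
    return sent


if __name__ == "__main__":
    supplier, consumer = socket.socketpair()
    try:
        # The consumer never reads, so the send buffer eventually fills.
        send_entries(supplier, [b"x" * 1024] * 100000, poll_timeout=0.05)
    except TimeoutError as exc:
        print("aborted:", exc)
    finally:
        supplier.close()
        consumer.close()
```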

Thierry, I am also wondering - isn't this a duplicate of #4048? It was also happening in similar scenarios.

I think #4048 (and #3314) are not directly related to the full initialisation failure.

They are related to replication failing because of large updates, so nsslapd-maxbersize needs to be adapted. #4048 provides the ability to tune it, in addition to other cache parameters.

The current ticket is more related to the dynamics of the replica agreement, which sends/receives updates/results, and to the consumer's ability to handle the load.
The supplier (RA) does not look flexible enough in the way it sends/receives the updates/results. It sends tons of entries until it hangs waiting for more room to send the remaining entries. This may prevent the RA.receiver from reading the results.

A fix has been identified (https://fedorahosted.org/389/ticket/47942) and is waiting for triage on that DS ticket.

Right. This is a 389 fix (https://bugzilla.redhat.com/show_bug.cgi?id=1166265#c23), so we can now close the FreeIPA ticket.

Metadata Update from @jcholast:
- Issue assigned to someone
- Issue set to the milestone: FreeIPA 4.1.3

5 years ago
