#7910 Detect group size with healthcheck to predict ipa-replica-install failures
Opened 3 months ago by fcami. Modified 3 months ago

Request for enhancement

As a FreeIPA admin, I want to know whether any group in IPA has grown so large that the default nsslapd-maxsasliosize is too small for ipa-replica-install to succeed.


FreeIPA sets both nsslapd-maxsasliosize and nsslapd-sasl-max-buffer-size to 2MiB.
With very large groups this is not enough, and ipa-replica-install fails to complete replication.
See the Broken Replica thread.

Steps to Reproduce

  1. Install a master server
  2. Have a group with 17K members
  3. Try to install a replica

Actual behavior

[20/Mar/2019:09:28:06.545187923 +0100] - INFO - NSMMReplicationPlugin -
repl5_tot_run - Beginning total update of replica
"agmt="cn=meToidc01.my.dom.ain" (idc01:389)".
[20/Mar/2019:09:28:26.528046160 +0100] - ERR - NSMMReplicationPlugin -
perform_operation - agmt="cn=meToidc01.my.dom.ain" (idc01:389): Failed
to send extended operation: LDAP error -1 (Can't contact LDAP server)
[20/Mar/2019:09:28:26.530763939 +0100] - ERR - NSMMReplicationPlugin -
repl5_tot_log_operation_failure - agmt="cn=meToidc01.my.dom.ain"
(idc01:389): Received error -1 (Can't contact LDAP server):  for total
update operation
[20/Mar/2019:09:28:26.532678072 +0100] - ERR - NSMMReplicationPlugin -
release_replica - agmt="cn=meToidc01.my.dom.ain" (idc01:389): Unable to
send endReplication extended operation (Can't contact LDAP server)
[20/Mar/2019:09:28:26.534307539 +0100] - ERR - NSMMReplicationPlugin -
repl5_tot_run - Total update failed for replica
"agmt="cn=meToidc01.my.dom.ain" (idc01:389)", error (-11)
[20/Mar/2019:09:28:26.561763168 +0100] - INFO - NSMMReplicationPlugin -
bind_and_check_pwp - agmt="cn=meToidc01.my.dom.ain" (idc01:389):
Replication bind with GSSAPI auth resumed
[20/Mar/2019:09:28:26.582389258 +0100] - WARN - NSMMReplicationPlugin -
repl5_inc_run - agmt="cn=meToidc01.my.dom.ain" (idc01:389): The remote
replica has a different database generation ID than the local database.
 You may have to reinitialize the remote replica, or the local replica.

Expected behavior

ipa-replica-install completes successfully.

Additional info:

Discussing this with @tbordaz led to a proposal: query LDAP directly to estimate the maximum payload size, e.g. by finding the largest group and measuring the size of the reply, not counting encryption overhead.
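
A minimal sketch of what such a check could look like, assuming the group entries have already been fetched from LDAP into a dict of DN to attributes. The function names, the per-value overhead constant, and the input shape are all hypothetical; this is not ipa-healthcheck code, and the size estimate is only a rough approximation of the unencrypted reply.

```python
from typing import Dict, List, Tuple

# 2 MiB, the value the ticket says FreeIPA sets for nsslapd-maxsasliosize.
DEFAULT_MAX_SASL_IO_SIZE = 2 * 1024 * 1024

# Assumption: rough framing cost per attribute value (tag + length bytes).
PER_VALUE_OVERHEAD = 4


def estimate_entry_size(dn: str, attrs: Dict[str, List[bytes]]) -> int:
    """Roughly estimate the unencrypted wire size of one LDAP entry, in bytes."""
    size = len(dn.encode("utf-8"))
    for name, values in attrs.items():
        size += len(name.encode("utf-8"))
        for value in values:
            size += len(value) + PER_VALUE_OVERHEAD
    return size


def find_oversized_groups(
    groups: Dict[str, Dict[str, List[bytes]]],
    limit: int = DEFAULT_MAX_SASL_IO_SIZE,
) -> List[Tuple[str, int]]:
    """Return (dn, estimated_size) for groups above the limit, largest first."""
    sized = [(dn, estimate_entry_size(dn, attrs)) for dn, attrs in groups.items()]
    oversized = [item for item in sized if item[1] > limit]
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

A healthcheck could report every DN returned by find_oversized_groups() as a warning before ipa-replica-install is attempted. The threshold would probably need some safety margin, since the estimate ignores encryption and protocol framing.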

Just for the record:
The SASL limit nsslapd-maxsasliosize was introduced in 1.2.x (https://bugzilla.redhat.com/show_bug.cgi?id=637852). Initially the purpose of the BZ was to handle a SASL message split across several packets. I guess the limit is enforced to protect the server from an attack, because the server allocates a buffer large enough to fit a full message.
