#7228 Replication agreements/topology is not setup properly
Closed: fixed 5 years ago by rcritten. Opened 6 years ago by rcritten.

On Fedora 27, built from master, git updated today.

I did a very typical installation of a single master plus one replica with no additional services:

  Install on master:   ipa-server-install -a password -p password -r EXAMPLE.COM -U
  Install on replica1: ipa-replica-install --server master.example.com --domain example.com

I wanted to see how initializing left and right worked, so I took a look at the topology:

# ipa topologysegment-find domain
------------------
0 segments matched
------------------
----------------------------
Number of entries returned 0
----------------------------

Oh, odd. Where is the agreement?

replica1 # ipa-replica-manage list -v replica1.example.com
master.example.com: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-10-30 17:00:04+00:00

UI confirms that there are two nodes and no connections.

How is this possible?

The domain level is 1.

Even worse, I used the UI to create a connection between master and replica1, and now I have TWO agreements:

replica1 # ipa-replica-manage list -v replica1.example.com
master.example.com: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-10-30 17:29:51+00:00
master.example.com: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-10-30 17:29:49+00:00

So now this is very, very bad.

# ipa topologysuffix-verify domain
====================================================
Replication topology of suffix "domain" is in order.
====================================================

Yeah, not so much.


Metadata Update from @rcritten:
- Issue priority set to: critical

6 years ago

Ticket https://pagure.io/freeipa/issue/7176 was already opened for this issue; I closed 7176 as a duplicate of this one to avoid growing the backlog.

A few comments, but no conclusion:

Enabling the plugin log level, we can see:

'ipa-topology-plugin - ipa_topo_util_get_replica_conf: replica already exists'

It looks like the topology plugin finds in its internal structure that the config of the topology domain already exists, so it does not recreate it and does not recreate the segments.
The next step would be to reproduce and debug the topology plugin.

@tbordaz I will try to reproduce it with debug loglevel as discussed.

Initialization of the config data occurred 20 seconds after DS startup (as expected):

  (gdb) print topo_plugin_conf->startup_delay
  $2 = 20
[08/Nov/2017:09:24:13.428993526 +0100] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 389 for LDAP requests
...
[08/Nov/2017:09:24:22.429108827 +0100] - ERR - NSMMReplicationPlugin - acquire_replica - agmt="cn=meTo<vm.fqdn>" ...: Unable to acquire replica: permission denied. The bind dn "" does not have permission to supply replication updates to the replica. Will retry later.
[08/Nov/2017:09:24:33.432063270 +0100] - DEBUG - ipa-topology-plugin - ipa_topo_util_get_replica_segments: no segments found
[08/Nov/2017:09:24:33.437798291 +0100] - DEBUG - ipa-topology-plugin - ipa_topo_util_update_agmts_list: no agreements found
[08/Nov/2017:09:24:33.439961408 +0100] - ERR - ipa-topology-plugin - ipa_topo_util_get_replica_conf: replica already exists
[08/Nov/2017:09:24:33.441353476 +0100] - DEBUG - ipa-topology-plugin - cannot find replica entry for: o=ipaca

It did not find 'domain' in the in-memory shared config, so it (should have) added it and then tried to build segments. It reports that no replication agreements were found:

[08/Nov/2017:09:24:22.429108827 +0100] - ERR - NSMMReplicationPlugin - acquire_replica - agmt="cn=meTo<vm.fqdn>" ...: Unable to acquire replica: permission denied. The bind dn "" does not have permission to supply replication updates to the replica. Will retry later.
[08/Nov/2017:09:24:33.432063270 +0100] - DEBUG - ipa-topology-plugin - ipa_topo_util_get_replica_segments: no segments found
[08/Nov/2017:09:24:33.437798291 +0100] - DEBUG - ipa-topology-plugin - ipa_topo_util_update_agmts_list: no agreements found

Although one existed:

dn: cn=meTo<vm.fqdn>,cn=replica,cn=SUFFIX,cn=mapping tree,cn=config
createTimestamp: 20171108082339Z

Having found no RA, it did not create segments.

In addition, the replica 'domain' is not present in the in-memory shared_config (see the topo_shared_conf->replicas dump below).

It finds the 'ca' entry in the in-memory shared_config:

(gdb) print *topo_shared_conf->replicas
$3 = {next = 0x0, repl_lock = 0x56222dac0840, 
shared_config_base = 0x56222d65ad20 "cn=ca,cn=topology,cn=ipa,cn=etc,<SUFFIX>", 
repl_root = 0x56222d251320 "o=ipaca", strip_attrs = 0x0, total_attrs = 0x0, repl_attrs = 0x0, repl_segments = 0x0, 
hosts = 0x56222d66d9f0}

[08/Nov/2017:09:24:33.439961408 +0100] - ERR - ipa-topology-plugin - ipa_topo_util_get_replica_conf: replica already exists
[08/Nov/2017:09:24:33.441353476 +0100] - DEBUG - ipa-topology-plugin - cannot find replica entry for: o=ipaca

So it does not create the segments for 'ca'.

In conclusion, there is no explanation for the following questions:
- Why does the in-memory shared_config contain 'ca' but not 'domain'?
- Why was no segment created for 'domain' although an RA existed?
- Is it normal that topo_shared_conf->replicas contained 'ca'?

  • Why does the in-memory shared_config contain 'ca' but not 'domain'?
    No idea.
  • Why was no segment created for 'domain' although an RA existed?
    Do you have an access log with internal operations, to check what the search for agreements looks like?
  • Is it normal that topo_shared_conf->replicas contained 'ca'?
    I think if there is a cn=ca entry below cn=topology it should be.


Note that the corresponding message is fatal:

    slapi_log_error(SLAPI_LOG_FATAL, IPA_TOPO_PLUGIN_SUBSYSTEM,
                    "ipa_topo_util_get_replica_conf: "
                    "replica already exists\n");

Yeah, but it is probably only the log level; need to check.

I'm wondering whether we have a simple error handling problem. We have cn=ca and cn=domain below cn=topology, but only cn=domain has a replica and is replicated. So it first checks ca, fails, and stops; as a result the conf only contains ca and the agreements for domain are never processed.
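The suspected control flow can be sketched like this (a hypothetical model for illustration only, not the actual plugin code; all names are invented). Walking the children of cn=topology, a failure on the first child (cn=ca, whose o=ipaca suffix has no replica here) would abort the loop before cn=domain is reached, which matches the gdb dump where shared_config contains only 'ca':

```python
# Hypothetical sketch of the suspected error handling bug in the
# topology plugin's config loading. Names are invented for illustration.

def load_shared_config(children, abort_on_error=True):
    """Return the list of config entries cached in shared_config."""
    conf = []
    for child in children:
        conf.append(child["cn"])      # the config entry is cached either way
        if not child["replicated"]:   # e.g. cn=ca: no replica for o=ipaca
            if abort_on_error:
                return conf           # suspected bug: stop at first failure
            continue                  # expected: skip this child, carry on
        # (the real plugin would now read agreements and build segments)
    return conf

children = [{"cn": "ca", "replicated": False},
            {"cn": "domain", "replicated": True}]

buggy = load_shared_config(children)                        # -> ["ca"]
fixed = load_shared_config(children, abort_on_error=False)  # -> ["ca", "domain"]
```

With the suspected early return, 'domain' never makes it into shared_config, and its agreements are never examined.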

Huh, this is very, very weird.

Enabling internal logging and restarting shows these searches:

[08/Nov/2017:14:26:51.632401947 +0100] conn=Internal op=-1 SRCH base="cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" scope=1 filter="(ipaReplTopoConfRoot=dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com)" attrs=ALL
[08/Nov/2017:14:26:51.633310315 +0100] conn=Internal op=-1 RESULT err=0 tag=48 nentries=2 etime=0.0001077764 notes=U
[08/Nov/2017:14:26:51.633363794 +0100] conn=Internal op=-1 SRCH base="cn=ca,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" scope=1 filter="objectclass=" attrs=ALL
[08/Nov/2017:14:26:51.633414348 +0100] conn=Internal op=-1 RESULT err=0 tag=48 nentries=0 etime=0.0000061557 notes=U
[08/Nov/2017:14:26:51.636494781 +0100] conn=Internal op=-1 SRCH base="cn=config" scope=2 filter="(&(objectclass=nsds5replicationagreement)(nsds5replicaroot=o=ipaca))" attrs=ALL
[08/Nov/2017:14:26:51.638224937 +0100] conn=Internal op=-1 RESULT err=0 tag=48 nentries=0 etime=0.0001747669
[08/Nov/2017:14:26:51.639822380 +0100] conn=Internal op=-1 SRCH base="cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" scope=1 filter="(ipaReplTopoConfRoot=o=ipaca)" attrs=ALL
[08/Nov/2017:14:26:51.639952571 +0100] conn=Internal op=-1 RESULT err=0 tag=48 nentries=2 etime=0.0000148936 notes=U
[08/Nov/2017:14:26:51.643313427 +0100] conn=Internal op=-1 SRCH base="cn=masters,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" scope=1 filter="objectclass=" attrs=ALL
[08/Nov/2017:14:26:51.643696101 +0100] conn=Internal op=-1 RESULT err=0 tag=48 nentries=2 etime=0.0000403807 notes=U

so the search for (ipaReplTopoConfRoot=o=ipaca) returns two entries, but there should only be one.

Doing plain ldap searches shows:

  [ludwig@elkris ~]$ ldapsearch -LLL -o ldif-wrap=no -h vm-171-120.abc.idm.lab.eng.brq.redhat.com -p 389 -x -D "cn=directory manager"  -w  -b "cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" "ipaReplTopoConfRoot=o=ipaca" ipaReplTopoConfRoot
  dn: cn=ca,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: o=ipaca

  [ludwig@elkris ~]$ ldapsearch -LLL -o ldif-wrap=no -h vm-171-120.abc.idm.lab.eng.brq.redhat.com -p 389 -x -D "cn=directory manager"  -w  -b "cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com"  ipaReplTopoConfRoot
  dn: cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com

  dn: cn=domain,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com

  dn: cn=ca,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: o=ipaca

  [ludwig@elkris ~]$ ldapsearch -LLL -o ldif-wrap=no -h vm-171-120.abc.idm.lab.eng.brq.redhat.com -p 389 -x -D "cn=directory manager"  -w  -b "cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com"  -s one ipaReplTopoConfRoot
  dn: cn=domain,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com

  dn: cn=ca,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: o=ipaca

Everything is fine, but when the onelevel scope is added it goes wrong:

  [ludwig@elkris ~]$ ldapsearch -LLL -o ldif-wrap=no -h vm-171-120.abc.idm.lab.eng.brq.redhat.com -p 389 -x -D "cn=directory manager"  -w  -b "cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com"  -s one "ipaReplTopoConfRoot=o=ipaca" ipaReplTopoConfRoot
  dn: cn=domain,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com

  dn: cn=ca,cn=topology,cn=ipa,cn=etc,dc=dom-171-121,dc=abc,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com
  ipaReplTopoConfRoot: o=ipaca

Both children are returned, although only one matches the filter.
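For reference, a correct scoped search must evaluate the filter against every candidate the scope selects; the bug observed here behaves as if the onelevel candidate list were returned without filter evaluation. A minimal in-memory sketch of the expected semantics (all names hypothetical; this is not 389-ds code, and 'sub' matching is simplified):

```python
# Toy model of LDAP search semantics: scope picks candidates,
# then the filter (a single equality match here) applies to each.

def dn_parent(dn):
    """Strip the first RDN: the parent of 'cn=a,cn=b' is 'cn=b'."""
    return dn.split(",", 1)[1]

def search(entries, base, scope, attr, value):
    """entries: dict of dn -> attrs; returns the matching DNs."""
    results = []
    for dn, attrs in entries.items():
        if scope == "base" and dn != base:
            continue
        if scope == "one" and dn_parent(dn) != base:
            continue
        # the filter must be evaluated regardless of scope
        if attrs.get(attr) == value:
            results.append(dn)
    return results

entries = {
    "cn=topology,cn=etc": {},
    "cn=domain,cn=topology,cn=etc": {"ipaReplTopoConfRoot": "dc=example"},
    "cn=ca,cn=topology,cn=etc": {"ipaReplTopoConfRoot": "o=ipaca"},
}

# A onelevel search for ipaReplTopoConfRoot=o=ipaca must return only cn=ca:
hits = search(entries, "cn=topology,cn=etc", "one",
              "ipaReplTopoConfRoot", "o=ipaca")  # -> ["cn=ca,cn=topology,cn=etc"]
```

The broken server instead returned both cn=domain and cn=ca for the onelevel case, which is why the plugin's lookup by ipaReplTopoConfRoot found two entries.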

I can reproduce this with plain 389-ds 1.3.7:

 ldapsearch -LLL -o ldif-wrap=no -h localhost  -p 39001 -x -D "cn=directory manager" -w  -b "ou=sub,dc=example,dc=com" -s sub  description=ca description
 dn: cn=ca,ou=sub,dc=example,dc=com
 description: ca

 ldapsearch -LLL -o ldif-wrap=no -h localhost  -p 39001 -x -D "cn=directory manager" -w  -b "ou=sub,dc=example,dc=com" -s one  description=ca description
 dn: cn=domain,ou=sub,dc=example,dc=com
 description: domain

 dn: cn=ca,ou=sub,dc=example,dc=com
 description: ca

But it seems to work correctly in master, so this looks like a missing backport.

Created 389-ds ticket #49443 for this.

Metadata Update from @pvoborni:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1514051

6 years ago

https://pagure.io/389-ds-base/issue/49443

The DS bug is fixed in

b0689cd..c917b93 master -> master
5878619..584264a 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Metadata Update from @pvoborni:
- Issue set to the milestone: FreeIPA 4.5.4
- Issue tagged with: tracker

6 years ago

It is fixed in DS, so a test for it is needed. It can probably extend the simple replication test suite.

Metadata Update from @pvoborni:
- Issue untagged with: tracker
- Issue tagged with: tests

6 years ago

master:

  • 4daac52 ipaldap: allow GetEffectiveRights on individual operations
  • b466172 ldap2: fix implementation of can_add

ipa-4-6:

  • ac9efdc ipaldap: allow GetEffectiveRights on individual operations
  • dee3255 ldap2: fix implementation of can_add

Metadata Update from @cheimes:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

Metadata Update from @cheimes:
- Issue status updated to: Open (was: Closed)

6 years ago

Fix may need to be backported to 4.5.

Metadata Update from @cheimes:
- Issue priority set to: important (was: critical)

5 years ago

Metadata Update from @pvoborni:
- Issue set to the milestone: FreeIPA 4.5.5 (was: FreeIPA 4.5.4)

5 years ago

No backport was done and 4.5 is now EOL. Closing as fixed.

Metadata Update from @rcritten:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

5 years ago
