#48325 Cannot upgrade a consumer to supplier in a multimaster environment
Closed: Fixed. Opened 3 years ago by gbaruzzi.

Hi,

I created a Multi Master supplier and set up the replication to a consumer.
So far, so good.
Then I wanted to upgrade the consumer to a master. Using the console, I changed the consumer's settings to "Multimaster" and set up a replication agreement back to the supplier.
Then I created an object on the server that had been the consumer.
At this point the problems began: the replication status page reports "Incremental update has failed and requires administration action. LDAP error: Can't contact LDAP server. Error -1."
Researching the issue, I found that in the replication agreement of a working multimaster setup there are three lines like these:
nsds50ruv: {replicageneration} 561e6cd4000000010000
nsds50ruv: {replica 1 ldap://ldap1.syntlogo.local:389} 561f9af1000000010000 561fa88c000000010000
nsds50ruv: {replica 2 ldap://ldap2.syntlogo.local:389} 561fa030000000020000 561fa84d000000020000

but in the failing configuration there are four:
nsds50ruv: {replicageneration} 56279cce000000010000
nsds50ruv: {replica 1 ldap://ldap1.syntlogo.local:1389} 5628c0b3000000010000 562f3fe3000000010000
nsds50ruv: {replica 2 ldap://ldap2.syntlogo.local:1389} 5629cae3000000020000 5629cae3000000020000
nsds50ruv: {replica 65535 ldap://ldap2.syntlogo.local:1389} 562f43300000ffff0000 562f50210000ffff0000

What makes me suspicious is the line for replica 65535, which suggests that when the setting is changed from consumer to supplier, some old information is not removed.
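To make the RUV values easier to compare, here is a small sketch that parses nsds50ruv values and decodes their CSNs. It assumes each replica element has the form {replica <rid> <url>} <min CSN> <max CSN>, as in the lines above, and that a CSN is 20 hex digits: an 8-digit timestamp followed by a 4-digit sequence number, replica ID, and sub-sequence number (the 389 DS CSN string layout). The helper names are illustrative only.

```python
import re

# One replica element of nsds50ruv: "{replica <rid> <url>} <min CSN> <max CSN>".
# The CSNs are optional here; "{replicageneration} ..." does not match at all.
RUV_ELEMENT = re.compile(
    r"\{replica (?P<rid>\d+) (?P<url>[^}\s]+)\}"
    r"(?:\s+(?P<min>[0-9a-fA-F]{20})\s+(?P<max>[0-9a-fA-F]{20}))?"
)


def decode_csn(csn):
    """Split a 20-hex-digit CSN into (timestamp, seqnum, replica ID, subseqnum).

    Assumed layout: 8 hex digits of time, then three 4-hex-digit fields.
    """
    return (int(csn[0:8], 16), int(csn[8:12], 16),
            int(csn[12:16], 16), int(csn[16:20], 16))


def parse_ruv_element(value):
    """Return (rid, url, min_csn, max_csn), or None if value is not a replica element."""
    m = RUV_ELEMENT.match(value)
    if m is None:
        return None
    return int(m.group("rid")), m.group("url"), m.group("min"), m.group("max")


# The "failing configuration" values quoted above, without the attribute name.
FAILING_RUV = [
    "{replicageneration} 56279cce000000010000",
    "{replica 1 ldap://ldap1.syntlogo.local:1389} 5628c0b3000000010000 562f3fe3000000010000",
    "{replica 2 ldap://ldap2.syntlogo.local:1389} 5629cae3000000020000 5629cae3000000020000",
    "{replica 65535 ldap://ldap2.syntlogo.local:1389} 562f43300000ffff0000 562f50210000ffff0000",
]

for value in FAILING_RUV:
    element = parse_ruv_element(value)
    if element is None:
        continue  # skip the replicageneration value
    rid, url, min_csn, max_csn = element
    # Print the element's rid, its URL, and the replica ID embedded in its max CSN.
    print(rid, url, decode_csn(max_csn)[2])
# 1 ldap://ldap1.syntlogo.local:1389 1
# 2 ldap://ldap2.syntlogo.local:1389 2
# 65535 ldap://ldap2.syntlogo.local:1389 65535
```

In the failing values, the 65535 element's CSNs embed replica ID 0xffff and it shares its URL with replica 2. That would be consistent with it being left over from ldap2's time as a read-only replica, since 389 DS uses replica ID 65535 for consumers and hubs.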

389-ds version 1.3.3.1 build 2015.218.023
Administration Server 1.1.38 Build 2015.068.1937
Console Framework 1.1.14
Thank you,
Giovanni


I was able to reproduce part of the issue, which I believe is the main problem. I could not reproduce the "replica 65535" issue, but I could reproduce replication breaking after a consumer is promoted.

After promoting a consumer to a supplier, the database RUV gets out of order. On both servers I see:

{{{
Replica 1:

nsds50ruv: {replica 1 ldap://localhost.localdomain:389} 563ce9d5000200010000 563ce9e7000000010000
nsds50ruv: {replica 2 ldap://localhost.localdomain:7777} 563cfcc3000000020000 563cfcc3000100020000

Replica 2:

nsds50ruv: {replica 1 ldap://localhost.localdomain:389} 563ce9d5000200010000 563ce9e7000000010000
nsds50ruv: {replica 2 ldap://localhost.localdomain:7777} 563cfcc3000000020000 563cfcc3000100020000
}}}

When I turned on replication logging, I could see that replication was failing because it thought the consumer (replica 2) had the same replica ID as replica 1. The server expects the local RUV element to be listed first. So on replica 2, the local RUV element (replica 2) should come before replica 1, but as shown above, it does not.

The order should look like this:

{{{
Replica 1:

nsds50ruv: {replica 1 ldap://localhost.localdomain:389} 563ce9d5000200010000 563ce9e7000000010000
nsds50ruv: {replica 2 ldap://localhost.localdomain:7777} 563cfcc3000000020000 563cfcc3000100020000

Replica 2:

nsds50ruv: {replica 2 ldap://localhost.localdomain:7777} 563cfcc3000000020000 563cfcc3000100020000
nsds50ruv: {replica 1 ldap://localhost.localdomain:389} 563ce9d5000200010000 563ce9e7000000010000
}}}
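The ordering rule can be modeled in a few lines. This is a Python sketch, not the server's C code: it assumes only that the first replica element in the RUV is taken as the local one, and it shows why the unfixed order on replica 2 makes that server look like replica 1. The function names are hypothetical.

```python
import re

RID_PATTERN = re.compile(r"\{replica (\d+) ")


def rids_in_order(ruv_values):
    """Replica IDs of the RUV's replica elements, in stored order.

    Values such as "{replicageneration} ..." are skipped.
    """
    rids = []
    for value in ruv_values:
        m = RID_PATTERN.match(value)
        if m:
            rids.append(int(m.group(1)))
    return rids


def local_element_first(ruv_values, local_rid):
    """True if the element for local_rid is the first replica element."""
    rids = rids_in_order(ruv_values)
    return bool(rids) and rids[0] == local_rid


REPLICA_1 = "{replica 1 ldap://localhost.localdomain:389} 563ce9d5000200010000 563ce9e7000000010000"
REPLICA_2 = "{replica 2 ldap://localhost.localdomain:7777} 563cfcc3000000020000 563cfcc3000100020000"

broken_on_replica_2 = [REPLICA_1, REPLICA_2]  # order observed after promotion
fixed_on_replica_2 = [REPLICA_2, REPLICA_1]   # order expected on replica 2

print(local_element_first(broken_on_replica_2, 2))  # False
print(local_element_first(fixed_on_replica_2, 2))   # True
# Treating the first element as local on the broken RUV yields replica ID 1,
# the same ID as the real replica 1.
print(rids_in_order(broken_on_replica_2)[0])        # 1
```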

I have a fix for this, and I will be attaching a patch shortly.

Looks good to me.

I'd like one confirmation: even if more than two servers are involved, the API ruv_move_local_supplier_to_first(ruv, rid) is guaranteed to move the local element to the top, so there is no problem, right?

Also, if you promote multiple read-only replicas to masters, there is no problem either, is there?

I propose setting the milestone to 1.3.5.0 as well.

Thanks!

Replying to [comment:4 nhosoi]:

Looks good to me.

I'd like one confirmation: even if more than two servers are involved, the API ruv_move_local_supplier_to_first(ruv, rid) is guaranteed to move the local element to the top, so there is no problem, right?

Correct.

Also, if you promote multiple read-only replicas to masters, there is no problem either, is there?

Nope, it's what the function was designed to do. I'm not sure why it wasn't being called in this function. The code in this function also changed dramatically between 1.2.11 and 1.3.1. I have not been able to find the commit that added this code, but I would call this a regression.

I propose setting the milestone to 1.3.5.0 as well.

I propose this goes into 1.3.1, since that's where the problem seems to have started.

Thanks!
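The behavior confirmed here (the local element moves to the top however many servers there are, and each promoted replica reorders its own RUV) can be illustrated with a small Python model. It is not the C implementation of ruv_move_local_supplier_to_first; RuvElement, move_local_supplier_to_first, and the host URLs are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuvElement:
    """Hypothetical stand-in for one RUV element (replica ID plus URL)."""
    rid: int
    url: str


def move_local_supplier_to_first(elements: List[RuvElement], rid: int) -> List[RuvElement]:
    """Return a new list with the element for rid first.

    All other elements keep their relative order. If rid is not present,
    the order is returned unchanged.
    """
    local = [e for e in elements if e.rid == rid]
    others = [e for e in elements if e.rid != rid]
    return local + others


# A four-supplier RUV; each server reorders its own copy using its own replica ID.
ruv = [RuvElement(1, "ldap://a:389"), RuvElement(2, "ldap://b:389"),
       RuvElement(3, "ldap://c:389"), RuvElement(4, "ldap://d:389")]

on_server_3 = move_local_supplier_to_first(ruv, 3)
on_server_4 = move_local_supplier_to_first(ruv, 4)
print([e.rid for e in on_server_3])  # [3, 1, 2, 4]
print([e.rid for e in on_server_4])  # [4, 1, 2, 3]

# Applying it again leaves the order unchanged.
assert move_local_supplier_to_first(on_server_3, 3) == on_server_3
```

Because the model only pulls one element forward and keeps the rest stable, the result does not depend on how many other suppliers are listed.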

Replying to [comment:5 mreynolds]:

I propose this goes into 1.3.1, since that's where the problem seems to have started.

Agreed (the lowest version after 1.2.11 is 1.3.2, so I set it to 1.3.2.28).
Thanks for your input and confirmation, Mark!

Thanks Noriko,

ab8ed9a..b896840 master -> master
commit b896840
Author: Mark Reynolds mreynolds@redhat.com
Date: Fri Nov 6 14:41:36 2015 -0500

8f49d33..6180b91 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 6180b91

a7a8b2d..2764bc4 389-ds-base-1.3.3 -> 389-ds-base-1.3.3
commit 2764bc4

98def84..378de6a 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 378de6a

Working on lib389 test script next...

Attached lib389 test script
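The attached script is not reproduced on this page. As a rough sketch of what the "Promote consumer to master" step changes on the replica entry, the helper below builds an LDIF modify record. It assumes the 389 DS replica attributes nsDS5ReplicaType (3 = updatable replica), nsDS5Flags (1 = log changes to the changelog), and nsDS5ReplicaId, with supplier IDs in 1..65534 because read-only replicas use 65535. promotion_ldif is a hypothetical helper, not part of lib389, and a real promotion also needs the changelog configured and replication agreements created.

```python
def promotion_ldif(replica_dn: str, new_rid: int) -> str:
    """Build an LDIF modify record that turns a read-only replica into a supplier.

    Hypothetical helper: the attribute values are assumptions based on the
    389 DS replica schema (type 3 = updatable, flags 1 = log changes).
    """
    # 65535 is the ID read-only replicas use, so a supplier needs 1..65534.
    if not 1 <= new_rid <= 65534:
        raise ValueError("supplier replica ID must be between 1 and 65534")
    lines = [
        f"dn: {replica_dn}",
        "changetype: modify",
        "replace: nsDS5ReplicaType",
        "nsDS5ReplicaType: 3",
        "-",
        "replace: nsDS5Flags",
        "nsDS5Flags: 1",
        "-",
        "replace: nsDS5ReplicaId",
        f"nsDS5ReplicaId: {new_rid}",
        "-",
    ]
    return "\n".join(lines) + "\n"


# Example DN for the dc=example,dc=com suffix used in the test script (assumed form).
print(promotion_ldif('cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config', 2))
```

The output is plain LDIF that could be applied with ldapmodify; the new replica ID must not collide with any existing supplier's ID (1 in the two-server reproduction above).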

Using the test script I get the following error.

{{{
topology = <ticket48325_test.TopologyReplication object at 0x7f308d6d7e90>

def test_ticket48325(topology):
    """
    Test that the RUV element order is correctly maintained when promoting
    a hub or consumer.
    """

    #
    # Promote consumer to master
    #
    try:
        DN = topology.consumer1.replica._get_mt_entry(DEFAULT_SUFFIX)

dirsrvtests/tickets/ticket48325_test.py:184:

self = <lib389.replica.Replica object at 0x7f308d6ec950>, suffix = 'dc=example,dc=com'

def _get_mt_entry(self, suffix):
    """Return the replica dn of the given suffix."""
    mtent = self.conn.mappingtree.list(suffix=suffix)
    return ','.join(("cn=replica", mtent.dn))

E AttributeError: 'list' object has no attribute 'dn'

../lib389/lib389/replica.py:33: AttributeError

}}}

This could be a bug in lib389 rather than in the test script, however.
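The traceback shows mappingtree.list(suffix=...) returning a Python list, while _get_mt_entry reads .dn as if it had a single entry. The actual lib389 fix is not reproduced here; a minimal sketch of the general pattern, using a hypothetical Entry class and replica_dn_from_mapping_tree helper rather than lib389's own code, is to take exactly one entry out of the list before building the DN.

```python
from typing import List


class Entry:
    """Hypothetical stand-in for an LDAP entry object exposing a .dn attribute."""

    def __init__(self, dn: str) -> None:
        self.dn = dn


def replica_dn_from_mapping_tree(entries: List[Entry]) -> str:
    """Build the cn=replica DN from a list of mapping tree entries.

    The lookup in the traceback returns a list, so one entry has to be taken
    out of it before reading .dn.
    """
    if len(entries) != 1:
        raise ValueError(f"expected exactly one mapping tree entry, got {len(entries)}")
    return ",".join(("cn=replica", entries[0].dn))


entries = [Entry('cn="dc=example,dc=com",cn=mapping tree,cn=config')]

try:
    entries.dn  # what the failing line effectively does: .dn on the list itself
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'dn'

print(replica_dn_from_mapping_tree(entries))
# cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
```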

Replying to [comment:11 firstyear]:

Using the test script I get the following error.

{{{
topology = <ticket48325_test.topologyreplication object="" at="" 0x7f308d6d7e90="">

def test_ticket48325(topology):
    """
    Test that the RUV element order is correctly maintained when promoting
    a hub or consumer.
    """

    #
    # Promote consumer to master
    #
    try:
      DN = topology.consumer1.replica._get_mt_entry(DEFAULT_SUFFIX)

dirsrvtests/tickets/ticket48325_test.py:184:


self = <lib389.replica.replica object="" at="" 0x7f308d6ec950="">, suffix = 'dc=example,dc=com'

def _get_mt_entry(self, suffix):
    """Return the replica dn of the given suffix."""
    mtent = self.conn.mappingtree.list(suffix=suffix)
  return ','.join(("cn=replica", mtent.dn))

E AttributeError: 'list' object has no attribute 'dn'

../lib389/lib389/replica.py:33: AttributeError

}}}

This could be a bug in lib389 rather than the testscript however.

Do a "git pull" on your lib389 source tree - I committed a fix earlier today that addresses this error.

Test is all working for me. Ack.

b3a80f2..a534583 master -> master
commit a534583
Author: Mark Reynolds mreynolds@redhat.com
Date: Tue Nov 10 13:54:30 2015 -0500

1a6390d..d192435 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit d192435

2764bc4..7fca878 389-ds-base-1.3.3 -> 389-ds-base-1.3.3
commit 7fca878

Metadata Update from @mreynolds:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.2.28

2 years ago
