#49591 Inconsistencies in complex conflict cases and failure to do cleanup after tests
Closed: wontfix 6 months ago by mreynolds. Opened 2 years ago by lkrispen.

follow up of ticket 49551 about
- inconsistencies reported by repl diff tool
- failure to cleanup after testcases for ticket 49043


Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1546101
- Custom field type adjusted to None
- Custom field version adjusted to None

2 years ago

I'm going to test your patch later today - note there are a bunch of tabs in urp_tombstone.c that will need to be removed

yes, I noticed th etabs, but they are there from a previous patch. Will cleanup before commit

Running the full 3 master test for conflict_resolve_test does not report any issues. Great!!!

But running the two master test does report a number of inconsistencies:

# py.test -vv  conflict_resolve_test.py -k "TwoMaster"

---> This test fails on the memberOf test (fyi)

#  ds-replcheck -D "cn=directory manager" -w password -m ldap://localhost:39001 -r ldap://localhost:39002 -b dc=example,dc=com  
...
...
Conflict Entries
=====================================================

Master Conflict Entries:  43
Replica Conflict Entries: 42

Missing Entries
=====================================================

  Entries missing on Replica:
   - nsuniqueid=0576940c-1d7011e8-82339b9f-74a2aebb,uid=test_user_1121,cn=test_container,dc=example,dc=com  (Created on Master at: Thu Mar  1 16:46:15 2018)
   - cenotaphID=0576940c-1d7011e8-82339b9f-74a2aebb+uid=test_user_1101,cn=test_container,dc=example,dc=com  (Created on Master at: Thu Mar  1 16:46:56 2018)
   - nsuniqueid=70c06624-1d7011e8-82339b9f-74a2aebb,cn=c0,cn=p0,cn=sub7,cn=test_container,dc=example,dc=com  (Created on Master at: Thu Mar  1 16:49:28 2018)
   - nsuniqueid=70c0661e-1d7011e8-bc9fc0a5-56631cb4,cn=p0+nsuniqueid=70c0661e-1d7011e8-bc9fc0a5-56631cb4,cn=sub11,cn=test_container,dc=example,dc=com  (Created on Master at: Thu Mar  1 16:49:37 2018)

  Entries missing on Master:
   - nsuniqueid=0576940c-1d7011e8-82339b9f-74a2aebb,uid=test_user_1101,cn=test_container,dc=example,dc=com  (Created on Replica at: Thu Mar  1 16:46:15 2018)
   - nsuniqueid=70c06618-1d7011e8-bc9fc0a5-56631cb4,cn=c0,cn=p0,cn=sub7,cn=test_container,dc=example,dc=com  (Created on Replica at: Thu Mar  1 16:49:29 2018)
   - nsuniqueid=70c0661e-1d7011e8-bc9fc0a5-56631cb4,cn=p0,cn=sub11,cn=test_container,dc=example,dc=com  (Created on Replica at: Thu Mar  1 16:49:37 2018)
   - nsuniqueid=70c06624-1d7011e8-82339b9f-74a2aebb,cn=c0,nsuniqueid=70c06623-1d7011e8-82339b9f-74a2aebb,cn=p0,cn=sub7,cn=test_container,dc=example,dc=com  (Created on Replica at: Thu Mar  1 16:49:28 2018)

Attaching beta version (1.3) of ds-replcheck tool

ds-replcheck

I should also mention that I actually run this command so I don't lose the instances after the test stops

DEBUGGING=true py.test conflict_resolve_test.py -k "TwoMaster"

I also used a small patch for the test case from Simon. Attaching below:

0001-Remove-tearDown.patch

I finally understood the failure of the memberof tests. It is a consequence of two issues not related to conflicts: #49040 and #49040
If replication is enabled again we can run into a deadlock accessing the changelog and the incoming operation is aborted (#49040). Unfortunately because of #49090 there is still a conflict entry in the entry cache and the retry of the operation is creating an inconsistency.

When changing the policy to resolve deadlocks in BDB the memberof test always passed

0001-change-deadlock-resolving-policy.patch

patch to change deadlock policy in test

Is this something we should be setting by default in the server? I know we are going away from BDB, but it almost seems like it should be set though. Perhaps there are performance impacts? Anyway just curious.

Ack for the test patch!

Is this something we should be setting by default in the server?

yes, that is something I also thought about, I know German has resolved some customer issues by changing the db lock policy to 6.

Our default setting is LOCK_YOUNGEST, where the thread who took the lock later is aborted, the setting "6" I chose is LOCK_MIN_WRITE, where teh thread with ten min pages write locked is aborted.
In our scenario the replication thread is only reading the changelog an will now be aborted and the main operation thread can continue.

I think this could be the default since deadlocks between writing threads to the db should be prevented by the backend lock and if we would have deadlocks between a search and mod aborting the search also should be ok

Unfortunately the LOCK_MIN_WRITE can skip changes when accessing the cl is aborted and retried,
I'll attach a patch which fixes this. It should probably be a different ticket though.

0001-Do-not-skip-change-when-retrying-aborted-transaction.patch

The urp_tombstone file contains lot of tabs, I would like to avoid a mixture of formatting and fixiing patches (which will then be harder to find).

I'd like to commit the attached formatting patch (no code change) first and then rebase my fixes

0001-missing-clang-formatting-in-urp_tombstone.patch

Yes please commit this patch! So these "tabs" were recently added from one of the previous patches for this ticket?

Yes please commit this patch! So these "tabs" were recently added from one of the previous patches for this ticket?

no, they are there since 2017/09/05 from the commit for 49043, maybe I did mistake when rebasing the original conflict patches to the formatted code base

To ssh://pagure.io/389-ds-base.git
5e3f428..02a8def master -> master

To ssh://pagure.io/389-ds-base.git
ce88dd0..81a9d75 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Metadata Update from @lkrispen:
- Issue assigned to lkrispen

2 years ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.7.0

2 years ago

improved version of the consistency patch
0001-Ticket-49591-Improve-conflict-consistency-in-complex.patch

@lkrispen - what is the state of this issue? Was the patch above needed?

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.2 (was: 1.3.7.0)

6 months ago

These patches were addressing some artificial test scenarios. The conflict code is released for some time and we didn't run into these issues. Adding the patch could introduce new behaviour and risks and so I would suggest to say this is a corner case and close as WONTFIX

Metadata Update from @mreynolds:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

6 months ago

Login to comment on this ticket.