In an MMR environment where all the masters are under heavy load, the replication changelog cache/buffer mechanism does not always use the correct anchor csn, and as a result some updates are not sent to the consumer.
Typically it's the very first "bulk load" read from the changelog at the start of a replication session that has issues, but it can also happen during subsequent bulk loads during the same session.
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1321124
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1321126
attachment 0001-Ticket-48766-Replication-changelog-can-miss-updates-.patch
The code looks good to me, and you have my ack. I think that someone who knows a bit more about the csn internals should ack as well, as I feel I may have missed something.
The patch looks good to me, too. You have my ack.
attachment 0001-reworked-clcach-buffer-code-following-design-at-http.patch
Ludwig, looks good to me, but of course I am biased since I've been working on it. One minor issue with indentation, there are "double tabs" around the extra repl logging I had added in cl5cache_initial_anchorcsn(), and in the cl5cache_adjust_anchorcsn() functions:
{{{
if (slapi_is_loglevel_set(SLAPI_LOG_REPL)) {
<tab><tab>
}}}
For now I'll leave this un-acked unless no one else reviews it...
Replying to [comment:9 mreynolds]:
> Ludwig, looks good to me, but of course I am biased since I've been working on it. One minor issue with indentation,

indentation is a separate issue in the cl code, it requires some extra work

> For now I'll leave this un-acked unless no one else reviews it...

agreed, I would like to get Thierry's or Noriko's feedback, and we should have the reliab15 tests run with it
Sorry for the late ACK. A lot of the review happened on 389-devel.
attachment 0001-Ticket-48766-Replication-changelog-can-incorrectly-s.patch
the latest patch just contains some cleanup of indentation, should be ready for QA testing now
I also reviewed the latest patch: 0001-Ticket-48766-Replication-changelog-can-incorrectly-s.patch
You have my ack, too. Thanks!
Reassigning to Ludwig since this is his fix.
Pushed to 1.3.4: 3c789a5..ec15a75 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit ec15a75
Indeed, it was straightforward to backport your patch to 1.2.11, Ludwig! Just cherry-picking the commit worked. (I was somehow thinking it would not...)
Pushed to 1.2.11 66aed16..2acffca 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 2acffca
Hi Ludwig,
Is it ok to close this ticket?
Thanks!
we need to include another adjustment: when determining the anchor csn, replica ids for which the consumer is more advanced than the supplier should be excluded
setting back to NEW
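To illustrate the adjustment described in this comment, here is a minimal sketch in C. It is not the code from the patch: the types and function names (`sketch_csn`, `sketch_rid_state`, `sketch_pick_anchor`) are hypothetical, the CSN layout is simplified, and it assumes that the anchor is the smallest consumer max csn among the replica ids that are still eligible. The only point it shows is the exclusion rule: a replica id where the consumer is already at or ahead of the supplier contributes no anchor candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical CSN; the real 389-ds-base CSN type differs. */
typedef struct {
    uint32_t time; /* timestamp in seconds */
    uint16_t seq;  /* sequence number within the same second */
    uint16_t rid;  /* replica id that originated the change */
} sketch_csn;

/* Highest CSN known for one replica id on each side (hypothetical). */
typedef struct {
    uint16_t rid;
    sketch_csn supplier_max;
    sketch_csn consumer_max;
} sketch_rid_state;

/* Order CSNs by timestamp, then sequence number, then replica id. */
static int
sketch_csn_compare(const sketch_csn *a, const sketch_csn *b)
{
    if (a->time != b->time) {
        return a->time < b->time ? -1 : 1;
    }
    if (a->seq != b->seq) {
        return a->seq < b->seq ? -1 : 1;
    }
    if (a->rid != b->rid) {
        return a->rid < b->rid ? -1 : 1;
    }
    return 0;
}

/*
 * Pick an anchor candidate, or return NULL when the consumer is not
 * behind the supplier for any replica id.
 * Assumption of this sketch: the anchor is the smallest consumer max
 * csn among the replica ids where the supplier is more advanced.
 */
static const sketch_csn *
sketch_pick_anchor(const sketch_rid_state *states, size_t count)
{
    const sketch_csn *anchor = NULL;

    assert(states != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const sketch_rid_state *s = &states[i];

        /* Consumer is at or ahead of the supplier: exclude this rid. */
        if (sketch_csn_compare(&s->consumer_max, &s->supplier_max) >= 0) {
            continue;
        }
        if (anchor == NULL ||
            sketch_csn_compare(&s->consumer_max, anchor) < 0) {
            anchor = &s->consumer_max;
        }
    }
    return anchor;
}
```

Under these assumptions, if the consumer is ahead for one replica id and behind for two others, only the two lagging replica ids are considered, even when the excluded replica id carries the smallest consumer csn overall.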
attachment 0001-PATCH-use-a-consumer-maxcsn-only-as-anchor-if-suppli.patch
attached is Thierry's version (thanks) of a fix to improve determining anchorcsn
Last long duration tests (with the last patch) show that replication was working fine, with all topology in sync and minor latency.
The patch looks good -> ack
committed to master
commit def469a Author: Thierry Bordaz <tbordaz@redhat.com>
and 1.3.5 commit f4301f6
will wait for reliab test with 6.8z before commit to 1.2.11
committed to 1.2.11
commit 40d0054
pushed also to 1.3.4
commit fba9d48
Metadata Update from @lkrispen: - Issue assigned to lkrispen - Issue set to the milestone: 1.2.11.33
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1826
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)