#48766 Replication changelog can incorrectly skip over updates
Closed: Fixed None Opened 3 years ago by mreynolds.

In an MMR (multi-master replication) environment where all the masters are under heavy load, the replication changelog cache/buffer mechanism does not always use the correct anchor CSN, and some updates are not sent to the consumer.

Typically it's the very first "bulk load" read from the changelog at the start of a replication session that has issues, but it can also happen during subsequent bulk loads during the same session.
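
The effect of a wrong anchor can be pictured with a toy model. This is only a sketch under stated assumptions, not the 389-ds-base changelog code: `toy_csn`, `toy_missed_updates`, `TOY_MAX_RID`, and the reduction of a CSN to a timestamp plus replica ID are all hypothetical. It counts the changes the consumer still needs that a read starting after a given anchor would never reach.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RID 4 /* hypothetical upper bound on replica IDs in this model */

/* Hypothetical, simplified CSN: only a timestamp and a replica ID. */
struct toy_csn {
    unsigned long ts;
    unsigned rid;
};

/*
 * Count the changes the consumer still needs but that a changelog read
 * starting strictly after `anchor_ts` would never see.
 * consumer_max[rid] is the newest timestamp the consumer already holds
 * for that replica ID.
 */
size_t toy_missed_updates(const struct toy_csn *cl, size_t n,
                          const unsigned long consumer_max[TOY_MAX_RID],
                          unsigned long anchor_ts)
{
    size_t missed = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cl[i].rid < TOY_MAX_RID);
        int needed = cl[i].ts > consumer_max[cl[i].rid];
        int read = cl[i].ts > anchor_ts;
        if (needed && !read) {
            missed++;
        }
    }
    return missed;
}
```

In this model, if the consumer has seen replica ID 1 up to timestamp 15 and replica ID 2 only up to timestamp 10, anchoring at 15 skips a replica ID 2 change at timestamp 12, while anchoring at 10 skips nothing.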


The code looks good to me, and you have my ack. I think that someone who knows a bit more about the CSN internals should ack as well, as I feel I may have missed something.

The patch looks good to me, too. You have my ack.

Ludwig, looks good to me, but of course I am biased since I've been working on it. One minor issue with indentation: there are "double tabs" around the extra repl logging I had added in the cl5cache_initial_anchorcsn() and cl5cache_adjust_anchorcsn() functions:

{{{
if (slapi_is_loglevel_set(SLAPI_LOG_REPL)) {
<tab><tab>
}}}

For now I'll leave this un-acked unless no one else reviews it...

Replying to [comment:9 mreynolds]:

> Ludwig, looks good to me, but of course I am biased since I've been working on it. One minor issue with indentation,

indentation is a separate issue in the cl code, it requires some extra work

> For now I'll leave this un-acked unless no one else reviews it...

agreed, I would like to get Thierry's or Noriko's feedback and we should have the reliab15 tests run with it

Sorry for the late ACK. A lot of the review happened on 389-devel.

The latest patch just contains some cleanup of indentation; it should be ready for QA testing now.

I also reviewed the latest patch: 0001-Ticket-48766-Replication-changelog-can-incorrectly-s.patch

You have my ack, too. Thanks!

Reassigning to Ludwig since this is his fix.

Pushed to 1.3.4:
3c789a5..ec15a75 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit ec15a75

Indeed, it was straightforward to backport your patch to 1.2.11, Ludwig!
Just cherry-picking the commit worked. (I was somehow thinking it would not...)

Pushed to 1.2.11
66aed16..2acffca 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 2acffca

Hi Ludwig,

Is it ok to close this ticket?

Thanks!

We need to include another adjustment: when determining the anchor CSN, replica IDs for which the consumer is more advanced should be excluded.
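
The adjustment can be sketched as follows. This is a simplified illustration under assumptions, not the code from the attached patch: `toy_rid_state` and `toy_pick_anchor` are hypothetical names, a single integer stands in for a full CSN, and taking the smallest consumer value among the remaining replica IDs is an assumed selection rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-replica-ID view; one integer stands in for a full CSN. */
struct toy_rid_state {
    unsigned long supplier_max; /* newest change the supplier holds for this RID */
    unsigned long consumer_max; /* newest change the consumer reports for this RID */
};

/*
 * Pick an anchor from the replica IDs for which the consumer is behind.
 * Returns 1 and stores the anchor in *anchor if at least one RID needs
 * updates; returns 0 (leaving *anchor untouched) if the consumer is level
 * with or ahead of the supplier for every RID.
 */
int toy_pick_anchor(const struct toy_rid_state *rids, size_t n,
                    unsigned long *anchor)
{
    int found = 0;
    assert(anchor != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Exclude RIDs where the consumer is level with or more advanced. */
        if (rids[i].consumer_max >= rids[i].supplier_max) {
            continue;
        }
        /* Assumed rule: keep the smallest consumer value among the rest. */
        if (!found || rids[i].consumer_max < *anchor) {
            *anchor = rids[i].consumer_max;
            found = 1;
        }
    }
    return found;
}
```

Without the exclusion, a replica ID on which the consumer is already ahead of the supplier could drag the anchor to a position that has nothing to do with the changes that actually need sending.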

setting back to NEW

attached is Thierry's version (thanks) of a fix to improve determining anchorcsn

The latest long-duration tests (with the last patch) show that replication was working fine, with the whole topology in sync and only minor latency.

The patch looks good -> ack

committed to master

commit def469a
Author: Thierry Bordaz <tbordaz@redhat.com>

and 1.3.5
commit f4301f6

will wait for reliab test with 6.8z before commit to 1.2.11

committed to 1.2.11

commit 40d0054

pushed also to 1.3.4

commit fba9d48

Metadata Update from @lkrispen:
- Issue assigned to lkrispen
- Issue set to the milestone: 1.2.11.33

2 years ago
