#47516 replication stops with excessive clock skew
Closed: Fixed. Opened 5 years ago by rmeggins.

If the CSN generator clock skew is over 1 day, replication stops. Users need to be able to continue to replicate with the high clock skew. There should be a configuration attr that allows replication to continue despite excessive clock skew.

This is becoming a much bigger problem now that many users are using VMs, which are notorious for having system clock/time/ntp issues.


It's okay, since the default setting is OFF (#define LDAP_OFF 0) and, per the comment, "If there's no default value, the value will be NULL if it's not set in dse.ldif". Still, it would be nice to initialize the value explicitly in FrontendConfig_init, like "cfg->ldapi_map_entries = LDAP_OFF;".
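To illustrate the suggestion, here is a minimal sketch of setting the default explicitly at init time. The struct, the init function and the field name ignore_time_skew are hypothetical stand-ins, not the server's real definitions; only LDAP_OFF being 0 and the ldapi_map_entries assignment come from the comment above.

```c
#include <string.h>

#define LDAP_OFF 0

/* Hypothetical, cut-down stand-in for the server's frontend config struct. */
typedef struct example_frontend_config {
    int ldapi_map_entries;
    int ignore_time_skew; /* hypothetical name for the new on/off setting */
} example_frontend_config;

/* Set every on/off setting explicitly, so the default does not depend on
 * how the struct happened to be allocated or on dse.ldif having a value. */
void example_frontend_config_init(example_frontend_config *cfg)
{
    memset(cfg, 0, sizeof(*cfg));
    cfg->ldapi_map_entries = LDAP_OFF;
    cfg->ignore_time_skew = LDAP_OFF;
}
```

With this, a reader of the init function can see the default without having to know that an unset value happens to equal LDAP_OFF.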

To ssh://git.fedorahosted.org/git/389/ds.git
f513bc3..9dc7a46 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 9dc7a46
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 18 12:32:23 2013 -0600
bdca415..e61009e 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit e61009e
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 18 12:32:23 2013 -0600
6829200..1f2c151 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 1f2c151
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 18 12:32:23 2013 -0600
c410b87..bde4372 master -> master
commit bde4372
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 18 12:32:23 2013 -0600

It's ok, but if replication continues, the clock skew error will be logged again and again and flood the error log.
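One way to limit the repeated logging would be to rate-limit the message. This is only a sketch of that idea, not what the patch does; the type and function names are made up for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper state: when the skew warning was last written. */
typedef struct example_skew_log_limiter {
    time_t last_logged;  /* 0 means "never logged" */
    time_t min_interval; /* minimum seconds between repeated warnings */
} example_skew_log_limiter;

/* Return true if the warning should be written at time `now`, and
 * remember that it was written. Otherwise return false so the caller
 * can skip the log call. */
bool example_skew_log_should_log(example_skew_log_limiter *lim, time_t now)
{
    if (lim->last_logged != 0 && now - lim->last_logged < lim->min_interval) {
        return false;
    }
    lim->last_logged = now;
    return true;
}
```

The caller would wrap the existing log call in a check of this function, so that a persistent skew produces one message per interval instead of one per replication session.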

Another option would be to make the maximum clock skew configurable, with the current 1 day as the default and, e.g., -1 meaning no limit.
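A rough sketch of what such a configurable limit could look like. The macro and function names are hypothetical, and the check is simplified to a plain difference in seconds; the real CSN generator tracks skew differently.

```c
#include <stdbool.h>

/* Hypothetical names. The default mirrors the current 1 day limit. */
#define EXAMPLE_DEFAULT_MAX_SKEW (24LL * 60 * 60) /* seconds */
#define EXAMPLE_NO_SKEW_LIMIT    (-1LL)

/* Return true if the absolute difference between the local and remote
 * clocks (both in seconds) is over the configured limit.
 * A limit of -1 disables the check entirely. */
bool example_skew_exceeded(long long local_time, long long remote_time,
                           long long max_skew)
{
    long long skew = remote_time - local_time;

    if (max_skew == EXAMPLE_NO_SKEW_LIMIT) {
        return false;
    }
    if (skew < 0) {
        skew = -skew;
    }
    return skew > max_skew;
}
```

Compared with a plain on/off switch, this keeps the 1 day behaviour by default but lets an admin pick a larger limit instead of turning the check off completely.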

The previous fix makes replication ignore time skew errors, but does not ensure that the CSN generator will continue to issue CSNs that exceed its built-in time skew limit. We need to make sure that the CSN generator will never issue duplicate CSNs or regress CSNs.
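To make the requirement concrete, here is a simplified sketch of a generator that never repeats or regresses a CSN, even if the clock it is given jumps backwards. The types are hypothetical and cut down to a timestamp and a sequence number (the real CSN carries more fields), and this is not the code from the patch.

```c
#include <stdint.h>
#include <time.h>

/* Simplified, hypothetical CSN: timestamp plus sequence number only. */
typedef struct example_csn {
    time_t   tstamp;
    uint16_t seqnum;
} example_csn;

typedef struct example_csngen {
    example_csn last; /* last CSN issued by this generator */
} example_csngen;

/* Order CSNs by timestamp, then by sequence number. */
int example_csn_compare(example_csn a, example_csn b)
{
    if (a.tstamp != b.tstamp) {
        return a.tstamp < b.tstamp ? -1 : 1;
    }
    if (a.seqnum != b.seqnum) {
        return a.seqnum < b.seqnum ? -1 : 1;
    }
    return 0;
}

/* Issue a CSN strictly greater than the previous one, whatever `now` says. */
example_csn example_csngen_new_csn(example_csngen *gen, time_t now)
{
    example_csn next;

    if (now > gen->last.tstamp) {
        /* Clock moved forward: start a new second. */
        next.tstamp = now;
        next.seqnum = 0;
    } else if (gen->last.seqnum < UINT16_MAX) {
        /* Same second, or clock went backwards: bump the sequence number. */
        next.tstamp = gen->last.tstamp;
        next.seqnum = (uint16_t)(gen->last.seqnum + 1);
    } else {
        /* Sequence numbers exhausted: move the timestamp ahead instead. */
        next.tstamp = gen->last.tstamp + 1;
        next.seqnum = 0;
    }
    gen->last = next;
    return next;
}
```

The point of the sketch is that the generator bases each new CSN on the last one it issued, not on the current clock alone, so ignoring the skew limit cannot cause duplicates or regressions.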

Another problem with the fix - it only handles the case where the supplier time skew is too great - it does not take into consideration the case where the consumer time skew is too great:
repl5_inc_protocol.c:
{{{
case EXAMINE_RUV_OK:
    /* update our csn generator state with the consumer's ruv data */
    dev_debug("repl5_inc_run(STATE_SENDING_UPDATES) -> examine_update_vector OK");
    object_acquire(prp->replica_object);
    replica = object_get_data(prp->replica_object);
    rc = replica_update_csngen_state (replica, ruv);
    object_release (prp->replica_object);
    replica = NULL;
    if (rc == CSN_LIMIT_EXCEEDED) /* too much skew */ {
        slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name,
            "%s: Incremental protocol: fatal error - too much time skew between replicas!\n",
            agmt_get_long_name(prp->agmt));
        next_state = STATE_STOP_FATAL_ERROR;
}}}
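For the consumer side, the decision in the excerpt above could take the ignore setting into account along these lines. This is a sketch only: the enums, the function and the ignore_time_skew parameter are hypothetical, not the names used in repl5_inc_protocol.c.

```c
/* Hypothetical stand-ins for the protocol states and return codes. */
enum example_state {
    EXAMPLE_STATE_SENDING_UPDATES,
    EXAMPLE_STATE_STOP_FATAL_ERROR
};

enum example_rc {
    EXAMPLE_CSN_SUCCESS,
    EXAMPLE_CSN_LIMIT_EXCEEDED
};

/* Pick the next protocol state after updating the CSN generator with
 * the consumer's RUV. Too much skew is fatal only when the admin has
 * not asked for time skew to be ignored. */
enum example_state example_next_state_after_ruv(enum example_rc rc,
                                                int ignore_time_skew)
{
    if (rc == EXAMPLE_CSN_LIMIT_EXCEEDED && !ignore_time_skew) {
        return EXAMPLE_STATE_STOP_FATAL_ERROR;
    }
    return EXAMPLE_STATE_SENDING_UPDATES;
}
```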

0001-Ticket-47516-replication-stops-with-excessive-clock-.patch

To ssh://git.fedorahosted.org/git/389/ds.git
962de25..d128dbd 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit d128dbd
Author: Rich Megginson rmeggins@redhat.com
Date: Thu Jan 16 12:57:22 2014 -0700
075a54e..b51a57b 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit b51a57b20386e506a7eb484b62d39bf249ef995f
Author: Rich Megginson rmeggins@redhat.com
Date: Thu Jan 16 12:57:22 2014 -0700
7738016..51c1b2a 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 51c1b2a
Author: Rich Megginson rmeggins@redhat.com
Date: Thu Jan 16 12:57:22 2014 -0700
668903c..a6ec074 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit a6ec074
Author: Rich Megginson rmeggins@redhat.com
Date: Thu Jan 16 12:57:22 2014 -0700
9c41a36..9f2b104 master -> master
commit 9f2b104
Author: Rich Megginson rmeggins@redhat.com
Date: Thu Jan 16 12:57:22 2014 -0700

Metadata Update from @rmeggins:
- Issue assigned to rmeggins
- Issue set to the milestone: 1.2.11.23

2 years ago
