e9fa824 Bug 543633 - replication problems if supplier is killed under update load

Authored and Committed by nkinder 13 years ago
    Bug 543633 - replication problems if supplier is killed under update load
    
    This patch was provided by Ulf Weltman of HP.  It has been ported to the
    current 389 code.
    
    The RUV for each replica lives in memory while the server is running and is
    flushed to disk every 30 seconds.  After a disorderly shutdown, this can
    cause two problems if updates were arriving from a client or another replica
    when slapd goes down:
    
    1) After starting back up, the RUV will frequently have a max CSN in the past
    as compared to the changelog and compared to remote replicas.  This means that
    any updates in the changelog that were not yet sent before the crash will
    continue to not be sent after slapd comes back up, until a new update arrives.
    Then the RUV will leap ahead, the incremental protocol will position replay at
    the remote replica's max CSN, and the unsent updates from before the crash and
    also the new update will be replayed.
    
    2) If slapd went down in the window between writing to the datastore and
    writing to the changelog, then the last update against the replica will never
    appear on remote replicas.  The incremental protocol will continue once tickled
    as described above, but the last change made before the crash will be missing
    and this is not detected by the protocol.
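
The first problem can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's code: a CSN is reduced to a plain counter, the changelog is assumed to be durable, and every name below (`replica_model`, `apply_update`, `periodic_flush`, `crash_and_restart`, `ruv_lag`) is hypothetical.

```c
#include <assert.h>

/* Hypothetical model: a CSN is reduced to a monotonically increasing counter. */
typedef unsigned long csn_t;

struct replica_model {
    csn_t ruv_mem_max;   /* max CSN in the in-memory RUV */
    csn_t ruv_disk_max;  /* max CSN in the RUV as last flushed to disk */
    csn_t changelog_max; /* newest CSN in the changelog (assumed durable) */
};

/* An update is logged in the changelog and advances the in-memory RUV only. */
static void apply_update(struct replica_model *r, csn_t csn)
{
    assert(csn > r->changelog_max); /* CSNs must increase in this model */
    r->changelog_max = csn;
    r->ruv_mem_max = csn;
}

/* Stands in for the periodic (every 30 seconds) RUV flush. */
static void periodic_flush(struct replica_model *r)
{
    r->ruv_disk_max = r->ruv_mem_max;
}

/* Disorderly shutdown: the in-memory RUV is lost and reloaded from disk. */
static void crash_and_restart(struct replica_model *r)
{
    r->ruv_mem_max = r->ruv_disk_max;
}

/* Number of changelog CSNs that the reloaded RUV does not know about. */
static csn_t ruv_lag(const struct replica_model *r)
{
    return r->changelog_max - r->ruv_mem_max;
}
```

In this model, if updates 1 and 2 are flushed and updates 3 and 4 arrive before a crash, the restarted RUV sits at 2 while the changelog is at 4. The lag only disappears when a new update pushes the in-memory RUV forward again.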
    
    My fix is to synchronize the writing of the RUV with the writing of the data
    store.  This is accomplished as follows:
    
    1) Add a function to the replication plugin that returns the required updates
    to the RUV for a given operation, as well as the unique ID of the RUV entry
    for convenience.  The function is registered by the replication plugin in a
    new field in the common parameter block.
    
    2) Add a callback handler in the backend functions that handle LDAP add,
    delete, modify and rename operations.  Each backend function checks whether
    the parameter block has a RUV update handler registered and, if so, calls it.
    If the handler returns a set of modifications, the backend function adds those
    updates to the client update request.
    
    Note: The periodic RUV update thread is still needed, in order to write the
    RUV to disk when a replica is first configured but no update has been made.
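
The two steps of the fix can be sketched as follows. This is a simplified illustration, not the patch itself: the types and names (`pblock_model`, `ruv_mods_fn`, `replication_ruv_mods`, `backend_modify`, the unique ID string) are hypothetical stand-ins, and the "transaction" is modeled as a plain struct update.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long csn_t; /* hypothetical: a CSN reduced to a counter */

/* Hypothetical stand-in for the RUV modifications of one operation. */
struct ruv_mods {
    csn_t new_max_csn;
};

/* Hypothetical handler type: reports the RUV entry's unique ID and the
 * required RUV updates; returns 1 if there are updates, 0 otherwise. */
typedef int (*ruv_mods_fn)(csn_t op_csn, const char **ruv_uniqueid,
                           struct ruv_mods *out);

/* Stand-in for the parameter block, with the new handler field (step 1). */
struct pblock_model {
    csn_t op_csn;
    ruv_mods_fn ruv_handler; /* NULL when replication registered nothing */
};

/* Stand-in for the datastore: entry data and RUV entry live side by side. */
struct datastore_model {
    csn_t last_entry_csn;
    csn_t ruv_max_csn;
    int txn_count;
};

/* What the replication plugin would register (step 1). */
static int replication_ruv_mods(csn_t op_csn, const char **ruv_uniqueid,
                                struct ruv_mods *out)
{
    *ruv_uniqueid = "hypothetical-ruv-unique-id";
    out->new_max_csn = op_csn;
    return 1;
}

/* What a backend operation would do (step 2): ask the handler for RUV
 * updates and apply them together with the client's own update. */
static void backend_modify(struct datastore_model *db,
                           const struct pblock_model *pb)
{
    const char *ruv_id = NULL;
    struct ruv_mods mods = {0};
    int have_mods = 0;

    if (pb->ruv_handler != NULL) {
        have_mods = pb->ruv_handler(pb->op_csn, &ruv_id, &mods);
    }

    /* Modeled as one transaction: both writes happen or neither does. */
    db->last_entry_csn = pb->op_csn;
    if (have_mods) {
        assert(ruv_id != NULL);
        db->ruv_max_csn = mods.new_max_csn;
    }
    db->txn_count++;
}
```

Because the RUV update rides along with the client update in this sketch, the stored RUV cannot fall behind the stored entry data. An operation with no handler registered behaves as before and leaves the RUV untouched.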
    
        
file modified
+7 -0
file modified
+1 -0