#47350 Allow search to look up 'in memory RUV'
Closed: wontfix None Opened 9 years ago by tbordaz.

For a given replica, the RUV data structure is written back to a database entry with the DN "dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,<suffix>".

The database RUV is then looked up by monitoring tools (like repl-monitor.pl or ldapsearch).
Since 1.2.8 (543633), see https://fedorahosted.org/389/ticket/564, the in memory RUV has been in sync with the database RUV entry.

The fix for ticket 564 reintroduces a shift between the in memory RUV and the database RUV: the in memory RUV reflects the exact state of the updates, while the database RUV lags by about 30s.

This ticket is to create a new monitoring method to look up the in memory RUV.

We may use the replica_config_search (search on "cn=replica,cn="<suffix>",cn=mapping tree,cn=config") to handle that.

'''Here is the current status'''

  • I implemented a fix (under review). The in memory RUV can be read with the following command:

ldapsearch -LLL -h localhost -p 2011 -D "cn=directory manager" -w Secret123 -b "cn=replica,cn=\"dc=com\",cn=mapping tree,cn=config" cn=replica nsds50ruv
dn: cn=replica,cn=dc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replicageneration} 5183b5f8000000010000
nsds50ruv: {replica 1 ldap://pctbordaz.redhat.com:2011} 5183b755000000010000 5

  • On the test side, the fix passed the replication acceptance tests.
    I also tested with 3 masters and 1 slave, and the in memory RUV can be looked up on each of them.

I merged with https://fedorahosted.org/389/ticket/564, so there is a transient shift between the in memory RUV and the DB RUV. The shift depends on the period of the thread that flushes the RUV to the DB. With the attached 'test_ruv' script I can monitor both RUVs, with the following output:


=============== Fri May 3 15:58:27 CEST 2013 =====================

DB RUV= 183c26c000000010000
MEM RUV= 183c26c000000010000
modifying entry "uid=thierry,dc=com"

DB RUV= 183c26c000000010000
MEM RUV= 183c282000000010000

=============== Fri May 3 15:58:37 CEST 2013 =====================

DB RUV= 183c26c000000010000
MEM RUV= 183c282000000010000
modifying entry "uid=thierry,dc=com"

DB RUV= 183c26c000000010000
MEM RUV= 183c28c000000010000

=============== Fri May 3 15:59:07 CEST 2013 =====================

DB RUV= 183c28c000000010000
MEM RUV= 183c28c000000010000
modifying entry "uid=thierry,dc=com"

DB RUV= 183c28c000000010000
MEM RUV= 183c2aa000000010000
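
The size of the shift can be estimated by comparing the timestamp fields of the two max CSNs. The sketch below is a minimal, standalone illustration and is not taken from the patch or from the 'test_ruv' script. It assumes the usual 20-hex-digit CSN string layout (8 digits of timestamp, 4 of sequence number, 4 of replica id, 4 of sub-sequence number); `struct csn_fields`, `csn_parse` and `csn_lag_seconds` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a CSN string, ASSUMING the layout
 * timestamp(8) seqnum(4) replica-id(4) subseqnum(4), all hexadecimal. */
struct csn_fields {
    unsigned long tstamp; /* seconds since the epoch */
    unsigned int seqnum;
    unsigned int rid;
    unsigned int subseq;
};

/* Parse a 20-hex-digit CSN string. Returns 0 on success, -1 on bad input. */
static int
csn_parse(const char *s, struct csn_fields *out)
{
    if (s == NULL || out == NULL || strlen(s) != 20) {
        return -1;
    }
    /* Reject anything that is not a pure hex string (also rules out "0x"). */
    if (strspn(s, "0123456789abcdefABCDEF") != 20) {
        return -1;
    }
    if (sscanf(s, "%8lx%4x%4x%4x",
               &out->tstamp, &out->seqnum, &out->rid, &out->subseq) != 4) {
        return -1;
    }
    return 0;
}

/* Seconds by which the DB RUV trails the in memory RUV (timestamps only). */
static long
csn_lag_seconds(const struct csn_fields *mem, const struct csn_fields *db)
{
    return (long)mem->tstamp - (long)db->tstamp;
}
```

A monitoring script could feed the max CSN read from each RUV into `csn_parse` and report `csn_lag_seconds`; sequence numbers are ignored here, so two CSNs within the same second report a lag of 0.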


'''Here are the next steps'''

  • waiting for the review

sprintf(buffer, "%s %s", prefix_replicageneration, ruv->replGen);
We cannot use sprintf with fixed size buffers. Either use PR_snprintf, which will guarantee that the output buffer is properly NULL terminated, or just use slapi_ch_smprintf to create a malloced string, and use slapi_value_set_string_passin.
valuearray_add_value(&values, value);
valuearray_add_value expects that you will pass in value - so you will need to create a new slapi_value_new() each time you call valuearray_add_value.
ruv_element_to_string(ruv_e, NULL, buffer, sizeof (buffer));
If you pass in a struct berval *bv, and pass in buffer == NULL and size 0, it will use slapi_ch_smprintf to malloc the value, which you can then pass to slapi_value_set_string_passin.
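
The two alternatives to a bare sprintf can be illustrated with a minimal, standalone sketch. It uses the standard C `snprintf`/`vsnprintf` as stand-ins for `PR_snprintf` and `slapi_ch_smprintf`, so that it builds without the server headers; `format_bounded` and `format_alloc` are hypothetical helper names and are not part of the patch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Bounded formatting: never writes more than size bytes and always
 * NUL-terminates when size > 0. Returns 0 on success, -1 if the output
 * was truncated or a formatting error occurred. */
static int
format_bounded(char *buf, size_t size, const char *prefix, const char *gen)
{
    int n = snprintf(buf, size, "%s %s", prefix, gen);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical stand-in for an allocating printf: formats into a
 * malloc'ed string sized to fit. The caller owns the result and must
 * free() it. Returns NULL on error. */
static char *
format_alloc(const char *fmt, ...)
{
    va_list ap;
    va_list ap2;
    int n;
    char *out;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    n = vsnprintf(NULL, 0, fmt, ap); /* first pass: measure the length */
    va_end(ap);
    if (n < 0) {
        va_end(ap2);
        return NULL;
    }
    out = malloc((size_t)n + 1);
    if (out != NULL) {
        vsnprintf(out, (size_t)n + 1, fmt, ap2); /* second pass: write */
    }
    va_end(ap2);
    return out;
}
```

With the bounded variant the caller has to handle truncation; with the allocating variant the string always fits, and ownership can be handed over to the value, which is the pattern the review suggests with slapi_value_set_string_passin.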

I attached a third fix because
* need to also report nsruvReplicaLastModified (with nsds50ruv) to allow a full monitoring
* fix a bug in search_requested_attr where only the first attribute of the request was evaluated (sigh !!)

One minor change
value = slapi_value_new_string_passin(bv.bv_val);
use slapi_value_new_berval instead, and use slapi_ber_bvdone to free the berval, like this:
ruv_element_to_string(ruv_e, &bv, NULL, 0);
value = slapi_value_new_berval(&bv);
slapi_ber_bvdone(&bv); /* slapi_value_new_berval makes a copy */
valuearray_add_value(&values, value);
slapi_value_free(&value);

Otherwise, looks good

Last attachment review

  • read the in memory RUV from 'cn=replica,cn=\"<suffix>\",cn=mapping tree,cn=config'
  • if the DB RUV entry 'nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,<suffix>' is looked up, return the in memory RUV instead

Instead of parsing the entry dn manually, use slapi_sdn_get_parent e.g.

if (is_ruv_tombstone_entry(e)) {
    Slapi_DN *suffix_sdn = slapi_sdn_new();
    slapi_sdn_get_parent(slapi_entry_get_sdn(e), suffix_sdn);
then change get_in_memory_ruv to use a Slapi_DN *
instead of a char *.

Looks good to me.

One question:
search_requested_attr would always return TRUE when attrs[i] starts with attr (e.g., attrs[i]: testattribute, attr: testattr). Is that okay?

static PRBool
search_requested_attr(Slapi_PBlock *pb, char *attr)
[...]
    for (i = 0; attrs[i] != NULL; i++) {
        if (strncasecmp(attrs[i], attr, strlen(attr)) == 0) {
            return PR_TRUE;
        }
    }

Hi Noriko,

Thank you very much for your review.
You are right: search_requested_attr returns TRUE if the beginning of the requested attribute matches a RUV attribute (nsds50ruv or nsruvReplicaLastModified). I fixed this in the attached review.
Note that in that case, the erroneously requested attribute is later rejected as it is not found in the ruv entry.
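
The difference between the prefix comparison and a whole-name comparison can be shown with a small standalone sketch. It is not the code from the attached patch: `requested_prefix` and `requested_exact` are hypothetical names, and the attribute list is a plain NULL-terminated array rather than something read from a pblock.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Prefix comparison, as in the reviewed snippet: matches any requested
 * attribute that merely starts with attr. */
static int
requested_prefix(const char *const *attrs, const char *attr)
{
    size_t i;

    for (i = 0; attrs != NULL && attrs[i] != NULL; i++) {
        if (strncasecmp(attrs[i], attr, strlen(attr)) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Whole-name comparison: matches only when the requested attribute is
 * exactly attr, ignoring case. */
static int
requested_exact(const char *const *attrs, const char *attr)
{
    size_t i;

    for (i = 0; attrs != NULL && attrs[i] != NULL; i++) {
        if (strcasecmp(attrs[i], attr) == 0) {
            return 1;
        }
    }
    return 0;
}
```

For a request list containing "testattribute", `requested_prefix(list, "testattr")` reports a match while `requested_exact` does not.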


'''git merge ticket47350'''

Updating 58234ad..6c26f0b
ldap/servers/plugins/replication/repl5.h | 2 ++
ldap/servers/plugins/replication/repl5_init.c | 3 ++-
ldap/servers/plugins/replication/repl5_plugins.c | 31 ++++++++++++++++++++++++++++
ldap/servers/plugins/replication/repl5_replica.c | 23 +++++++++++++++++----
ldap/servers/plugins/replication/repl5_replica_config.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ldap/servers/plugins/replication/repl5_ruv.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ldap/servers/plugins/replication/repl5_ruv.h | 2 ++
7 files changed, 208 insertions(+), 5 deletions(-)

'''git push origin master'''

Counting objects: 25, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 3.89 KiB, done.
Total 13 (delta 11), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
58234ad..6c26f0b master -> master

commit 6c26f0b
Author: Thierry bordaz (tbordaz) <tbordaz@redhat.com>
Date: Thu May 2 15:24:13 2013 +0200

Thanks Rich, Thanks Noriko for your reviews and patience :)

Fix compilation warnings (thanks Mark !)

git merge warnings

Updating db67327..ab7bab0
ldap/servers/plugins/replication/repl5_replica_config.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

git push origin master

Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.02 KiB, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
db67327..ab7bab0 master -> master

commit ab7bab0
Author: Thierry bordaz (tbordaz) <tbordaz@redhat.com>
Date: Wed Jun 5 14:49:20 2013 +0200

Unfortunately, it seems this patch breaks consumer initialization.
I set up 2-way MMR and ran a consumer initialization; the consumer then logged these errors:
[..] - slapi_start_bulk_import: bulk import is not supported by this (DSE) backend
[..] - ERROR bulk import abandoned
[..] - import userRoot: Thread monitoring returned: -23
[..] - import userRoot: Aborting all Import threads...
[..] - import userRoot: Import threads aborted.
[..] - import userRoot: Closing files...
[..] - libdb: BDB3028 userRoot/entryrdn.db: unable to flush: No such file or directory
[..] - import userRoot: Import failed.
[..] - process_bulk_import_op: NULL target sdn
And the consumer's backend is wiped out (please note that there is no userRoot):

ls /var/lib/dirsrv/slapd-ID/db

DBVERSION __db.001 __db.002 __db.003 log.0000000001

More precisely, this commit:
commit 6c26f0b
Date: Thu May 2 15:24:13 2013 +0200
Ticket 47350 - Allow search to look up 'in memory RUV'

The cause is that a replica entry is now sent to the consumer as part of the consumer initialization. That entry is outside the replicated backend, so it aborts the whole consumer initialization.
{{{
Breakpoint 3, decode_total_update_extop (pb=0x7fbcf117baa0, ep=0x7fbcf117b948) at ldap/servers/plugins/replication/repl5_total.c:762
762 if (ber_scanf(tmp_bere, "a", &str) == LBER_ERROR)
(gdb) n
766 slapi_entry_set_dn(e, str);
(gdb) p str
$25 = 0x7fbcc800d5c0 "cn=replica,cn=dc\\3Dexample\\2Cdc\\3Dcom,cn=mapping tree,cn=config" <== THIS ENTRY IS NOT A PART OF CONSUMER INITIALIZATION.
(gdb) bt
#0 decode_total_update_extop (pb=0x7fbcf117baa0, ep=0x7fbcf117b948) at ldap/servers/plugins/replication/repl5_total.c:766
#1 0x00007fbcf6c8e1b1 in multimaster_extop_NSDS50ReplicationEntry (pb=0x7fbcf117baa0) at ldap/servers/plugins/replication/repl5_total.c:859
#2 0x00007fbcfc1af968 in plugin_call_exop_plugins (pb=0x7fbcf117baa0, oid=0x7fbcc8005800 "2.16.840.1.113730.3.5.6") at ldap/servers/slapd/plugin.c:467
#3 0x000000000041e878 in do_extended (pb=0x7fbcf117baa0) at ldap/servers/slapd/extendop.c:364
[...]
}}}

Here is the current status

  • The fix introduces a regression, and I reproduced it.
    Many thanks to Noriko for finding it and narrowing it down to this bug fix!

  • The root cause of the regression is that a lookup of the DB RUV now reports the in memory RUV.
    This is fine for monitoring, which is only interested in 'nsds50ruv' and a few other attributes, but it breaks some features such as total update.

  • I made a fix and started the replication acceptance tests on it.
    The fix is to return the in memory RUV only for external operations. So far it works for all total updates (online, offline and task).

Here are the next steps

  • check the acceptance results and post the review

git merge ticket47350
Updating 97ce027..383cc74
ldap/servers/plugins/replication/repl5_plugins.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

git push origin master
Enter passphrase for key '/home/tbordaz/.ssh/id_rsa_fedora':
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.01 KiB, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
97ce027..383cc74 master -> master

commit 383cc74
Author: Thierry bordaz (tbordaz) <tbordaz@redhat.com>
Date: Thu Jun 6 14:59:24 2013 +0200

Metadata Update from @mreynolds:
- Issue assigned to tbordaz
- Issue set to the milestone: 1.3.2 - 06/13 (June)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/687

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

2 years ago
