#49354 total init fails if parentid > entryid
Closed: wontfix 6 years ago. Opened 6 years ago by lkrispen.

The issue was fixed with ticket 48755, but after the fix for 49290 it fails again.


The patch for 48755 relied on the order of the ids in a search for (parentid>=1) and added the special flag SLAPI_OP_RANGE_NO_IDL_SORT to append an idlist without sorting the ids. The code is still there, but I suspect that with the idlset mechanism we no longer execute it.
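For context on why the unsorted append matters: total init sends entries in candidate-list order, and a consumer cannot add a child before it has received the parent, so the list must keep every parent's entryid ahead of its children's. A minimal sketch of that invariant (hypothetical helper, not 389-ds code; parent_of is an array indexed by entryid, with the suffix at parentid 1):

```c
#include <stddef.h>

typedef unsigned int ID;

/* Hypothetical check: returns 1 if every id in the candidate list
 * appears only after its parent has already appeared. Ids whose
 * parent is the suffix (parentid 1) carry no ordering constraint. */
static int
parents_precede_children(const ID *ids, size_t nids, const ID *parent_of)
{
    for (size_t i = 0; i < nids; i++) {
        ID parent = parent_of[ids[i]];
        if (parent <= 1) /* suffix child or unknown parent: no constraint */
            continue;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (ids[j] == parent) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            return 0; /* child sent before its parent */
    }
    return 1;
}
```

With the data from this ticket (entry 12 has parent 13, 13 has parent 27, 27 has parent 1), the order {27, 13, 12} satisfies the invariant while {27, 12, 13} does not — which is exactly the "parentid > entryid" shape that breaks total init.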

Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

Metadata Update from @mreynolds:
- Issue assigned to firstyear

6 years ago

I'm not sure that's the issue. It looks like the beginreplicarefresh mechanism itself is broken. I think that multimaster_start is not being called, which means the dse callbacks aren't being set, so the trigger never occurs to run the refresh. Investigating why now.

Disregard that, it's a different issue. I'll solve that first, then come back to this.

So I went digging and I'm not sure it's idl_set related. The search operation here:

repl5_tot_protocol.c

526     slapi_search_internal_callback_pb(pb, &cb_data 

Is getting back an idl with the following data:

(gdb) print *candidates->b_ids@20
$38 = {2, 3, 4, 9, 11, 27, 14, 17, 19, 21, 23, 25, 12, 18, 20, 22, 24, 26, 30, 31}

Additionally, this goes directly to the range_candidates path, and never touches list_candidates.

This means the fault must be elsewhere in the replication code I think.

So I set a breakpoint on send_entry in repl5_tot_protocol and I see:

Thread 60 "ns-slapd" hit Breakpoint 1, send_entry (e=0x6100004aa240, cb_data=0x7f03996e5bf0) at /home/william/development/389ds/ds/ldap/servers/plugins/replication/repl5_tot_protocol.c:762
762     int message_id = 0;
(gdb) print *e
$8 = {e_sdn = {flag = 14 '\016', udn = 0x60300103fd20 "ou=OU1,dc=example,dc=com", dn = 0x60300103fd50 "ou=OU1,dc=example,dc=com", ndn = 0x60300103fd80 "ou=ou1,dc=example,dc=com", ndn_len = 24}, e_srdn = {
    flag = 0 '\000', rdn = 0x6020007f9070 "ou=OU1", rdns = 0x0, butcheredupto = -1, nrdn = 0x0, all_rdns = 0x60300103fdb0, all_nrdns = 0x60300103fe10}, 
  e_uniqueid = 0x60400025bc10 "53960211-8ec911e7-8c879d9e-a71f06bc", e_dncsnset = 0x60300103fe70, e_maxcsn = 0x6020007f90d0, e_attrs = 0x60d000ec4a10, e_deleted_attrs = 0x0, e_virtual_attrs = 0x0, 
  e_virtual_watermark = 0, e_virtual_lock = 0x606000495860, e_extension = 0x6020007f9030, e_flags = 0 '\000', e_aux_attrs = 0x0}
(gdb) cont
Continuing.

Thread 60 "ns-slapd" hit Breakpoint 1, send_entry (e=0x610000612240, cb_data=0x7f03996e5bf0) at /home/william/development/389ds/ds/ldap/servers/plugins/replication/repl5_tot_protocol.c:762
762     int message_id = 0;
(gdb) print *e
$9 = {e_sdn = {flag = 6 '\006', udn = 0x0, dn = 0x6060005333c0 "cn=nsPwPolicyContainer,ou=OU0,ou=OU0,ou=OU1,dc=example,dc=com", 
    ndn = 0x606000533420 "cn=nspwpolicycontainer,ou=ou0,ou=ou0,ou=ou1,dc=example,dc=com", ndn_len = 61}, e_srdn = {flag = 0 '\000', rdn = 0x6030010fc9c0 "cn=nsPwPolicyContainer", rdns = 0x0, 
    butcheredupto = -1, nrdn = 0x0, all_rdns = 0x6040002eab50, all_nrdns = 0x0}, e_uniqueid = 0x6040002eab90 "53960204-8ec911e7-8c879d9e-a71f06bc", e_dncsnset = 0x6030010fcc00, e_maxcsn = 0x60200088b7b0, 
  e_attrs = 0x60d000f1c060, e_deleted_attrs = 0x0, e_virtual_attrs = 0x0, e_virtual_watermark = 0, e_virtual_lock = 0x606000533360, e_extension = 0x60200088b6f0, e_flags = 0 '\000', e_aux_attrs = 0x0}
(gdb) cont
Continuing.

Perhaps the fault is in the modrdn of the entries? They seem to be retaining their old DNs, or an incorrect one?

Okay, those rdns are fine. It looks like in id2entry:

id 12
    rdn: ou=OU0
    modifyTimestamp;adcsn-59a8da34000000010001;vucsn-59a8da34000000010001: 2017090
     1035532Z
    modifiersName;adcsn-59a8da34000000010000;vucsn-59a8da34000000010000: cn=direct
     ory manager
    ou;vucsn-59a8da34000000010001;mdcsn-59a8da34000000010000;vdcsn-59a8da34000000010000: 
     OU0
    objectClass;vucsn-59a8da1d000200010000: top
    objectClass;vucsn-59a8da1d000200010000: organizationalunit
    creatorsName;vucsn-59a8da1d000200010000: cn=directory manager
    createTimestamp;vucsn-59a8da1d000200010000: 20170901035509Z
    nsUniqueId: 53960202-8ec911e7-8c879d9e-a71f06bc
    entryid: 12
    numSubordinates: 6
    parentid: 13

id 13
    rdn: ou=OU0
    modifyTimestamp;adcsn-59a8da33000000010001;vucsn-59a8da33000000010001: 2017090
     1035531Z
    modifiersName;adcsn-59a8da33000000010000;vucsn-59a8da33000000010000: cn=direct
     ory manager
    ou;vucsn-59a8da33000000010001;mdcsn-59a8da33000000010000;vdcsn-59a8da33000000010000: 
     OU0
    objectClass;vucsn-59a8da1e000000010000: top
    objectClass;vucsn-59a8da1e000000010000: organizationalunit
    creatorsName;vucsn-59a8da1e000000010000: cn=directory manager
    createTimestamp;vucsn-59a8da1e000000010000: 20170901035510Z
    nsUniqueId: 53960203-8ec911e7-8c879d9e-a71f06bc
    entryid: 13
    tombstoneNumSubordinates: 5
    numSubordinates: 1
    parentid: 27

So the candidate set is missing id 13 entirely, and id 12 arrives before its parent:

(gdb) print *candidates->b_ids@20
$38 = {2, 3, 4, 9, 11, 27, 14, 17, 19, 21, 23, 25, 12, 18, 20, 22, 24, 26, 30, 31}

They should be:

(gdb) print *candidates->b_ids@20
$38 = {2, 3, 4, 9, 11, 27, 13, 12, 14, 17, 19, 21, 23, 25, 12, 18, 20, 22, 24, 26, 30, 31}
                               ^-- here

So there probably is an issue with range search, but I don't think it's from idl_set.

Works with ldapsearch:

ldapsearch -H ldap://localhost:39001 ... '(parentid>=1)' parentid entryid
# OU0, OU0, OU1, example.com
dn: ou=OU0,ou=OU0,ou=OU1,dc=example,dc=com
parentid: 13
entryid: 12

# OU0, OU1, example.com
dn: ou=OU0,ou=OU1,dc=example,dc=com
parentid: 27
entryid: 13

# OU1, example.com
dn: ou=OU1,dc=example,dc=com
parentid: 1
entryid: 27

Ah ha! Found it: it's in the idl_set patch. This diff is the issue, specifically the removal of the outer while loop:

@@ -600,21 +605,16 @@ error:
         qsort((void *)&idl->b_ids[0], idl->b_nids, (size_t)sizeof(ID), idl_sort_cmp);
     }
     if (operator & SLAPI_OP_RANGE_NO_IDL_SORT) {
-        int i;
-        int left = leftovercnt;
-        while (left) {
-            for (i = 0; i < leftovercnt; i++) {
-                if (leftover[i].key && idl_id_is_in_idlist(idl, leftover[i].key)) {
-                    idl_rc = idl_append_extend(&idl, leftover[i].id);
-                    if (idl_rc) {
-                        slapi_log_err(SLAPI_LOG_ERR, "idl_new_range_fetch",
-                            "Unable to extend id list (err=%d)\n", idl_rc);
-                        idl_free(&idl);
-                        return NULL;
-                    }
-                    leftover[i].key = 0;
-                    left--;
+        for (size_t i = 0; i < leftovercnt; i++) {
+            if (leftover[i].key && idl_id_is_in_idlist(idl, leftover[i].key) == 0) {
+                idl_rc = idl_append_extend(&idl, leftover[i].id);
+                if (idl_rc) {
+                    slapi_log_err(SLAPI_LOG_ERR, "idl_new_range_fetch",
+                        "Unable to extend id list (err=%d)\n", idl_rc);
+                    idl_free(&idl);
+                    return NULL;
+                }
                 }
+                leftover[i].key = 0;
             }
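The outer while loop matters because leftover entries can depend on each other: a leftover child's parentid may itself be a leftover id that has not been appended yet, so a single sweep in array order can skip it forever — exactly the 27 → 13 → 12 chain above, where 12 (parent 13) is examined before 13 (parent 27) has been appended. A standalone sketch of the multi-pass logic (simplified types, flat arrays instead of the real IDList, plus an orphan guard the original loop did not have; not the actual 389-ds implementation):

```c
#include <stddef.h>

typedef unsigned int ID;

/* Simplified stand-in for the leftover array entries:
 * key = the entry's parentid, id = the entry's own entryid. */
typedef struct {
    ID key;
    ID id;
} leftover_t;

static int
id_in_list(const ID *ids, size_t n, ID id)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id)
            return 1;
    return 0;
}

/* Multi-pass append, mirroring the pre-idl_set logic: keep sweeping the
 * leftover array, appending any entry whose parent is already in the list,
 * until nothing is left. ids must have room for nids + locnt elements.
 * Returns the new length of ids. */
static size_t
append_leftovers(ID *ids, size_t nids, leftover_t *lo, size_t locnt)
{
    size_t left = locnt;
    while (left) {
        size_t appended = 0;
        for (size_t i = 0; i < locnt; i++) {
            if (lo[i].key && id_in_list(ids, nids, lo[i].key)) {
                ids[nids++] = lo[i].id; /* parent present: safe to append */
                lo[i].key = 0;          /* mark as consumed */
                left--;
                appended++;
            }
        }
        if (!appended)
            break; /* orphaned entry: bail out instead of spinning forever */
    }
    return nids;
}
```

Starting from ids = {27} and leftovers {(key=13, id=12), (key=27, id=13)}, pass one appends only 13 (its parent 27 is present), and pass two then appends 12 — a single for loop, as in the patched code, stops after pass one and drops 12's parent chain, which is the failure seen in the candidate set.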

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to ack (was: None)

6 years ago

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2413

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
