Issue #49354: total init fails if parentid > entryid - 389-ds-base

389-ds-base

#49354 total init fails if parentid > entryid

Closed: wontfix 6 years ago Opened 6 years ago by lkrispen.

The issue was fixed with ticket 48755, but after the fix for 49290 it fails again.

The patch for 48755 relied on the order of the ids in an search for (parentid>=1) and added the special flag SLAPI_OP_RANGE_NO_IDL_SORT to append an idlist witthout sorting the ids. The code is still there, but I suspect with the idlset mechanism we no longer execute it

Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

Metadata Update from @mreynolds:
- Issue assigned to firstyear

6 years ago

firstyear commented 6 years ago

I'm not sure that's the issue. It looks like the beginreplicarefresh mechanism itself is broken. I think that multimaster_start is not being called, which means the dse callbacks aren't being set, so the trigger never occurs to run the refresh. Investigating why now.

firstyear commented 6 years ago

Disregard that, different issue. I'll solve that first then come back to this,

firstyear commented 6 years ago

So I went digging and I'm not sure it's idl_set related. The search operation here:

repl5_tot_protocol.c

526     slapi_search_internal_callback_pb(pb, &cb_data

Is getting back an idl with the following data:

(gdb) print *candidates->b_ids@20
$38 = {2, 3, 4, 9, 11, 27, 14, 17, 19, 21, 23, 25, 12, 18, 20, 22, 24, 26, 30, 31}

Additionally, this goes directly to the range_candidates path, and never touches list_candidates.

This means the fault must be elsewhere in the replication code I think.

firstyear commented 6 years ago

So I broken on send_entry in repl5_tot_protocol and I see:

Thread 60 "ns-slapd" hit Breakpoint 1, send_entry (e=0x6100004aa240, cb_data=0x7f03996e5bf0) at /home/william/development/389ds/ds/ldap/servers/plugins/replication/repl5_tot_protocol.c:762
762     int message_id = 0;
(gdb) print *e
$8 = {e_sdn = {flag = 14 '\016', udn = 0x60300103fd20 "ou=OU1,dc=example,dc=com", dn = 0x60300103fd50 "ou=OU1,dc=example,dc=com", ndn = 0x60300103fd80 "ou=ou1,dc=example,dc=com", ndn_len = 24}, e_srdn = {
    flag = 0 '\000', rdn = 0x6020007f9070 "ou=OU1", rdns = 0x0, butcheredupto = -1, nrdn = 0x0, all_rdns = 0x60300103fdb0, all_nrdns = 0x60300103fe10}, 
  e_uniqueid = 0x60400025bc10 "53960211-8ec911e7-8c879d9e-a71f06bc", e_dncsnset = 0x60300103fe70, e_maxcsn = 0x6020007f90d0, e_attrs = 0x60d000ec4a10, e_deleted_attrs = 0x0, e_virtual_attrs = 0x0, 
  e_virtual_watermark = 0, e_virtual_lock = 0x606000495860, e_extension = 0x6020007f9030, e_flags = 0 '\000', e_aux_attrs = 0x0}
(gdb) cont
Continuing.

Thread 60 "ns-slapd" hit Breakpoint 1, send_entry (e=0x610000612240, cb_data=0x7f03996e5bf0) at /home/william/development/389ds/ds/ldap/servers/plugins/replication/repl5_tot_protocol.c:762
762     int message_id = 0;
(gdb) print *e
$9 = {e_sdn = {flag = 6 '\006', udn = 0x0, dn = 0x6060005333c0 "cn=nsPwPolicyContainer,ou=OU0,ou=OU0,ou=OU1,dc=example,dc=com", 
    ndn = 0x606000533420 "cn=nspwpolicycontainer,ou=ou0,ou=ou0,ou=ou1,dc=example,dc=com", ndn_len = 61}, e_srdn = {flag = 0 '\000', rdn = 0x6030010fc9c0 "cn=nsPwPolicyContainer", rdns = 0x0, 
    butcheredupto = -1, nrdn = 0x0, all_rdns = 0x6040002eab50, all_nrdns = 0x0}, e_uniqueid = 0x6040002eab90 "53960204-8ec911e7-8c879d9e-a71f06bc", e_dncsnset = 0x6030010fcc00, e_maxcsn = 0x60200088b7b0, 
  e_attrs = 0x60d000f1c060, e_deleted_attrs = 0x0, e_virtual_attrs = 0x0, e_virtual_watermark = 0, e_virtual_lock = 0x606000533360, e_extension = 0x60200088b6f0, e_flags = 0 '\000', e_aux_attrs = 0x0}
(gdb) cont
Continuing.

Perhaps the fault is in the modrdn of the entries? They seem to be retaining their old DN's, or an incorrect one?

firstyear commented 6 years ago

Okay, those rdns are fine. It looks like in id2entry:

id 12
    rdn: ou=OU0
    modifyTimestamp;adcsn-59a8da34000000010001;vucsn-59a8da34000000010001: 2017090
     1035532Z
    modifiersName;adcsn-59a8da34000000010000;vucsn-59a8da34000000010000: cn=direct
     ory manager
    ou;vucsn-59a8da34000000010001;mdcsn-59a8da34000000010000;vdcsn-59a8da34000000010000: 
     OU0
    objectClass;vucsn-59a8da1d000200010000: top
    objectClass;vucsn-59a8da1d000200010000: organizationalunit
    creatorsName;vucsn-59a8da1d000200010000: cn=directory manager
    createTimestamp;vucsn-59a8da1d000200010000: 20170901035509Z
    nsUniqueId: 53960202-8ec911e7-8c879d9e-a71f06bc
    entryid: 12
    numSubordinates: 6
    parentid: 13

id 13
    rdn: ou=OU0
    modifyTimestamp;adcsn-59a8da33000000010001;vucsn-59a8da33000000010001: 2017090
     1035531Z
    modifiersName;adcsn-59a8da33000000010000;vucsn-59a8da33000000010000: cn=direct
     ory manager
    ou;vucsn-59a8da33000000010001;mdcsn-59a8da33000000010000;vdcsn-59a8da33000000010000: 
     OU0
    objectClass;vucsn-59a8da1e000000010000: top
    objectClass;vucsn-59a8da1e000000010000: organizationalunit
    creatorsName;vucsn-59a8da1e000000010000: cn=directory manager
    createTimestamp;vucsn-59a8da1e000000010000: 20170901035510Z
    nsUniqueId: 53960203-8ec911e7-8c879d9e-a71f06bc
    entryid: 13
    tombstoneNumSubordinates: 5
    numSubordinates: 1
    parentid: 27

So we are missing ids 12 and 13 in the candidate set:

(gdb) print *candidates->b_ids@20
$38 = {2, 3, 4, 9, 11, 27, 14, 17, 19, 21, 23, 25, 12, 18, 20, 22, 24, 26, 30, 31}

They should be:

(gdb) print *candidates->b_ids@20
$38 = {2, 3, 4, 9, 11, 27, 13, 12, 14, 17, 19, 21, 23, 25, 12, 18, 20, 22, 24, 26, 30, 31}
                               ^-- here

So there probably is an issue with range search, but I don't think it's from idl_set.

Edited 6 years ago by firstyear

firstyear commented 6 years ago

Works with ldapsearch:

ldapsearch -H ldap://localhost:39001 ... '(parentid>=1)' parentid entryid
# OU0, OU0, OU1, example.com
dn: ou=OU0,ou=OU0,ou=OU1,dc=example,dc=com
parentid: 13
entryid: 12

# OU0, OU1, example.com
dn: ou=OU0,ou=OU1,dc=example,dc=com
parentid: 27
entryid: 13

# OU1, example.com
dn: ou=OU1,dc=example,dc=com
parentid: 1
entryid: 27

firstyear commented 6 years ago

Ah ha! found it: it's in the idl_set patch. This diff is the issue, specifically the removal of the outer while loop:

1002 @@ -600,21 +605,16 @@ error:
1003          qsort((void *)&idl->b_ids[0], idl->b_nids, (size_t)sizeof(ID), idl_sort_cmp);
1004      }
1005      if (operator & SLAPI_OP_RANGE_NO_IDL_SORT) {
1006 -        int i;
1007 -        int left = leftovercnt;
1008 -        while (left) {
1009 -            for (i = 0; i < leftovercnt; i++) {
1010 -                if (leftover[i].key && idl_id_is_in_idlist(idl, leftover[i].key)) {
1011 -                    idl_rc = idl_append_extend(&idl, leftover[i].id);
1012 -                    if (idl_rc) {
1013 -                        slapi_log_err(SLAPI_LOG_ERR, "idl_new_range_fetch",
1014 -                            "Unable to extend id list (err=%d)\n", idl_rc);
1015 -                        idl_free(&idl);
1016 -                        return NULL;
1017 -                    }
1018 -                    leftover[i].key = 0;
1019 -                    left--;
1020 +        for (size_t i = 0; i < leftovercnt; i++) {
1021 +            if (leftover[i].key && idl_id_is_in_idlist(idl, leftover[i].key) == 0) {
1022 +                idl_rc = idl_append_extend(&idl, leftover[i].id);
1023 +                if (idl_rc) {
1024 +                    slapi_log_err(SLAPI_LOG_ERR, "idl_new_range_fetch",
1025 +                        "Unable to extend id list (err=%d)\n", idl_rc);
1026 +                    idl_free(&idl);
1027 +                    return NULL;
1028                  }
1029 +                leftover[i].key = 0;
1030              }

firstyear commented 6 years ago

mreynolds commented 6 years ago

tests now pass, ack

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to ack (was: None)

6 years ago

mreynolds commented 6 years ago

fbc3556..12e9f14 master -> master

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2413

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago

Metadata

Assignee

firstyear

Tags

None

Blocking

None

Depending on

None

Priority

None

Milestone

None

reviewstatus

ack

rhbz

None

origin

None

Attachments 1

0001-Ticket-49354-fix-regression-in-totinit-due-to...

Attached 6 years ago View Comment

389-ds-base

Source Code

#49354 total init fails if parentid > entryid Closed: wontfix 6 years ago Opened 6 years ago by lkrispen.

Metadata

Attachments 1

#49354 total init fails if parentid > entryid

Closed: wontfix 6 years ago Opened 6 years ago by lkrispen.