Example: dn: ou=test,ou=projects,nsuniqueid=bd44c40a-afc711e3-9b7bd5e4-7027b69c +o=org246,ou=ou0,dc=example,dc=com
Trying to resurrect ou=test and adding a glue entry ou=projects. Then, the add fails due to its parent not found since o=org246 is a conflict.
210 create_glue_entry ( Slapi_PBlock pb, char sessionid, Slapi_DN dn, const char uniqueid, CSN opcsn ) 239 op_result= do_create_glue_entry(rdn, superiordn, uniqueid, "missingEntry", opcsn); 254 case LDAP_NO_SUCH_OBJECT: 255 / The parent is missing / 256 { 257 / JCMREPL - Create the parent ... recursion?... but what's the uniqueid? / 258 PR_ASSERT(0); / JCMREPL */ <== This assertion fails. 259 }
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1080185
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1080186
git patch file (389-ds-base-1.2.11) 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.patch
{{{ 380 if (buf->buf_prev_anchorcsn && csn_compare (buf->buf_prev_anchorcsn, anchorcsn) == 0 ) { 381 slapi_log_error(SLAPI_LOG_FATAL, buf->buf_agmt_name, 382 "Can't locate CSN %s in the changelog (DB rc=%d). " 383 "Unless the operation is delete, the consumer may need to be reinitialized.\n", 384 (char*)buf->buf_key.data, rc); }}}
I don't know how easy it will be for the admin to tell if this is a delete operation. We should try to only show this message if the operation is not a delete operation.
{{{
225 else if (LDAP_ALREADY_EXISTS == op_result) 226 { 227 slapi_log_error (SLAPI_LOG_FATAL, repl_plugin_name, 228 "%s: No need to turn tombstone %s to glue; it was already resurrected.\n", sessionid, addingdn); 229 op_result = LDAP_SUCCESS; 230 }
}}} Does this need to be FATAL?
{{{ 1250 if (e && e_in_cache) { 1251 CACHE_REMOVE(&inst->inst_cache, e); 1212 1252 cache_unlock_entry( &inst->inst_cache, e ); 1213 1253 CACHE_RETURN( &inst->inst_cache, &e ); }}} Do you need to add a CACHE_REMOVE here?
206 if (is_tombstone && 207 !PL_strncasecmp(dns[0], SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 208 (*(dns[0] + SLAPI_ATTR_UNIQUEID_LENGTH) == '=') && 209 (*(dns[0] + SLAPI_ATTR_UNIQUEID_LENGTH + slapi_uniqueIDSize() + 1/*=*/) == '\0')) {
}}}
Are you guaranteed that the length of dns[0] is greater than or equal to SLAPI_ATTR_UNIQUEID_LENGTH + slapi_uniqueIDSize() + 1? Same for slapi_rdn_set_dn_ext() and slapi_rdn_is_conflict()
git patch file (389-ds-base-1.2.11) -- Additional bug fixes... (Eventually, merged into the main patch) 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.2.patch
Still cache issues exist... {{{ ==30936== Thread 26: ==30936== Invalid read of size 8 ==30936== at 0x10466365: backentry_get_sdn (backentry.c:133) ==30936== by 0x104664D9: entry_same_dn (cache.c:167) ==30936== by 0x10466793: find_hash (cache.c:241) ==30936== by 0x10468A18: cache_find_dn (cache.c:1186) ==30936== by 0x1047D22E: dn2entry_ext (dn2entry.c:85) ==30936== by 0x104809CF: find_entry_internal_dn (findentry.c:130) ==30936== by 0x10480EAC: find_entry_internal (findentry.c:293) ==30936== by 0x10480F44: find_entry (findentry.c:313) ==30936== by 0x104C6CA9: ldbm_back_search (ldbm_search.c:545) ==30936== by 0x4CAE46A: op_shared_search (opshared.c:803) ==30936== by 0x13AE50: do_search (search.c:357) ==30936== by 0x11FBF6: connection_dispatch_operation (connection.c:622) ==30936== by 0x121584: connection_threadmain (connection.c:2339) ==30936== by 0x66D7C85: _pt_root (ptthread.c:204) ==30936== by 0x6D14D14: start_thread (pthread_create.c:308) ==30936== by 0x723253C: clone (clone.S:114) ==30936== Address 0x23075900 is 48 bytes inside a block of size 96 free'd ==30936== at 0x4A077E6: free (vg_replace_malloc.c:446) ==30936== by 0x4C5D8CF: slapi_ch_free (ch_malloc.c:363) ==30936== by 0x10466215: backentry_free (backentry.c:62) ==30936== by 0x104688B1: entrycache_return (cache.c:1157) ==30936== by 0x10468754: cache_return (cache.c:1130) ==30936== by 0x104BE176: modify_term (ldbm_modify.c:101) ==30936== by 0x104A1AC5: ldbm_back_add (ldbm_add.c:1209) ==30936== by 0x4C52D57: op_shared_add (add.c:681) ==30936== by 0x4C51D23: do_add (add.c:258) ==30936== by 0x11FAC4: connection_dispatch_operation (connection.c:579) ==30936== by 0x121584: connection_threadmain (connection.c:2339) ==30936== by 0x66D7C85: _pt_root (ptthread.c:204) ==30936== by 0x6D14D14: start_thread (pthread_create.c:308) ==30936== by 0x723253C: clone (clone.S:114) }}}
git patch file (389-ds-base-1.2.11) -- test passed with no leaks 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.3.patch
Need to fix this comment: {{{ / The entry is already a Tombstone, ignore this delete. / }}} We are not ignoring the delete now.
{{{ 217 op_result = urp_fixup_rename_entry (addingentry, conflictrdn, parentuniqueid, OP_FLAG_RESURECT_ENTRY|OP_FLAG_TOMBSTONE_ENTRY); }}} The previous place you call this with slapi_ch_strdup(parentuniqueid) - do you need to use strdup here too?
{{{ 220 dn = slapi_sdn_get_dn(sdn); 221 ldap_result_code = slapi_dn_syntax_check(pb, dn, 1); 220 ldap_result_code = slapi_dn_syntax_check(pb, slapi_sdn_get_dn(sdn), 1); }}} Now dn will not be set - is there code that expects dn to be set?
{{{ 317 LDAPDebug2Args(LDAP_DEBUG_BACKLDBM, "find_entry2modify_only returned NULL parententry pdn: %s, uniqueid: %s\n", 318 slapi_sdn_get_dn(&parentsdn), addr.uniqueid); }}} is it possible for addr.uniqueid to be NULL here?
{{{ 457 if (!PL_strncasecmp(edn, SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 458 ((edn + SLAPI_ATTR_UNIQUEID_LENGTH) == '=') && 459 ((edn + SLAPI_ATTR_UNIQUEID_LENGTH + slapi_uniqueIDSize() + 1/=/) == ',')) { }}}
Are you sure that edn is long enough? same here {{{ 206 PRUint32 uniqueidlen = SLAPI_ATTR_UNIQUEID_LENGTH + slapi_uniqueIDSize() + 1/=/; 207 if (is_tombstone && (strlen(dns[0]) == uniqueidlen) && 208 !PL_strncasecmp(dns[0], SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 209 ((dns[0] + SLAPI_ATTR_UNIQUEID_LENGTH) == '=')) { }}} and here {{{ 336 if (!PL_strncasecmp(dn, SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 337 ((mydn = (dn + SLAPI_ATTR_UNIQUEID_LENGTH)) == '=') && 338 ((mydn = (mydn + slapi_uniqueIDSize() + 1/=*/)) == ',')) { }}} and in slapi_rdn_is_conflict
Probably should make this a function so that you can reuse it in several places.
Replying to [comment:9 rmeggins]:
Need to fix this comment: {{{ / The entry is already a Tombstone, ignore this delete. / }}} We are not ignoring the delete now. {{{ / The entry is already a Tombstone, ignore this delete. / op_result= LDAP_ALREADY_EXISTS; }}} We ignore the delete once here in ldbm_back_delete... {{{ 228 rc = plugin_call_plugins(pb, SLAPI_PLUGIN_BE_PRE_DELETE_FN); 229 if (rc == -1) 230 { 231 / 232 * Plugin indicated some kind of failure, 233 * or that this Operation became a No-Op. 234 / 235 slapi_pblock_get(pb, SLAPI_RESULT_CODE, &ldap_result_code); 236 if (!ldap_result_code) { 237 if (LDAP_ALREADY_EXISTS == ldap_result_code) { 238 / 239 * The target entry is already a tombstone. 240 * We need to treat this as a success, 241 * but we need to remove the entry e from the entry cache. 242 / 243 remove_e_from_cache = 1; 244 ldap_result_code = LDAP_SUCCESS; 245 } 246 slapi_pblock_get(pb, SLAPI_RESULT_CODE, &ldap_result_code); 247 } 248 / restore original entry so the front-end delete code can free it / 249 slapi_pblock_get( pb, SLAPI_PLUGIN_OPRETURN, &opreturn ); 250 if (!opreturn) { 251 slapi_pblock_set( pb, SLAPI_PLUGIN_OPRETURN, ldap_result_code ? &ldap_result_code : &rc ); 252 } 253 / retval is -1 / 254 goto error_return; }}} {{{ 217 op_result = urp_fixup_rename_entry (addingentry, conflictrdn, parentuniqueid, OP_FLAG_RESURECT_ENTRY|OP_FLAG_TOMBSTONE_ENTRY); }}} The previous place you call this with slapi_ch_strdup(parentuniqueid) - do you need to use strdup here too? It's a bit ugly, but as commented, parentuniqueid is consumed once passed to urp_fixup_XXX_entry. 194 / addingentry and parentuniqueid are consumed, regardless of the result / So, the first time, I duplicated it and passed it to urp_fixup_add_entry. Next time, it's okay to be consumed. So, I passed parentuniqueid to urp_fixup_rename_entry directly. And when urp_fixup_rename_entry is no need to be called, parentuniqueid is freed. {{{ 220 dn = slapi_sdn_get_dn(sdn); 221 ldap_result_code = slapi_dn_syntax_check(pb, dn, 1); 220 ldap_result_code = slapi_dn_syntax_check(pb, slapi_sdn_get_dn(sdn), 1); }}} Now dn will not be set - is there code that expects dn to be set? Oh, I just go rid of the local variable "dn"... :) $ egrep "\<dn>" ldbm_add.c This returns no "char *dn". {{{ 317 LDAPDebug2Args(LDAP_DEBUG_BACKLDBM, "find_entry2modify_only returned NULL parententry pdn: %s, uniqueid: %s\n", 318 slapi_sdn_get_dn(&parentsdn), addr.uniqueid); }}} is it possible for addr.uniqueid to be NULL here? Ah, it might be... I'm going to fix it... {{{ 457 if (!PL_strncasecmp(edn, SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 458 ((edn + SLAPI_ATTR_UNIQUEID_LENGTH) == '=') && 459 ((edn + SLAPI_ATTR_UNIQUEID_LENGTH + slapi_uniqueIDSize() + 1/=/) == ',')) { }}} Are you sure that edn is long enough? same here {{{ 206 PRUint32 uniqueidlen = SLAPI_ATTR_UNIQUEID_LENGTH + slapi_uniqueIDSize() + 1/=/; 207 if (is_tombstone && (strlen(dns[0]) == uniqueidlen) && 208 !PL_strncasecmp(dns[0], SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 209 ((dns[0] + SLAPI_ATTR_UNIQUEID_LENGTH) == '=')) { }}} and here {{{ 336 if (!PL_strncasecmp(dn, SLAPI_ATTR_UNIQUEID, SLAPI_ATTR_UNIQUEID_LENGTH) && 337 ((mydn = (dn + SLAPI_ATTR_UNIQUEID_LENGTH)) == '=') && 338 ((mydn = (mydn + slapi_uniqueIDSize() + 1/=*/)) == ',')) { }}} and in slapi_rdn_is_conflict Probably should make this a function so that you can reuse it in several places. Yes, I should have. I'm turning it to a function and reuse it (with checking the length)...
Need to fix this comment: {{{ / The entry is already a Tombstone, ignore this delete. / }}} We are not ignoring the delete now. {{{ / The entry is already a Tombstone, ignore this delete. / op_result= LDAP_ALREADY_EXISTS; }}} We ignore the delete once here in ldbm_back_delete... {{{ 228 rc = plugin_call_plugins(pb, SLAPI_PLUGIN_BE_PRE_DELETE_FN); 229 if (rc == -1) 230 { 231 / 232 * Plugin indicated some kind of failure, 233 * or that this Operation became a No-Op. 234 / 235 slapi_pblock_get(pb, SLAPI_RESULT_CODE, &ldap_result_code); 236 if (!ldap_result_code) { 237 if (LDAP_ALREADY_EXISTS == ldap_result_code) { 238 / 239 * The target entry is already a tombstone. 240 * We need to treat this as a success, 241 * but we need to remove the entry e from the entry cache. 242 / 243 remove_e_from_cache = 1; 244 ldap_result_code = LDAP_SUCCESS; 245 } 246 slapi_pblock_get(pb, SLAPI_RESULT_CODE, &ldap_result_code); 247 } 248 / restore original entry so the front-end delete code can free it / 249 slapi_pblock_get( pb, SLAPI_PLUGIN_OPRETURN, &opreturn ); 250 if (!opreturn) { 251 slapi_pblock_set( pb, SLAPI_PLUGIN_OPRETURN, ldap_result_code ? &ldap_result_code : &rc ); 252 } 253 / retval is -1 / 254 goto error_return; }}}
{{{ 217 op_result = urp_fixup_rename_entry (addingentry, conflictrdn, parentuniqueid, OP_FLAG_RESURECT_ENTRY|OP_FLAG_TOMBSTONE_ENTRY); }}} The previous place you call this with slapi_ch_strdup(parentuniqueid) - do you need to use strdup here too? It's a bit ugly, but as commented, parentuniqueid is consumed once passed to urp_fixup_XXX_entry. 194 / addingentry and parentuniqueid are consumed, regardless of the result / So, the first time, I duplicated it and passed it to urp_fixup_add_entry. Next time, it's okay to be consumed. So, I passed parentuniqueid to urp_fixup_rename_entry directly. And when urp_fixup_rename_entry is no need to be called, parentuniqueid is freed.
{{{ 220 dn = slapi_sdn_get_dn(sdn); 221 ldap_result_code = slapi_dn_syntax_check(pb, dn, 1); 220 ldap_result_code = slapi_dn_syntax_check(pb, slapi_sdn_get_dn(sdn), 1); }}} Now dn will not be set - is there code that expects dn to be set? Oh, I just go rid of the local variable "dn"... :) $ egrep "\<dn>" ldbm_add.c This returns no "char *dn".
{{{ 317 LDAPDebug2Args(LDAP_DEBUG_BACKLDBM, "find_entry2modify_only returned NULL parententry pdn: %s, uniqueid: %s\n", 318 slapi_sdn_get_dn(&parentsdn), addr.uniqueid); }}} is it possible for addr.uniqueid to be NULL here? Ah, it might be... I'm going to fix it...
Probably should make this a function so that you can reuse it in several places. Yes, I should have. I'm turning it to a function and reuse it (with checking the length)...
Thank you, Rich!
diff from the previous patch (389-ds-base-1.2.11 branch) diffs.txt.1.2.11
git patch file (389-ds-base-1.2.11) -- merged patch 3 and the diff 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.4.patch
git patch file (master) -- ported from 1.2.11 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.5.patch
Looks good to me, nice work!
Reviewed by Mark (Thank you!!)
Pushed to 389-ds-base-1.2.11 branch 8f39dd5..9e5b333 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 9e5b333
Pushed to master 804d923..160cb3f master -> master commit 160cb3f
Pushed to 389-ds-base-1.3.2 b12c22b..3f933fe 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit 3f933fe
Pushed to 389-ds-base-1.3.1 85cc40c..68b52b7 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 68b52b72a4a682e5778938a3c780f4d5ff08ca0f
Looks like there may still be a problem - with 1.3.2.17: [27/Jun/2014:16:43:03 +0200] - Turning a tombstone into a tombstone! "fqdn=sauron.pipebreaker.pl+nsuniqueid=9453e301-793511e3-a4d9a4f9-faae4520,cn=computers,cn=accounts,dc=pipebreaker,dc=pl"
What if the entry that is to be deleted is a repl conflict entry of the form fqdn=sauron.pipebreaker.pl+nsuniqueid=9453e301-793511e3-a4d9a4f9-faae4520, but is not a tombstone entry?
I think slapi_is_special_rdn(rdn, RDN_IS_TOMBSTONE) doesn't work correctly in this case. It is saying that the above rdn is a tombstone, but it is not, it is a repl conflict.
Thank you, Rich. It is definitely a very bad bug.
Fix description: The logic in slapi_is_special_rdn to check if the RDN is a tombstone or not had a problem. Even if the first part of RDN is not nsuniqueid=<UNIQUE_ID>, it was returning true. In this patch, code to check the case is added and it returns false if the first part of RDN is not nsuniqueid=<UNIQUE_ID>.
We need to fix this bug on all the versions.
git patch file (master) 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.6.patch
If you have access to the entry, you should not call slapi_is_special_rdn to determine if an entry is a tombstone or a repl conflict. However, if all you have is the RDN, and cannot look at the objectclass or attributes of the entry, then slapi_is_special_rdn() is ok. Are there any places in the code where slapi_is_special_rdn is used, but we could use the actual entry instead?
find . -name "*.c" | xargs egrep -l slapi_is_special_rdn ./servers/slapd/rdn.c <== RDN only ./servers/slapd/back-ldbm/ldbm_delete.c <== entry is available. {{{ But I'd like to make sure the RDN is not starting with nsuniqueid=<UNIQUE_ID> here... 587 if (slapi_is_special_rdn(edn, RDN_IS_TOMBSTONE)) { 588 slapi_log_error(SLAPI_LOG_FATAL, "ldbm_back_delete", 589 "conn=%lu op=%d Turning a tombstone into a tombstone! \"%s\"; e: 0x %p, cache_state: 0x%x, refcnt: %d\n", 590 conn_id, op_id, edn, e, e->ep_state, e->ep_refcnt); 591 ldap_result_code= LDAP_OPERATIONS_ERROR; 592 retval = -1; 593 goto error_return; 594 } 595 tombstone_dn = compute_entry_tombstone_dn(edn, childuniqueid); }}} ./servers/slapd/back-ldbm/ldbm_entryrdn.c <== RDN only
ack
Thanks for reviewing the patch, Mark and Rich!
Pushed to master: e4b4419..286559d master -> master commit 286559d
Pushed to 389-ds-base-1.3.2: 7d19149..36b8830 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit 36b8830
Pushed to 389-ds-base-1.3.1: 18bc3e8..2f3852f 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 2f3852fea43efde5994b3805d65cf2fc80210d73
Pushed to 389-ds-base-1.2.11: ed48761..fe0b795 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit fe0b795
Issue in ldbm_delete.c
You still need to decrement the inst_ref_count:
{{{ @@ -1267,17 +1385,6 @@ common_return: plugin_call_plugins (pb, SLAPI_PLUGIN_BE_POST_DELETE_FN); }
Need to add this back as it causes any kind of database task to fail(it always thinks its busy because the ref count was never decremented):
{{{ if (inst->inst_ref_count) { slapi_counter_decrement(inst->inst_ref_count); } }}}
Thanks for your finding, Mark.
I understood this must be done regardless of the result. {{{ if (inst->inst_ref_count) { slapi_counter_decrement(inst->inst_ref_count); } }}}
How about this part? We should not unlock and return in the error case? {{{ - if (retval && e) { / error case / - cache_unlock_entry( &inst->inst_cache, e ); - CACHE_RETURN( &inst->inst_cache, &e ); - } }}}
Replying to [comment:26 nhosoi]:
Thanks for your finding, Mark. I understood this must be done regardless of the result. {{{ if (inst->inst_ref_count) { slapi_counter_decrement(inst->inst_ref_count); } }}} How about this part? We should not unlock and return in the error case? {{{ - if (retval && e) { / error case / - cache_unlock_entry( &inst->inst_cache, e ); - CACHE_RETURN( &inst->inst_cache, &e ); - } }}}
I'm not sure. This code was removed in patch5.
In my case there was no error, just a regular successful delete. I would assume that you removed the cache functions intentionally, so I would just add the "decrement ref count" back and nothing else.
Thanks for clarification, Mark! It seems we have to resurrect this code only... {{{ if (inst->inst_ref_count) { slapi_counter_decrement(inst->inst_ref_count); } }}}
git patch file (master) -- adding back accidentally removed slapi_counter_decrement. 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.7.patch
Thanks Noriko!
Thanks for the fix, Mark!
Pushed to master: 24a4103..788a48f master -> master commit 788a48f
Pushed to 389-ds-base-1.3.2: 36b8830..dfbc46a 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit dfbc46a
Pushed to 389-ds-base-1.3.1: 2f3852f..f18411d 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit f18411d98a441f26f1ef2246b62e84b72ce39ec4
Pushed to 389-ds-base-1.2.11: fe0b795..30f1ba8 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 30f1ba8
This bug "Ticket #47830 - usn tombstone entry not properly created" was introduced by the patch: 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.5.patch.
Reopening...
Pushed to master: 43c6ff2..98cf424 master -> master commit 98cf424
Pushed to 389-ds-base-1.3.2: 7666910..8c496b1 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit 8c496b1
Pushed to 389-ds-base-1.3.1: 0dd3192..51a76a4 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 51a76a4d565f0e93092e589ee7e3064d5371c54a
Pushed to 389-ds-base-1.2.11: 30f1ba8..5b76c17 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 5b76c17
git patch file (master) -- adding back the code to add entry back to cache if it was replaced. 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.8.patch
Fix incomplete backport to 1.3.1 & 1.2.11 0001-Ticket-47750-Fix-incomplete-backport-to-1.3.1-1.2.11.patch
Attached patch to fix memory leak introduced by incomplete backport.
3be0931..0b5f0d6 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 0b5f0d68233cec976d57597d1c810b79c62528a5
67e4eed..649e726 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 649e726
The previous patch that fixed the leak caused a crash, the new patch (part 2) correctly addresses the leak.
the VERSION.sh change should be in a separate commit, preferably the one that is tagged for the release.
Accessing bep without locking it is a red flag to me. Is there anywhere else in the code where bep is accessed without locking? If so, how is it ensured that it is safe to do so?
Replying to [comment:39 rmeggins]:
Yup sorry forgot that was in there.
This was just a partial backport of Noriko's fix from 47834, I'll look into this bep stuff shortly...
Replying to [comment:40 mreynolds]:
Replying to [comment:39 rmeggins]: the VERSION.sh change should be in a separate commit, preferably the one that is tagged for the release. Yup sorry forgot that was in there. Accessing bep without locking it is a red flag to me. Is there anywhere else in the code where bep is accessed without locking? If so, how is it ensured that it is safe to do so? This was just a partial backport of Noriko's fix from 47834, I'll look into this bep stuff shortly...
I'm not sure why Noriko added and then commented out the "PR_Lock" in the cache_is_in_cache() functions. I'm not sure why the "cache" structure is even passed back to those functions as it's not used. The entry that is passed to these functions is usually a fresh duplicate of the backentry(or it's cache entry has already been locked) - either way I see no need to lock (unless I'm missing something).
Replying to [comment:41 mreynolds]:
Replying to [comment:40 mreynolds]: Replying to [comment:39 rmeggins]: the VERSION.sh change should be in a separate commit, preferably the one that is tagged for the release. Yup sorry forgot that was in there. Accessing bep without locking it is a red flag to me. Is there anywhere else in the code where bep is accessed without locking? If so, how is it ensured that it is safe to do so? This was just a partial backport of Noriko's fix from 47834, I'll look into this bep stuff shortly... I'm not sure why Noriko added and then commented out the "PR_Lock" in the cache_is_in_cache() functions. I'm not sure why the "cache" structure is even passed back to those functions as it's not used. The entry that is passed to these functions is usually a fresh duplicate of the backentry(or it's cache entry has already been locked) - either way I see no need to lock (unless I'm missing something).
Ok. If it is not in the cache, and there is no way it could be referenced by another thread, then it is not necessary to lock it - but then, in that case, the ref count should never be greater than 1.
Replying to [comment:42 rmeggins]:
Replying to [comment:41 mreynolds]: Replying to [comment:40 mreynolds]: Replying to [comment:39 rmeggins]: the VERSION.sh change should be in a separate commit, preferably the one that is tagged for the release. Yup sorry forgot that was in there. Accessing bep without locking it is a red flag to me. Is there anywhere else in the code where bep is accessed without locking? If so, how is it ensured that it is safe to do so? This was just a partial backport of Noriko's fix from 47834, I'll look into this bep stuff shortly... I'm not sure why Noriko added and then commented out the "PR_Lock" in the cache_is_in_cache() functions. I'm not sure why the "cache" structure is even passed back to those functions as it's not used. The entry that is passed to these functions is usually a fresh duplicate of the backentry(or it's cache entry has already been locked) - either way I see no need to lock (unless I'm missing something). Ok. If it is not in the cache, and there is no way it could be referenced by another thread, then it is not necessary to lock it - but then, in that case, the ref count should never be greater than 1.
The tombstones are added tentatively to the cache, but in ldbm_add() if we resurrect a tombstone then it could be in the cache (and potentially removed by another thread?) Obviously the ref count can be greater than 1 in the resurrection case. I'm going to backport more of Noriko's caching fix(she's uses a monitor, instead of a mutex for the cache lock), and I'll add the locking back to the new cache_is_in_cache/refcnt functions.
test patch wiht partial backport from ticket 47834 0001-Ticket-47750-Entry-cache-part-2.patch
Looks good! I found one difference with the master branch. This is not an error case, so instead of -1, we return 1. {{{ … … entrycache_add_int(struct cache cache, struct backentry e, int state, 1350 1351 (*alt)->ep_refcnt++; 1351 1352 LOG("the entry %s already exists. returning existing entry %s (state: 0x%x)\n", 1352 1353 ndn, backentry_get_ndn(my_alt), state); 1354 cache_unlock(cache); 1355 return -1; 1353 1356 } }}}
git patch file (389-ds-base-1.2.11) -- Additional patch for the previous patch 0001-Ticket-47750-Entry-cache-part-2.patch 0002-Ticket-47750-Entry-cache-part-2.patch
git patch file (master) -- Following the 1.2.11 patch by Mark, enabiling cache lock in cache_is_in_cache and cache_has_otherref 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.9.patch
Pushed Mark's patch 000[12]-Ticket-47750-Entry-cache-part-2.patch to 389-ds-base-1.2.11 on behalf of him. 649e726..189eb8a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 2c16d7d commit 189eb8a
Pushed 0001-Ticket-47750-Creating-a-glue-fails-if-one-above-leve.9.patch to master: 66e43ae..5ae1307 master -> master commit 5ae1307
Pushed to 389-ds-base-1.3.3: 9fe0713..ef76678 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit ef76678
Pushed the equivalent patches to 1.3.1, as well: 7b7d092..154abc9 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 112203f49844bdc47649a6305b64d00485193671 commit 891d03a3d36fc46d3851eeaf2582bd3575fd5823 commit 44b66b4ceabc78c21ba48a7fe153915663dbc749 commit 154abc975af643ebecc70750eddf5921e2befc12
Refresh cache entry after calling the delete betxn postop plugins 0001-Ticket-47750-Need-to-refresh-cache-entry-after-calle.patch
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1174892
Why are we seeing this problem now? Seems like this should have been a problem for a long time.
Replying to [comment:51 rmeggins]:
The entry cache redesign that Noriko has done is quite new. So it is a regression from her recent work.
fbafee5..4a5eee6 master -> master commit 4a5eee6 Author: Mark Reynolds mreynolds@redhat.com Date: Tue Dec 16 10:37:02 2014 -0500
69df018..6164553 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 6164553
9851929..9295966 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit 9295966
250fc63..153e574 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit 153e574ceeefec5af05a529b550c69441994ba2a
912deee..5ef3183 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 5ef3183
Do not refresh cache entry if it is a tomstone 0001-Ticket-47750-During-delete-operation-do-not-refresh-.patch
Attachment 0001-Ticket-47750-During-delete-operation-do-not-refresh-.patch added Do not refresh cache entry if it is a tomstone
Your fix looks good. Acked.
c3389a4..4ae6794 master -> master commit 4ae6794 Author: Mark Reynolds mreynolds@redhat.com Date: Thu Dec 18 09:08:52 2014 -0500
6e4e21f..f634732 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit f634732
9793fc4..6c4ad25 389-ds-base-1.3.2 -> 389-ds-base-1.3.2 commit 6c4ad25
153e574..f5458c4 389-ds-base-1.3.1 -> 389-ds-base-1.3.1 commit f5458c4a03cc7a3a369a80999cf6643d34bb620b
3d69f15..83ecd45 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 83ecd45
Replying to [comment:58 tbordaz]:
A crash is possibly related to that fix/ticket I was using an old M1 instance containing entries. I updated with master build, restarted it. Lookup all entries, then issued a DEL. So the entry was in the entry cache. {{{ Program terminated with signal 11, Segmentation fault. 0 PR_ExitMonitor (mon=0x0) at ../../../nspr/pr/src/pthreads/ptsynch.c:547 547 if (!pthread_equal(mon->owner, self)) (gdb) where 0 PR_ExitMonitor (mon=0x0) at ../../../nspr/pr/src/pthreads/ptsynch.c:547 1 0x00007f6c259cb9c9 in cache_unlock_entry (cache=0x2903868, e=0x7f6bd8005330) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/cache.c:1568 2 0x00007f6c25a1786d in ldbm_back_delete (pb=0x7f6c0a7fbae0) at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/ldbm_delete.c:1275
A crash is possibly related to that fix/ticket
I was using an old M1 instance containing entries. I updated with master build, restarted it. Lookup all entries, then issued a DEL. So the entry was in the entry cache. {{{ Program terminated with signal 11, Segmentation fault.
547 if (!pthread_equal(mon->owner, self)) (gdb) where
at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/cache.c:1568
at ../workspaces/389-master-branch/ds/ldap/servers/slapd/back-ldbm/ldbm_delete.c:1275
Line 1275 in ldbm_delete.c lines up with cache_unlock_entry() from a previous patch. However, the latest patch fixes this exact crash. I believe your workspace needs a "git pull".
Metadata Update from @nhosoi: - Issue assigned to nhosoi - Issue set to the milestone: 1.2.11.30
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1082
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Login to comment on this ticket.