#49450 invalid management of connection refcnt
Closed: wontfix 5 years ago by mreynolds. Opened 7 years ago by tbordaz.

Issue Description

During IPA tests (TBD) we can see those logs

 [11/Nov/2017:00:07:38.030757299 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.032613873 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.034687686 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.040205852 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.043025214 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.045589499 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.047762366 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.054517942 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.056455317 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.057965134 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:38.060337289 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired
 [11/Nov/2017:00:07:39.066412555 +0100] - ERR - connection_release_nolock_ext - conn=2 fd=65 Attempt to release connection that is not acquired

Package Version and Platform

389-ds-base-1.3.7.7-1.fc26.x86_64
freeipa 4.6 git build

Steps to reproduce

  1. TBC

Actual results

Expected results


The failing connection looks like

$1 = {c_sb = 0x55b3f707a3f0, c_sd = 65, c_ldapversion = 3, c_dn = 0x55b3f70729e0 "cn=Directory Manager", c_isroot = 1, 
  c_isreplication_session = 0, c_authtype = 0x55b3f707c6b0 "OS", c_external_dn = 0x0, c_external_authtype = 0x7fa4642e92e4 "none", 
  cin_addr = 0x55b3f7076380, cin_destaddr = 0x55b3f7076460, c_domain = 0x0, c_ops = 0x55b3f82f9880, c_gettingber = 1, 
  c_currentber = 0x0, c_starttime = 1510355249, c_connid = 2, c_maxthreadscount = 0, c_maxthreadsblocked = 0, 
  c_opsinitiated = 204, c_opscompleted = 204, c_threadnumber = 1, c_refcnt = 0, c_mutex = 0x55b3f7bcda20, 
  c_pdumutex = 0x55b3f7bcdad0, c_idlesince = 1510355258, c_idletimeout = 0, c_idletimeout_handle = 8, c_private = 0x55b3f7c05440, 
  c_flags = 0, c_needpw = 0, c_client_cert = 0x0, c_prfd = 0x55b3f7c6b920, c_ci = 65, c_fdi = -1, c_next = 0x0, 
  c_prev = 0x55b3f7c13ee0, c_bi_backend = 0x0, c_extension = 0x55b3f7c6bcb0, c_sasl_conn = 0x55b3f61be800, c_local_ssf = 71, 
  c_sasl_ssf = 0, c_ssl_ssf = 0, c_unix_local = 1, c_local_valid = 1, c_local_uid = 0, c_local_gid = 0, c_pagedresults = {
    prl_maxlen = 0, prl_count = 0, prl_list = 0x0}, c_push_io_layer_cb = 0x0, c_pop_io_layer_cb = 0x0, c_io_layer_cb_data = 0x0, 
  c_ct = 0x55b3f6340a00, c_tp = 0x55b3f64fcc00, c_ns_close_jobs = 0, c_ipaddr = 0x55b3f707c6b8 "local"}

c_refcnt = 0
c_opsinitiated == c_opscompleted
c_gettingber = 1

It has still pending operations

(gdb) print *c_ops
$7 = {o_ber = 0x55b3f811fd10, o_msgid = 191, o_tag = 99, o_hr_time_rel = {tv_sec = 732, tv_nsec = 346471149}, o_hr_time_utc = {
    tv_sec = 1510355258, tv_nsec = 13765335}, o_isroot = 1, o_sdn = {flag = 10 '\n', udn = 0x55b3f82cfe80 "cn=Directory Manager", 
    dn = 0x55b3f801ca40 "cn=Directory Manager", ndn = 0x0, ndn_len = 20}, o_authtype = 0x55b3f824a968 "OS", o_ssf = 71, 
  o_opid = 190, o_connid = 2, o_handler_data = 0x0, o_result_handler = 0x0, o_search_entry_handler = 0x0, 
... 
... 14 operations received and processed (in access logs)
...
(gdb) print *c_ops->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next
$8 = {o_ber = 0x55b3f7f98640, o_msgid = 203, o_tag = 99, o_hr_time_rel = {tv_sec = 732, tv_nsec = 391863492}, o_hr_time_utc = {
    tv_sec = 1510355258, tv_nsec = 59157665}, o_isroot = 1, o_sdn = {flag = 10 '\n', udn = 0x55b3f8230960 "cn=Directory Manager", 
    dn = 0x55b3f82309e0 "cn=Directory Manager", ndn = 0x0, ndn_len = 20}, o_authtype = 0x55b3f81d18a0 "OS", o_ssf = 71, 
  o_opid = 202, o_connid = 2, o_handler_data = 0x0, o_result_handler = 0x0, o_search_entry_handler = 0x0, 
  o_search_referral_handler = 0x0, o_csngen_handler = 0x0, o_replica_attr_handler = 0x0, o_next = 0x55b3f85476c0, o_status = 3,

plus one not decoded (similar symptom as https://pagure.io/389-ds-base/issue/49410)

(gdb) print *the_connection_table->c->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_next->c_ops->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next->o_next
$9 = {o_ber = 0x55b3f81d6230, o_msgid = -1, o_tag = 18446744073709551615, o_hr_time_rel = {tv_sec = 732, tv_nsec = 398078409}, 
  o_hr_time_utc = {tv_sec = 1510355258, tv_nsec = 65372588}, o_isroot = 0, o_sdn = {flag = 0 '\000', udn = 0x0, dn = 0x0, 
    ndn = 0x0, ndn_len = 0}, o_authtype = 0x0, o_ssf = 0, o_opid = 203, o_connid = 2, o_handler_data = 0x0, 
  o_result_handler = 0x0, o_search_entry_handler = 0x0, o_search_referral_handler = 0x0, o_csngen_handler = 0x0, 
  o_replica_attr_handler = 0x0, o_next = 0x0, o_status = 0, o_searchattrs = 0x0, o_flags = 960, o_extension = 0x55b3f7e0f2c0,

In conclusion:
there is a connection in gettingber but with no worker
Its 14th last operations were completed but remaining linked to the connection

Metadata Update from @tbordaz:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

7 years ago

The following message were initially found during tests in DS and IPA with DS in 1.3.7
However we can see similar logs in 1.3.6 as well

Attempt to release connection that is not acquired

Metadata Update from @tbordaz:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1517383

7 years ago

Hmmm i'm not sure it is a duplicate though? 48184 is about at shutdown closing remaining connections, but this is trying to close a connection in an invalid state?

@firstyear I actually have the same doubt.
48184 clearly fixes the pb found (QE) at shutdown. But I remember having seen the msg "Attempt to release connection that is not acquired" in a context when there was no shutdown.

Now considering this is very difficult to debug and we are not 100% sure it is a different issue, I would propose to close it as duplicate and to reopen it if we can see some logs with this msg.

It would be nice for now to NOT log that error message if the server is shutting down (see slapi_is_shutting_down()). That way we only log that message in cases where it really is a problem.

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.7.0

7 years ago

Metadata Update from @mreynolds:
- Issue close_status updated to: invalid
- Issue status updated to: Closed (was: Open)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2509

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: invalid)

4 years ago

Log in to comment on this ticket.

Metadata