#48189 Connections hanging for unindexed searches
Closed: wontfix. Opened 8 years ago by gerdx.

When doing an unindexed search with the filter (uid=ab), the connection hangs until 389 Directory Server closes it after a long time (40 minutes or more) with error 11 (Resource temporarily unavailable).
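For reference, the behaviour can be reproduced with a plain anonymous search against the affected suffix; the hostname below is a placeholder, while the base DN and filter are the ones used in this report:
{{{
# anonymous simple bind; (uid=ab) is unindexed on this server
ldapsearch -x -H ldap://ldap.example.com \
    -b "dc=ukl,dc=uni-freiburg,dc=de" "(uid=ab)"
}}}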

After setting nsslapd-errorlog-level to 1 the following output was written to the error log:

389-Directory/1.3.3.10 B2015.141.1211
...
[30/May/2015:16:34:57 +0200] id2entry - => id2entry(498)
[30/May/2015:16:34:57 +0200] id2entry - <= id2entry 34c1510, dn "uid=kenz,ou=people,dc=ukl,dc=uni-freiburg,dc=de" (cache)
[30/May/2015:16:34:57 +0200] id2entry - <= id2entry( 498 ) 34c1510 (disk)
[30/May/2015:16:34:57 +0200] - re_exec (kenz) 0
[30/May/2015:16:34:57 +0200] id2entry - => id2entry(499)
[30/May/2015:16:34:57 +0200] id2entry - <= id2entry 19752c0, dn "uid=martens,ou=people,dc=ukl,dc=uni-freiburg,dc=de" (cache)
[30/May/2015:16:34:57 +0200] id2entry - <= id2entry( 499 ) 19752c0 (disk)
[30/May/2015:16:34:57 +0200] - --> pagedresults_is_timedout
[30/May/2015:16:34:57 +0200] - <-- pagedresults_is_timedout: -
[30/May/2015:16:34:57 +0200] - re_exec (martens) 0
[30/May/2015:16:34:57 +0200] id2entry - => id2entry(500)
[30/May/2015:16:34:57 +0200] id2entry - <= id2entry 7f0c5c3711b0, dn "uid=martinj,ou=people,dc=ukl,dc=uni-freiburg,dc=de" (cache)
[30/May/2015:16:34:57 +0200] id2entry - <= id2entry( 500 ) 7f0c5c3711b0 (disk)
[30/May/2015:16:34:57 +0200] - re_exec (martinj) 0
[30/May/2015:16:34:57 +0200] - --> pagedresults_is_timedout
[30/May/2015:16:34:57 +0200] - <-- pagedresults_is_timedout: -
[30/May/2015:16:34:58 +0200] - --> pagedresults_is_timedout
[30/May/2015:16:34:58 +0200] - <-- pagedresults_is_timedout: -
[30/May/2015:16:34:58 +0200] - --> pagedresults_is_timedout
[30/May/2015:16:34:58 +0200] - <-- pagedresults_is_timedout: -
...

nsslapd-lookthroughlimit was set to 500. After looking through 500 entries, 389 Directory Server stopped examining entries and kept calling "pagedresults_is_timedout" for a long time (40 minutes or more).
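For completeness, both settings involved here can be checked over LDAP. A Directory Manager bind is assumed below; only the attribute locations are standard, the values are whatever is configured on the server:
{{{
# look-through limit of the ldbm database plugin (500 in this case)
ldapsearch -x -D "cn=Directory Manager" -W -s base \
    -b "cn=config,cn=ldbm database,cn=plugins,cn=config" nsslapd-lookthroughlimit

# error log level that was raised to 1 to capture the trace above
ldapsearch -x -D "cn=Directory Manager" -W -s base \
    -b "cn=config" nsslapd-errorlog-level
}}}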

Why does 389 Directory Server not send an error message and disconnect immediately?


How many entries do you have?

I'm changing the priority to "critical" instead of blocker.

There are about 40,000 entries in our database userRoot, which is used for the search. In our separate database logData there are about 1,400,000 entries (1.4 million). The database logData has its own suffix, not a sub-suffix of userRoot, so logData is not searched because the search uses the userRoot suffix.

In the meantime I installed 389 Directory Server 1.3.2.27. I made the same configuration as for 389 Directory Server 1.3.3.10 and imported the same data. With 389 Directory Server 1.3.2.27 the search with filter (uid=ab) works as expected: first some entries are returned, then the error "Administrative limit exceeded" is returned, all within a few seconds. But with 389 Directory Server 1.3.3.10 the connection hangs for about one hour after some entries have been returned.

I just tested the latest code from the master branch, and the latest source code from 1.3.3, and it works correctly.

[02/Jun/2015:11:44:44 -0400] conn=22 fd=64 slot=64 connection from 127.0.0.1 to 127.0.0.1
[02/Jun/2015:11:44:44 -0400] conn=22 op=0 BIND dn="" method=128 version=3
[02/Jun/2015:11:44:44 -0400] conn=22 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[02/Jun/2015:11:44:44 -0400] conn=22 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(uid=a)" attrs="distinguishedName"
[02/Jun/2015:11:44:44 -0400] conn=22 op=1 RESULT err=11 tag=101 nentries=0 etime=0 notes=A
[02/Jun/2015:11:44:44 -0400] conn=22 op=2 UNBIND
[02/Jun/2015:11:44:44 -0400] conn=22 op=2 fd=64 closed - U1

Note: I am '''not''' binding as the "Directory Manager", as this account ignores resource limits. If this was a regression, it looks like it might have been fixed already.
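As a side note, resource limits can also be set per account, so a dedicated test user can be given the same look-through limit for testing. The DN and value below are only an illustration:
{{{
# hypothetical test account; nsLookThroughLimit on the bind entry overrides
# the global nsslapd-lookthroughlimit for searches done by this account
dn: uid=testuser,ou=people,dc=example,dc=com
changetype: modify
replace: nsLookThroughLimit
nsLookThroughLimit: 500
}}}
Applied with something like: ldapmodify -x -D "cn=Directory Manager" -W -f limit.ldif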

When I bind as "Directory Manager" for the search, all matching entries are returned as expected. For an anonymous bind, part of the matching entries are returned (limited by the resource limit, as expected), but then the connection hangs until the client disconnects.

[03/Jun/2015:09:28:48 +0200] conn=1 fd=50 slot=50 connection from ::1 to ::1
[03/Jun/2015:09:28:48 +0200] conn=1 op=0 BIND dn="cn=Directory Manager" method=128 version=3
[03/Jun/2015:09:28:48 +0200] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"
[03/Jun/2015:09:28:48 +0200] conn=1 op=1 SRCH base="dc=ukl,dc=uni-freiburg,dc=de" scope=2 filter="(uid=ab)" attrs=ALL
[03/Jun/2015:09:28:53 +0200] conn=1 op=1 RESULT err=0 tag=101 nentries=250 etime=5 notes=A
[03/Jun/2015:09:28:53 +0200] conn=1 op=2 UNBIND
[03/Jun/2015:09:28:53 +0200] conn=1 op=2 fd=50 closed - U1
[03/Jun/2015:09:30:35 +0200] conn=2 fd=50 slot=50 connection from 193.196.192.64 to 10.192.2.28
[03/Jun/2015:09:30:35 +0200] conn=2 op=0 SRCH base="dc=ukl,dc=uni-freiburg,dc=de" scope=2 filter="(uid=ab)" attrs=ALL
[03/Jun/2015:13:17:22 +0200] conn=2 op=-1 fd=50 closed - B1

The test above was done with 389-ds-base-1.3.3.10.tar.bz2 that I downloaded from http://directory.fedoraproject.org and then compiled.

Which version did you use for your test?

For the anonymous bind above there is no log output ... BIND dn="" ... as shown in your log output. Does that depend on the configuration?
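For comparison, the BIND dn="" method=128 line shown earlier comes from a client that sends an explicit anonymous simple bind before searching; a client that starts the search without any BIND operation produces no such line, so this appears to be client behaviour rather than server configuration. A minimal example of the explicit variant:
{{{
# -x without -D/-w performs an anonymous simple bind first,
# which the access log records as BIND dn="" method=128
ldapsearch -x -H ldap://localhost \
    -b "dc=ukl,dc=uni-freiburg,dc=de" "(uid=ab)"
}}}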

Below is the log output for the test with 389-ds-base-1.3.2.27.tar.bz2. Here the search with the anonymous bind works fine.

[03/Jun/2015:13:21:22 +0200] conn=1 fd=50 slot=50 connection from ::1 to ::1
[03/Jun/2015:13:21:22 +0200] conn=1 op=0 BIND dn="cn=Directory Manager" method=128 version=3
[03/Jun/2015:13:21:22 +0200] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"
[03/Jun/2015:13:21:22 +0200] conn=1 op=1 SRCH base="dc=ukl,dc=uni-freiburg,dc=de" scope=2 filter="(uid=ab)" attrs=ALL
[03/Jun/2015:13:21:29 +0200] conn=1 op=1 RESULT err=0 tag=101 nentries=250 etime=7 notes=A
[03/Jun/2015:13:21:29 +0200] conn=1 op=2 UNBIND
[03/Jun/2015:13:21:29 +0200] conn=1 op=2 fd=50 closed - U1
[03/Jun/2015:13:22:27 +0200] conn=2 fd=50 slot=50 connection from 193.196.192.64 to 10.192.2.28
[03/Jun/2015:13:22:27 +0200] conn=2 op=0 SRCH base="dc=ukl,dc=uni-freiburg,dc=de" scope=2 filter="(uid=ab)" attrs=ALL
[03/Jun/2015:13:22:27 +0200] conn=2 op=0 RESULT err=11 tag=101 nentries=56 etime=0 notes=A
[03/Jun/2015:13:22:27 +0200] conn=2 op=1 UNBIND
[03/Jun/2015:13:22:27 +0200] conn=2 op=1 fd=50 closed - U1

Note: you can safely ignore this trace log.
{{{
[..] - --> pagedresults_is_timedout
[..] - <-- pagedresults_is_timedout: -
}}}
It is confusing, but '-' means NOT timed out... (Probably I should change it to "false" or something better.)

'-' is OK, I think. The problem is that the connection hangs with version 1.3.3.10.

Below I include traces for version 1.3.2.27 and for version 1.3.3.10.

For version 1.3.2.27 everything works as expected.

For version 1.3.3.10 the connection hangs and "send_ldap_result 11" is missing.

Trace for version 1.3.2.27:
[04/Jun/2015:18:26:08 +0200] id2entry - => id2entry(5000)
[04/Jun/2015:18:26:08 +0200] id2entry - <= id2entry 643d160, dn "uid=49343,ou=people,dc=ukl,dc=uni-freiburg,dc=de" (cache)
[04/Jun/2015:18:26:08 +0200] id2entry - <= id2entry( 5000 ) 643d160 (disk)
[04/Jun/2015:18:26:08 +0200] - re_exec (49343) 0
[04/Jun/2015:18:26:08 +0200] - => send_ldap_result 11::
[04/Jun/2015:18:26:08 +0200] - <= send_ldap_result
[04/Jun/2015:18:26:08 +0200] - add_work_q
[04/Jun/2015:18:26:08 +0200] - get_work_q
[04/Jun/2015:18:26:08 +0200] - --> pagedresults_in_use
[04/Jun/2015:18:26:08 +0200] - <-- pagedresults_in_use: 0
[04/Jun/2015:18:26:08 +0200] - do_unbind
[04/Jun/2015:18:26:08 +0200] - => get_ldapmessage_controls
[04/Jun/2015:18:26:08 +0200] - <= get_ldapmessage_controls no controls
[04/Jun/2015:18:26:08 +0200] - defbackend_noop
[04/Jun/2015:18:26:08 +0200] - --> pagedresults_reset_timedout
[04/Jun/2015:18:26:08 +0200] - <-- pagedresults_reset_timedout: 0
[04/Jun/2015:18:26:08 +0200] - --> pagedresults_cleanup
[04/Jun/2015:18:26:08 +0200] - <-- pagedresults_cleanup: 0
[04/Jun/2015:18:26:18 +0200] - ldbm backend flushing
[04/Jun/2015:18:26:18 +0200] - ldbm backend done flushing
[04/Jun/2015:18:26:18 +0200] - ldbm backend flushing
[04/Jun/2015:18:26:18 +0200] - ldbm backend done flushing
[04/Jun/2015:18:26:48 +0200] - ldbm backend flushing
[04/Jun/2015:18:26:48 +0200] - ldbm backend done flushing

Trace for version 1.3.3.10:
[04/Jun/2015:18:39:59 +0200] id2entry - => id2entry(5000)
[04/Jun/2015:18:39:59 +0200] entryrdn-index - --> entryrdn_lookup_dn
[04/Jun/2015:18:39:59 +0200] entryrdn-index - <-- entryrdn_lookup_dn
[04/Jun/2015:18:39:59 +0200] - => str2entry_fast
[04/Jun/2015:18:39:59 +0200] - <= str2entry_fast 0x6350d10
[04/Jun/2015:18:39:59 +0200] id2entry - id2entry id: 5000, dn "uid=49343,ou=people,dc=ukl,dc=uni-freiburg,dc=de" -- adding it to cache
[04/Jun/2015:18:39:59 +0200] - slapi_pw_get_entry_ext: pw_entry_extension is not set
[04/Jun/2015:18:39:59 +0200] id2entry - <= id2entry( 5000 ) 63535c0 (disk)
[04/Jun/2015:18:39:59 +0200] - re_exec (49343) 0
[04/Jun/2015:18:39:59 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:39:59 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:39:59 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:00 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:00 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:00 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:00 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:00 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:00 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:00 +0200] - <-- pagedresults_is_timedout: -
...
(repeating many times)
...
[04/Jun/2015:18:40:16 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:16 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:16 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:16 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:16 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:16 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:16 +0200] - ldbm backend flushing
[04/Jun/2015:18:40:16 +0200] - ldbm backend done flushing
[04/Jun/2015:18:40:16 +0200] - ldbm backend flushing
[04/Jun/2015:18:40:16 +0200] - ldbm backend done flushing
[04/Jun/2015:18:40:17 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:17 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:17 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:17 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:17 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:17 +0200] - <-- pagedresults_is_timedout: -
[04/Jun/2015:18:40:17 +0200] - --> pagedresults_is_timedout
[04/Jun/2015:18:40:17 +0200] - <-- pagedresults_is_timedout: -
...
(repeating many times)
...

We built 1.3.3.11 for Fedora 21 and 22.
https://admin.fedoraproject.org/updates/389-ds-base-1.3.3.11-1.{fc21,fc22}
The source tar ball is found here:
http://www.port389.org/binaries/389-ds-base-1.3.3.11.tar.bz2

Version 1.3.3.11 is still in the test phase, but if you could tell us whether the problem is fixed in this version or still present, it would help us. Thank you.

I installed 389-ds-base-1.3.3.11.tar.bz2. With this version the unindexed search with filter '(uid=ab)' works as expected. The problem did not appear.

Replying to [comment:10 gerdx]:

I installed 389-ds-base-1.3.3.11.tar.bz2. With this version the unindexed search with filter '(uid=ab)' works as expected. The problem did not appear.

Thank you so much for your verification. Sorry about the regression in 1.3.3.10.

Since it was verified on 1.3.3.11, let me close this ticket. Thanks!

Metadata Update from @gerdx:
- Issue set to the milestone: N/A

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1520

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
