#49429 bug in 389-ds-base-1.2.11.15-91.el6_9.x86_64 with large paged searches
Closed: wontfix 5 years ago Opened 6 years ago by jeaton.

Issue Description

I recently updated to 389-ds-base-1.2.11.15-91.el6_9.x86_64 (from 389-ds-base-1.2.11.15-50.el6_6.x86_64) and have uncovered a bug with large paged searches blocking all other searches on the server.

The case I've discovered is as follows:

I have a search which returns about 28,000 entries. The system which generates this search does a paged search with 1000 results per page. I can replicate the behavior with a command line ldap search like the following

ldapsearch -H ldaps://servername:636 -D 'uid=binddn,ou=specials,dc=cmu,dc=edu' -x -W -E pr=1000/noprompt "(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))",

As soon as the first 1000 entries are returned, and the search requests the second page, all other searches being done to the same server block.

Package Version and Platform

Red Hat Enterprise Linux Server release 6.9 (Santiago)
389-ds-base-1.2.11.15-91.el6_9.x86_64

Steps to reproduce

  1. Start a loop with a trivial query, eg: while true; do time ldapsearch -h servername -x uid=jeaton; sleep 1 ; done

  2. Run a large paged query, eg: ldapsearch -H ldaps://servername:636 -D 'uid=binddn,ou=specials,dc=cmu,dc=edu' -x -W -E pr=1000/noprompt "(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))"

(this query returns a total of 28K entries, in pages of 1000

  1. Note that as soon as the second page is requested, the server blocks entirely for the duration of the request:

[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:48 -0400] conn=202657 op=2 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:48 -0400] conn=202699 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:11:33 -0400] conn=202657 op=2 RESULT err=0 tag=101 nentries=1000 etime=45 notes=P pr_idx=0
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 RESULT err=0 tag=101 nentries=1 etime=0

Actual results

The server performs fine during the request for the first page of results. As soon as the second page is requested, all other requests to the server block until the response for the page is sent. The server continues to block for each additional page of responses, until the entire long search is completed.

Expected results

Other queries continue to get responses.

Full log excerpts below:

Everything looks good in the access log at first:

[26/Oct/2017:10:10:02 -0400] conn=202655 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:02 -0400] conn=202655 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:02 -0400] conn=202655 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:02 -0400] conn=202655 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:02 -0400] conn=202655 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:02 -0400] conn=202655 op=2 UNBIND
[26/Oct/2017:10:10:02 -0400] conn=202655 op=2 fd=85 closed - U1
[26/Oct/2017:10:10:03 -0400] conn=202656 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:04 -0400] conn=202656 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:04 -0400] conn=202656 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:04 -0400] conn=202656 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:04 -0400] conn=202656 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:04 -0400] conn=202656 op=2 UNBIND
[26/Oct/2017:10:10:04 -0400] conn=202656 op=2 fd=85 closed - U1
[26/Oct/2017:10:10:04 -0400] conn=202657 fd=85 slot=85 SSL connection from 10.0.0.2 to 10.1.1.1
[26/Oct/2017:10:10:04 -0400] conn=202657 TLS1.0 256-bit AES
[26/Oct/2017:10:10:04 -0400] conn=202657 op=0 BIND dn="uid=binddn,ou=specials,dc=cmu,dc=edu" method=128 version=3
[26/Oct/2017:10:10:04 -0400] conn=202657 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=binddn,ou=specials,dc=cmu,dc=edu"
[26/Oct/2017:10:10:04 -0400] conn=202657 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:05 -0400] conn=202658 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:05 -0400] conn=202658 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:05 -0400] conn=202658 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:05 -0400] conn=202658 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:05 -0400] conn=202658 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:05 -0400] conn=202658 op=2 UNBIND
[26/Oct/2017:10:10:05 -0400] conn=202658 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:06 -0400] conn=202659 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:06 -0400] conn=202659 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:06 -0400] conn=202659 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:06 -0400] conn=202659 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:06 -0400] conn=202659 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:06 -0400] conn=202659 op=2 UNBIND
[26/Oct/2017:10:10:06 -0400] conn=202659 op=2 fd=86 closed - U1

until we get to the point where the first page is returned and the client requests the second page of results:

[26/Oct/2017:10:10:46 -0400] conn=202697 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:46 -0400] conn=202697 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:46 -0400] conn=202697 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:46 -0400] conn=202697 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:46 -0400] conn=202697 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:46 -0400] conn=202697 op=2 UNBIND
[26/Oct/2017:10:10:46 -0400] conn=202697 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:48 -0400] conn=202657 op=1 RESULT err=0 tag=101 nentries=1000 etime=44 notes=P pr_idx=0
[26/Oct/2017:10:10:48 -0400] conn=202657 op=2 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:48 -0400] conn=202699 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:11:33 -0400] conn=202657 op=2 RESULT err=0 tag=101 nentries=1000 etime=45 notes=P pr_idx=0
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:11:33 -0400] conn=202657 op=3 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=
)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=2 UNBIND
[26/Oct/2017:10:11:33 -0400] conn=202699 op=2 fd=86 closed - U1
[26/Oct/2017:10:12:14 -0400] conn=202657 op=3 RESULT err=0 tag=101 nentries=1000 etime=41 notes=P pr_idx=0
[26/Oct/2017:10:12:14 -0400] conn=202700 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:12:14 -0400] conn=202700 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:12:14 -0400] conn=202657 op=4 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:12:14 -0400] conn=202700 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:12:14 -0400] conn=202700 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:12:14 -0400] conn=202700 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:12:14 -0400] conn=202700 op=2 UNBIND
[26/Oct/2017:10:12:14 -0400] conn=202700 op=2 fd=86 closed - U1
[26/Oct/2017:10:12:58 -0400] conn=202657 op=4 RESULT err=0 tag=101 nentries=1000 etime=44 notes=P pr_idx=0

Note the large gap in time between 10:10:48 and 10:11:33. You can see my client doing the trivial search connect at 10:10:48 with conn=202699. At that point the client just hangs, waiting on the server's response. Only after the server responds with the second page of results at 10:11:33, is my client able to send the bind request and search. This same pattern happens with every further page of searches, eg, 10:11:33 - 10:12:14.

Then when we get to the final page, as soon as that completes, the trivial searches once again complete at one per second:

[26/Oct/2017:10:25:30 -0400] conn=202657 op=28 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:25:30 -0400] conn=202737 op=0 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:25:30 -0400] conn=202738 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:25:30 -0400] conn=202738 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:25:30 -0400] conn=202738 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:25:58 -0400] conn=202738 op=2 UNBIND
[26/Oct/2017:10:25:58 -0400] conn=202740 fd=89 slot=89 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:25:58 -0400] conn=202657 op=28 RESULT err=0 tag=101 nentries=1000 etime=28 notes=P pr_idx=0
[26/Oct/2017:10:25:58 -0400] conn=202737 op=1 UNBIND
[26/Oct/2017:10:25:58 -0400] conn=202737 op=1 fd=87 closed - U1
[26/Oct/2017:10:25:58 -0400] conn=202738 op=2 fd=88 closed - U1
[26/Oct/2017:10:25:58 -0400] conn=202740 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:25:58 -0400] conn=202657 op=29 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=
)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:25:58 -0400] conn=202740 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:25:58 -0400] conn=202740 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:25:58 -0400] conn=202740 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:08 -0400] conn=202741 fd=87 slot=87 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:08 -0400] conn=202740 op=2 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202741 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:08 -0400] conn=202741 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:08 -0400] conn=202740 op=2 fd=89 closed - U1
[26/Oct/2017:10:26:08 -0400] conn=202741 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:08 -0400] conn=202657 op=29 RESULT err=0 tag=101 nentries=455 etime=10 notes=P pr_idx=0
[26/Oct/2017:10:26:08 -0400] conn=202741 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:08 -0400] conn=202657 op=30 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202741 op=2 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202657 op=30 fd=85 closed - U1
[26/Oct/2017:10:26:08 -0400] conn=202741 op=2 fd=87 closed - U1
[26/Oct/2017:10:26:09 -0400] conn=202742 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:09 -0400] conn=202742 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:09 -0400] conn=202742 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:09 -0400] conn=202742 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:09 -0400] conn=202742 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:09 -0400] conn=202742 op=2 UNBIND
[26/Oct/2017:10:26:09 -0400] conn=202742 op=2 fd=85 closed - U1
[26/Oct/2017:10:26:10 -0400] conn=202743 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:10 -0400] conn=202743 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:10 -0400] conn=202743 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:10 -0400] conn=202743 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:10 -0400] conn=202743 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:10 -0400] conn=202743 op=2 UNBIND

I did some testing and think that the bug was introduced around 389-ds-base-1.2.11.15-72.el6_7. I can probably upgrade one by one and find it specifically if desired.


Tickets of interest

  • 1.2.11.15-75 --- Paged results search returns the blank list of entries (DS 48808)
  • 1.2.11.15-71 --- SimplePagedResults -- in the search error case, simple paged results slot was not released (DS 48375)
  • 1.2.11.15-69 --- SimplePagedResults -- abandon could happen between the abandon check and sending results -- Fixing a regression introduced in 1.2.11.15-68 (DS 48338)
  • 1.2.11.15-68 --- SimplePagedResults -- abandon could happen between the abandon check and sending results (DS 48338)
  • 1.2.11.15-64 --- pagedresults - when timed out, search results could have been already freed. (DS 48299)

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

Connection 2 is the paged result search, connection 1 is just performing search and search.

[30/Oct/2017:13:34:14.969198401 -0400] conn=1 op=257763 SRCH base="dc=example,dc=com" scope=2 filter="(uid=test_entry)" attrs=ALL
[30/Oct/2017:13:34:14.969221738 -0400] conn=2 op=2 RESULT err=0 tag=101 nentries=200000 etime=20 notes=U,P pr_idx=0 pr_cookie=0
[30/Oct/2017:13:34:14.969256588 -0400] conn=1 op=257763 RESULT err=0 tag=101 nentries=1 etime=0
[30/Oct/2017:13:34:14.969341755 -0400] conn=1 op=257764 SRCH base="dc=example,dc=com" scope=2 filter="(uid=test_entry)" attrs=ALL
[30/Oct/2017:13:34:14.969400616 -0400] conn=1 op=257764 RESULT err=0 tag=101 nentries=1 etime=0
[30/Oct/2017:13:34:27.644432708 -0400] conn=2 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(uid=*)" attrs=ALL
[30/Oct/2017:13:34:27.644431842 -0400] conn=1 op=257766 SRCH base="dc=example,dc=com" scope=2 filter="(uid=test_entry)" attrs=ALL
[30/Oct/2017:13:34:27.644625743 -0400] conn=1 op=257766 RESULT err=0 tag=101 nentries=1 etime=0

So there is a blocking event between the page being returned, and the next page being prepared.

I had to use a large database and very large page size (20000) to get this to occur, but I would like test this over the network to see if I can get the same results (etime=20) but with a smaller page size.

Its likely that the particular search in question in my example triggers it more easily due to the large number of objects with isMemberOf attributes (~80K) as well as the high number of values for that attribute (some objects have upwards of 2000 values).

I did try to recreate this on RH7 with 389-ds-base-1.3.6.1-21.el7_4.x86_64 and was not able to reproduce it there yet.

Actually I did not reproduce the issue, the problem was that my python test script was hanging, not the server.

I may be able to generate a sufficiently anonymized database sample to replicate the problem if you would like.

@jeaton any update on a reproducer? Thanks!

Metadata Update from @mreynolds:
- Issue close_status updated to: worksforme
- Issue status updated to: Closed (was: Open)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2488

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: worksforme)

3 years ago

Login to comment on this ticket.

Metadata