#47707 389 DS Server crashes and dies while handling paged searches from clients
Closed: Fixed None Opened 5 years ago by rmazgon.

After a correct BIND:
[access log]
...
[19/Feb/2014:08:52:54 +0000] conn=107 op=0 BIND dn="uid=opencms, ou=apps,dc=mydomain,dc=com" method=128 version=3
[19/Feb/2014:08:52:54 +0000] conn=107 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=opencms, ou=apps,dc=mydomain,dc=com"
...

The application makes paged searches, and the server returns the results correctly:
[access log]
...
[19/Feb/2014:09:57:53 +0000] conn=107 op=62 SRCH base="dc=mydomain,dc=com" scope=2 filter="(&(objectClass=organizationalPerson)(uid=user1))" attrs="mail postalAddress description uid sn postalCode givenName"
[19/Feb/2014:09:57:53 +0000] conn=107 op=62 RESULT err=0 tag=101 nentries=1 etime=0 notes=P
...

But if the same bound connection then makes a non-paged search:
[access log]
...
[19/Feb/2014:09:57:54 +0000] conn=107 op=133 SRCH base="dc=mydomain,dc=com" scope=2 filter="(&(objectClass=organizationalPerson)(uid=user2))" attrs="mail postalAddress description uid sn postalCode givenName"
[19/Feb/2014:09:57:54 +0000] conn=107 op=133 RESULT err=0 tag=101 nentries=1 etime=0
...

The server dies after the next search operation:
[access log]
...
[19/Feb/2014:09:57:54 +0000] conn=107 op=134 SRCH base="dc=mydomain,dc=com" scope=2 filter="(&(objectClass=organizationalPerson)(uid=user3))" attrs="mail postalAddress description uid sn postalCode givenName"
...

[error log]
....
[19/Feb/2014:09:57:54 +0000] - pagedresults_parse_control_value: invalid cookie: -1

And the slapd instance dies.

I think the server must be protected against this type of incorrect usage by clients.


What version of 389-ds-base are you using? This might be an issue that has already been fixed.

Please follow the instructions here for generating a stack trace of the crash. That will help us to identify if this is an issue that we have already fixed:

http://port389.org/wiki/FAQ#Debugging_Crashes

It works for me (tested with the master).

[..] conn=8 op=0 BIND dn="uid=tuser0,o=my.com" method=128 version=3
[..] conn=8 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=tuser0,o=my.com"
[..] conn=8 op=1 SRCH base="o=my.com" scope=2 filter="(&(objectClass=organizationalPerson)(uid=tuser1))" attrs="mail postalAddress description uid sn postalCode givenName"
[..] conn=8 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[..] conn=8 op=2 SRCH base="o=my.com" scope=2 filter="(&(objectClass=organizationalPerson)(uid=tuser1))" attrs="mail postalAddress description uid sn postalCode givenName"
[..] conn=8 op=2 RESULT err=0 tag=101 nentries=1 etime=0
[..] conn=8 op=3 SRCH base="o=my.com" scope=2 filter="(&(objectClass=organizationalPerson)(uid=tuser1))" attrs="mail postalAddress description uid sn postalCode givenName"
[..] conn=8 op=3 RESULT err=0 tag=101 nentries=1 etime=0
[..] conn=8 op=5 UNBIND
[..] conn=8 op=5 fd=64 closed - U1

Please note: if a search is a "non paged search" -- with no LDAP_CONTROL_PAGEDRESULTS, it won't call pagedresults_parse_control_value. So, I'm puzzled why you see "pagedresults_parse_control_value: invalid cookie: -1" in the error log...
461 if ( slapi_control_present (ctrlp, LDAP_CONTROL_PAGEDRESULTS,
462 &ctl_value, &iscritical) )
463 {
464 rc = pagedresults_parse_control_value(pb, ctl_value,
465 &pagesize, &pr_idx);

If you could provide your client program/script to reproduce the problem, it may help our investigation. Thanks.

Replying to [comment:1 nkinder]:

This is the startup info:
[20/Feb/2014:16:16:53 +0000] - 389-Directory/1.2.11.25 B2013.325.1951 starting up

These are RPMs installed from EPEL:
[root@server ~]# rpm -qa | grep -i 389
389-ds-1.2.2-1.el6.noarch
389-ds-console-doc-1.2.6-1.el6.noarch
389-ds-base-libs-1.2.11.25-1.el6.x86_64
389-admin-1.1.35-1.el6.x86_64
389-admin-console-doc-1.1.8-1.el6.noarch
389-admin-console-1.1.8-1.el6.noarch
389-dsgw-1.1.11-1.el6.x86_64
389-adminutil-1.1.19-1.el6.x86_64
389-ds-console-1.2.6-1.el6.noarch
389-console-1.1.7-1.el6.noarch
389-ds-base-1.2.11.25-1.el6.x86_64

I have not yet run more tests with the application that crashes the server when it uses paged searches; I have disabled it for now. I will run some tests with the debugging instructions you provided.

Replying to [comment:2 nhosoi]:

The LDAP client application that causes the problem is an OpenCMS module that provides LDAP support (Alkacon OCEE LDAP for OpenCMS v8). It appears that it uses com.sun.jndi.ldap beans.

I have not yet run more tests with the application that crashes the server when it uses paged searches. I have disabled paged searches in the module, and without them it works fine. I will run some tests with more debug info.

For reference, a Red Hat Bugzilla was opened for a very similar report:
bz 1071707 - rhds91 389-ds-base-1.2.11.15-31.el6_5 crash on paged searches followed by simple srch

I'm trying to reproduce the problem, but so far no luck. Using the same connection (conn=8), I ran one operation out of the rest with no SIMPLE PAGED RESULTS control (op=2). But the following "next" simple paged result requests were processed without causing the reported crash. I did the non SIMPLE PAGED RESULT search as op=1 or later, but there was no difference. If you could provide us a stacktrace from the crash and/or a reproducer, it'd be a big help...
{{{
[..] conn=8 op=0 BIND dn="uid=TVradmin0,ou=People,o=my.com" method=128 version=3
[..] conn=8 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=tvradmin0,ou=people,o=my.com"
[..] conn=8 op=1 SRCH base="o=my.com" scope=2 filter="(uid=*)" attrs="uid cn mail"
[..] conn=8 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U,P
[..] conn=8 op=2 SRCH base="o=my.com" scope=2 filter="(cn=*)" attrs="uid cn mail"
[..] conn=8 op=2 RESULT err=0 tag=101 nentries=100 etime=0
[..] conn=8 op=3 SRCH base="o=my.com" scope=2 filter="(objectClass=*)" attrs="uid cn mail"
[..] conn=8 op=3 RESULT err=0 tag=101 nentries=1 etime=0 notes=U,P
[..] conn=8 op=4 SRCH base="o=my.com" scope=2 filter="(uid=*)" attrs="uid cn mail"
[..] conn=8 op=4 RESULT err=0 tag=101 nentries=1 etime=0 notes=U,P
[..] conn=8 op=5 SRCH base="o=my.com" scope=2 filter="(cn=*)" attrs="uid cn mail"
[..] conn=8 op=5 RESULT err=0 tag=101 nentries=1 etime=0 notes=P
[..]
}}}

I think the problem occurs when the same connection makes concurrent requests before the paged searches have finished. This afternoon we will take the time to apply the stack trace configuration on the laboratory host (http://port389.org/wiki/FAQ#Debugging_Crashes), and we will force searches from the client application. When the server crashes, we will upload the stack trace file.

Thanks a lot.

Replying to [comment:8 nhosoi]:

Hello!

We have successfully captured a core dump after a server crash:
[root@server ~]# cat /var/log/ldap/ldap-access | grep conn=674
[07/Mar/2014:12:26:49 +0000] conn=674 fd=69 slot=69 connection from X.X.X.X to Y.Y.Y.Y
[07/Mar/2014:12:26:49 +0000] conn=674 op=0 BIND dn="uid=opencms-ldap,o=Applications,o=gobiernodecanarias,c=es" method=128 version=3
[07/Mar/2014:12:26:49 +0000] conn=674 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=opencms-ldap,c=es"
[07/Mar/2014:12:26:49 +0000] conn=674 op=1 SRCH base="c=es" scope=2 filter="(&(objectClass=organizationalPerson)(uid=emoruser))" attrs="mail postalAddress description uid sn postalCode givenName"
[07/Mar/2014:12:26:49 +0000] conn=674 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=P
[07/Mar/2014:12:26:49 +0000] conn=674 op=2 ABANDON targetop=Simple Paged Results
[07/Mar/2014:12:26:56 +0000] conn=674 op=3 SRCH base="c=es" scope=2 filter="(&(objectClass=organizationalPerson)(uid=admin))" attrs="mail street telephoneNumber postalAddress description uid sn postalCode givenName l"
[07/Mar/2014:12:26:56 +0000] conn=674 op=3 RESULT err=0 tag=101 nentries=3 etime=0
[07/Mar/2014:12:26:56 +0000] conn=674 op=4 SRCH base="c=es" scope=2 filter="(&(objectClass=organizationalPerson)(uid=lbaruser))" attrs="mail postalAddress description uid sn postalCode givenName"

After the crash, we parsed the dump with gdb. I attach the generated file (access.buf).

Replying to [comment:10 rmazgon]:

> After the crash, we have parsed the dump over gdb. I attach the file generated (access.buf).

269 Program terminated with signal 6, Aborted.
270 #0 0x0000003b066328e5 in raise () from /lib64/libc.so.6
271 $1 = 0x7ffb4b7fc010 "[07/Mar/2014:13:03:27 +0000] conn=2 op=-1 fd=65 closed - B1\n7 nentries=0 etime=0\ns,c=es\" method=128 version=3\n(uid=userX))\" attrs=\"mail postalAddress description uid sn postalCode givenName\"\n"

After attaching ns-slapd to the core, could you do "thread apply all bt"? Something like this...?

(gdb) thread apply all bt

In the access log, I see this "ABANDON". What operation made this abandon? It might be missing in my attempt to reproduce the problem...

> [07/Mar/2014:12:26:49 +0000] conn=674 op=2 ABANDON targetop=Simple Paged Results

Thanks!!

gdb -ex 'set print elements 0' -ex 'set confirm off' -ex 'set pagination off' -ex 'print loginfo.log_access_buffer.top' -ex 'thread apply all bt' -ex 'quit' /usr/sbin/ns-slapd /var/log/ldap/core.8789 > access4.buf 2>&1
access4.buf

Hello,

I have attached a stack trace of the crash including "thread apply all bt".

Thanks,

Jamie

I might be facing something similar; our host dies on a daily basis.
The server is a read-only slave with slightly older RPMs, running on Red Hat 6.5.

389-ds-base-1.2.11.15-32.el6_5.x86_64
389-ds-base-debuginfo-1.2.11.15-32.el6_5.x86_64

I included [https://fedorahosted.org/389/raw-attachment/ticket/47707/stacktrace.1395445012.txt stacktrace] from our host.

Steps to reproduce it:

  • import ldif of 15000 users with entries of this sort:

dn: cn=userXXXX,o=redhat
objectclass: inetorgperson
cn: userXXXX
sn: userXXXX
description: descXXXX
userpassword: userXXXX

(in my test case, I have replication of o=redhat configured. Could be irrelevant. But we have to remember it's a timing issue).

  • change the following lines of paged_def.c
    line6: host and port of your server
    line28: user root and user root password

NOTE: in my test case sn attribute is not indexed.

  • build it by
    cc paged_def.c -o testpagedsearch -lldap

Run testpagedsearch; the server should then dump core with the signature shown in core_signature.txt.
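For the import step, the 15000 entries can be generated rather than written by hand. The following is a minimal sketch, not part of the attached reproducer: the function names are hypothetical, and it assumes that XXXX in the template is simply the decimal entry number.

```c
#include <stdio.h>

/* Write one inetorgperson entry following the template from the steps.
 * Assumption: XXXX is replaced by the plain decimal entry number. */
static int write_user_entry(FILE *out, int n)
{
    return fprintf(out,
                   "dn: cn=user%d,o=redhat\n"
                   "objectclass: inetorgperson\n"
                   "cn: user%d\n"
                   "sn: user%d\n"
                   "description: desc%d\n"
                   "userpassword: user%d\n"
                   "\n",
                   n, n, n, n, n);
}

/* Write entries numbered 1..count.
 * Returns 0 on success, -1 if any write fails. */
static int write_users_ldif(FILE *out, int count)
{
    for (int i = 1; i <= count; i++) {
        if (write_user_entry(out, i) < 0) {
            return -1;
        }
    }
    return 0;
}
```

Calling write_users_ldif(stdout, 15000) from a small main and redirecting the output to a file should give an LDIF that can be imported under o=redhat.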

Diffs I used to fix the core dump:
diffs_pagedresults.txt

Hi German,

Your fix looks good to me. May I ask you to run one more test with your patch?

Could you keep the paged results search connected until it reaches the search time limit? I'd like to see the connection get disconnected and how the paged result handles are cleaned up then. If no double free is observed, I think #47706 is fixed as well...

Also, when adding a patch, it would be nice if you could create it via git, so that it can be applied to other developers' local trees more easily. The patch then also contains your name and comments. Please take a look at this page. If you have any questions, please let us know.
http://directory.fedoraproject.org/wiki/GIT_Rules

Hi German,

I'm trying to duplicate the crash using your test client. I tested it against the server built from the master and 389-ds-base-1.2.11 branches (without applying your patch) on Fedora 19, but it looks like the servers work just fine.

Following your instructions, I set up 2-way MMR, and imported 15000 user entries, then started test program "testpagedsearch". It's been running for more than 30 min...

How long did it take for you to crash the server? Which 389-ds-base branch and OS did you use for your debugging? Any advice to reproduce the crash would be greatly appreciated...

Hi Noriko,

Thanks for your comment. I have used version 389-ds-base-1.2.11.15-31.el6_5.x86_64, the same as one of the customers having the crash (the other one is using 389-ds-base-1.2.11.15-30.el6_5.x86_64). I have not yet tried Fedora or the master branch. I was going to do this today.

The server crashes with paged_def.c within a few seconds. The VM where I reproduce it is 10.34.57.163. I will send the credentials by email. Running it now:

--------------------------------------------------
[root@vm-163 ~]# ./pageit
paged search of 100 entries
Before ldap search Ldap search return is 0 msgidp = 2 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 5 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 8 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 11 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 14 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 17 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 20 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 23 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 26 Abandonned successfull
Before ldap search Ldap search return is 0 msgidp = 29 Abandonned successfull
Before ldap search as. searched failed with ldap error (-1) (Can't contact LDAP server)
[root@vm-163 ~]# service dirsrv status
dirsrv server1 dead but pid file exists
dirsrv server2 (pid 27266) is running...
[root@vm-163 ~]# ls -l /var/log/dirsrv/slapd-server1/core.27190
-rw------- 1 nobody nobody 500621312 Apr 23 07:48 /var/log/dirsrv/slapd-server1/core.27190
[root@vm-163 ~]# gdb /usr/sbin/ns-slapd /var/log/dirsrv/slapd-server1/core.27190
...
(gdb) bt
#0  0x00007f043b152925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f043b154105 in abort () at abort.c:92
#2  0x00007f043b190837 in __libc_message (do_abort=2, fmt=0x7f043b278ac0 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007f043b196166 in malloc_printerr (action=3, str=0x7f043b276c15 "realloc(): invalid old size", ptr=<value optimized out>) at malloc.c:6332
#4  0x00007f043b19bc17 in _int_realloc (av=0x7f043b4afe80, oldp=0x7f03f4002f30, oldsize=<value optimized out>, nb=912) at malloc.c:5280
#5  0x00007f043b19bdd5 in __libc_realloc (oldmem=0x7f03f4002f40, bytes=896) at malloc.c:3826
#6  0x00007f043d6a3d3c in slapi_ch_realloc (block=0x7f03f4002f40 " \274x\002", size=896) at ldap/servers/slapd/ch_malloc.c:199
#7  0x00007f043d6e531a in pagedresults_parse_control_value (pb=<value optimized out>, psbvp=<value optimized out>, pagesize=<value optimized out>, index=0x7f04095eb37c) at ldap/servers/slapd/pagedresults.c:116
#8  0x00007f043d6e274b in op_shared_search (pb=0x2c05160, send_result=1) at ldap/servers/slapd/opshared.c:464
#9  0x0000000000426591 in do_search (pb=0x2c05160) at ldap/servers/slapd/search.c:355
#10 0x000000000041431a in connection_dispatch_operation () at ldap/servers/slapd/connection.c:622
#11 connection_threadmain () at ldap/servers/slapd/connection.c:2339
#12 0x00007f043bb189a6 in _pt_root (arg=0x2bf4d50) at ../../../nspr/pr/src/pthreads/ptthread.c:204
#13 0x00007f043b4bb9d1 in start_thread (arg=0x7f04095ee700) at pthread_create.c:301
#14 0x00007f043b208b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
--------------------------------------------------

Ron has tested with a Windows client (I have no Windows machine to do it right now). He can reproduce the crash by refreshing during a long search so that it is abandoned. With the patch applied, the crash is gone, but we can see one search that finishes with LDAP error 2 (protocol error). So I think there is still an issue there. It also shows in this extract of the access log:

----------------------------------------------------------
[22/Apr/2014:13:52:08 +0200] conn=1 op=5 SRCH base="ou=People,dc=rh,dc=local" scope=0 filter="(objectClass=*)" attrs=ALL
[22/Apr/2014:13:52:08 +0200] conn=1 op=5 RESULT err=0 tag=101 nentries=1 etime=0
[22/Apr/2014:13:52:08 +0200] conn=1 op=6 SRCH base="ou=People,dc=rh,dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass subschemaSubentry"
[22/Apr/2014:13:52:08 +0200] conn=1 op=6 RESULT err=0 tag=101 nentries=1000 etime=0 notes=U,P
[22/Apr/2014:13:52:08 +0200] conn=1 op=7 ABANDON targetop=Simple Paged Results
[22/Apr/2014:13:52:08 +0200] conn=1 op=8 SRCH base="ou=People,dc=rh,dc=local" scope=0 filter="(objectClass=*)" attrs=ALL
[22/Apr/2014:13:52:08 +0200] conn=1 op=8 RESULT err=0 tag=101 nentries=1 etime=0
[22/Apr/2014:13:52:08 +0200] conn=1 op=9 SRCH base="ou=People,dc=rh,dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass subschemaSubentry"
[22/Apr/2014:13:52:08 +0200] conn=1 op=9 RESULT err=0 tag=101 nentries=0 etime=0 notes=P
[22/Apr/2014:13:52:11 +0200] conn=1 op=10 SRCH base="ou=People,dc=rh,dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass subschemaSubentry"
[22/Apr/2014:13:52:11 +0200] conn=1 op=10 RESULT err=2 tag=101 nentries=0 etime=0
----------------------------------------------------------

All search operations have finished. I was expecting to see an ABANDON with targetop not found. I will try to reproduce on the master branch and will send you my findings immediately.

Thanks and regards.

I cannot reproduce it in F19 with paged_def.c

389-ds-base-1.3.1.22-1.fc19.x86_64

I have found the reason why the crash is not reproduced in the master branch.

In fact, https://fedorahosted.org/389/ticket/47623 is protecting (in a way) the code that is provoking the crash. A check has been added:

    if (!conn->c_pagedresults.prl_list[*index].pr_mutex) {
        conn->c_pagedresults.prl_list[*index].pr_mutex = PR_NewLock();
    }

So, more likely the assignment is not taking place. The crash is taking place only when *index == -1 (this happens when operation is abandoned)

The patch proposed was protecting this code a little more since *index could be -1 and also it is checking whether operation has been abandoned in some other part of the control decoding. But the crash was provoked only by the wrong assignment, imho.

I would add a check that *index != -1 before the condition, though.
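The kind of guard suggested here could look roughly like the sketch below. It is only an illustration: the structs are simplified stand-ins rather than the real 389-ds-base definitions, pr_slot_checked is a hypothetical helper, and prl_maxlen is assumed to hold the number of allocated slots.

```c
#include <stddef.h>

/* Simplified stand-ins for the server structures (assumption: the real
 * 389-ds-base definitions carry more fields and different types). */
typedef struct {
    void *pr_mutex;       /* lock pointer, created lazily by the server */
    void *pr_current_be;  /* backend of the search that owns the slot */
} pr_slot_t;

typedef struct {
    pr_slot_t *prl_list;  /* array of paged result slots */
    int prl_maxlen;       /* assumed: number of allocated slots */
} pr_list_t;

/* Hypothetical helper: return the slot for `index`, or NULL when the
 * index is -1 or otherwise outside the allocated array, so the caller
 * never reads or writes prl_list[-1]. */
static pr_slot_t *pr_slot_checked(pr_list_t *list, int index)
{
    if (list == NULL || list->prl_list == NULL) {
        return NULL;
    }
    if (index < 0 || index >= list->prl_maxlen) {
        return NULL;
    }
    return &list->prl_list[index];
}
```

With a helper of this shape, the pr_mutex assignment would only run when a non-NULL slot comes back, which covers the abandoned case where *index stays at -1.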

Regards,

German

Hi,

I want to share some additional investigation from my side.

I have used master branch and my reproducer. I cannot crash the server BUT I see from time to time the message

[29/Apr/2014:12:03:49 +0200] - pagedresults_parse_control_value: invalid cookie: -1

So, this part in the code:

    if (!conn->c_pagedresults.prl_list[*index].pr_mutex) {
        conn->c_pagedresults.prl_list[*index].pr_mutex = PR_NewLock();
    }

is being called with *index==-1 indeed.

Now I remove the latest changes to that source file (to go back to the customer version):

git checkout d53e822 ldap/servers/slapd/pagedresults.c
git commit -m"reverting revs" ldap/servers/slapd/pagedresults.c

Rebuild it and it crashes. Not very often, but I do manage to reproduce the crash, just after seeing the message:

[29/Apr/2014:12:25:42 +0200] - pagedresults_parse_control_value: invalid cookie: -1

To sum up:

  • git checkout to go back to the customer version of the file: the server crashes once the message "pagedresults_parse_control_value: invalid cookie: -1" appears.

  • master: wait until the message "pagedresults_parse_control_value: invalid cookie: -1" is reproduced and see that the server does not crash.

Some more hints to reproduce:

  • use the reproducer paged_def.c and change the LDAP search to a quicker one, for instance (sn=user100) instead of (sn-user) in line 51.

  • don't let the reproducer run for long. The crash is nearly immediate, BUT you must restart dirsrv each time. Why? I think (I have not fully verified this) that prl_list is not cleaned up after an operation has been abandoned, so the count always increases. I am talking here about the file reverted to rev d53e822.


The crash is not taking place, but something still seems weird: a request for a new page (of an existing search) arrives, but prl_list is already cleaned up? That would be why *index == -1.

Bug Description: If a simple paged search request was sent to the server
and the request was abandoned, the paged result slot in the connection
table was not properly released by setting NULL to pr_current_be. Since
the slot did not look available for the next request even though it was,
the next request failed to get the valid slot number, and the initial slot
number -1 failed to be replaced with the real slot number. Until the fix
for "Ticket #47623 fix memleak caused by 47347" was made, it overrode the
allocated array's [-1] location, which usually stores the meta data of the
allocated memory. That crashed the server in the next realloc since the
corrupted memory was passed to the function.

Fix Description: This patch cleans up the abandoned/cleaned up slot for
reuse. Also, more checks are added so that the meta data is not broken.
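The slot behaviour described above can be modelled with a toy table. This is only a sketch under stated assumptions, not the server code: sim_slot_t, sim_acquire and sim_abandon are hypothetical names, the table has a fixed size (the server's array handling differs), and a slot counts as free only when its pr_current_be is NULL.

```c
#include <stddef.h>

#define SLOT_COUNT 2  /* toy size, chosen only for the illustration */

typedef struct {
    const void *pr_current_be;  /* non-NULL while a paged search owns the slot */
} sim_slot_t;

/* Claim the first slot that looks free and return its index.
 * Returns -1 when no slot looks free, mirroring the initial slot
 * number that was never replaced in the bug description. */
static int sim_acquire(sim_slot_t *slots, int count, const void *be)
{
    for (int i = 0; i < count; i++) {
        if (slots[i].pr_current_be == NULL) {
            slots[i].pr_current_be = be;
            return i;
        }
    }
    return -1;
}

/* Model an abandon. With release == 0 the slot keeps its pr_current_be
 * (the behaviour described as the bug); with release != 0 it is reset
 * to NULL so that it can be reused (the behaviour described as the fix). */
static void sim_abandon(sim_slot_t *slots, int count, int index, int release)
{
    if (index < 0 || index >= count) {
        return;  /* never touch memory outside the table */
    }
    if (release) {
        slots[index].pr_current_be = NULL;
    }
}
```

In this model, abandoning without release makes sim_acquire return -1 as soon as every slot has been used once, while abandoning with release keeps handing back slot 0.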

Special thanks to German Parente (gparente@redhat.com) for providing the
reproducer and analysing the crash.

Reviewed by Rich (Thank you!!)

Pushed to master:
d341b77..087356f master -> master
commit 087356f

Pushed to 389-ds-base-1.3.2:
65c5e58..2132875 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 2132875

Pushed to 389-ds-base-1.3.1:
ea1b127..40e86e7 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 40e86e74fb4ecc0fc5a1027d8241945d9b2564e0

Pushed to 389-ds-base-1.2.11:
00a7594..b2ee65d 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit b2ee65d

Metadata Update from @gparente:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.11.30

2 years ago
