#47743 Memory leak with proxy auth control
Closed: Fixed. Opened 5 years ago by chatfield.

I've been running some load tests against version 389-ds-base-1.2.11.15-31.el6_5.x86_64 on CentOS 6.5. When looking at the VmSize of the ns-slapd process, it appears to grow rather than reach a plateau. After some modifications to the load test, it appears that the memory usage is flat if we remove the portion that performs the proxied authorization control.

I've created a simplified version of the data that exhibits the problem. This backup ldif is attached to the case. It consists of 3 users and 2 acis.

The load is the following pseudocode:
create a simple connection, binding as uid=AUser,o=Test.com
In 30 threads, repeat 1000 times each (i.e. 30000 total searches):
using this single connection, proxy as uid=PUser,o=Test.com, and search for uid=SUser

In the end, the access log would look something like:
[13/Mar/2014:16:32:57 +0000] conn=5 op=29999 SRCH base="o=test.com" scope=2 filter="(uid=SUser)" attrs=ALL authzid="uid=puser,o=test.com"
[13/Mar/2014:16:32:57 +0000] conn=5 op=29999 RESULT err=0 tag=101 nentries=1 etime=0.001000
[13/Mar/2014:16:32:57 +0000] conn=5 op=30000 SRCH base="o=test.com" scope=2 filter="(uid=SUser)" attrs=ALL authzid="uid=puser,o=test.com"
[13/Mar/2014:16:32:57 +0000] conn=5 op=30000 RESULT err=0 tag=101 nentries=1 etime=0.001000
[13/Mar/2014:16:34:04 +0000] conn=5 op=30002 UNBIND
[13/Mar/2014:16:34:04 +0000] conn=5 op=30002 fd=66 closed - U1

You can see the memory grow with this command:
for ((i=0;;++i)); do echo "$(date) $(grep VmSize /proc/$(pidof ns-slapd)/status | grep -o '[0-9]*')"; sleep 1 || break; done

Note that when the connection/search finishes, the memory is not freed. If you run the test again it will grow again from the new baseline. The original (more complex and slow) load test caused the ns-slapd process to be aborted due to out-of-memory after around a week.


Chris,

I'm not seeing any linear growth. How long are you running your test for?

Over several minutes of running 30 clients issuing proxy auth ldapsearches I see this:

DATE - [TIME] ------- PID VSZ RSS PCPU
03/17-[10:07:23] ------- 10489 2685604 16904 43.5 growth: 0 k
03/17-[10:07:34] ------- 10489 2685604 16904 44.2 growth: 0 k
03/17-[10:07:44] ------- 10489 2685604 16904 44.6 growth: 0 k
03/17-[10:07:55] ------- 10489 2685604 16904 45.0 growth: 0 k
03/17-[10:08:05] ------- 10489 2685604 16904 45.7 growth: 0 k
03/17-[10:08:15] ------- 10489 2685604 16904 44.6 growth: 0 k
03/17-[10:08:26] ------- 10489 2685604 16904 43.5 growth: 0 k
03/17-[10:08:36] ------- 10489 2685604 16916 43.3 growth: 0 k
03/17-[10:08:46] ------- 10489 2685604 16916 42.3 growth: 0 k
03/17-[10:08:56] ------- 10489 2685604 16916 41.3 growth: 0 k
03/17-[10:09:06] ------- 10489 2685604 16916 40.4 growth: 0 k
03/17-[10:09:16] ------- 10489 2685604 16916 39.5 growth: 0 k
03/17-[10:09:26] ------- 10489 2685604 16916 38.7 growth: 0 k
03/17-[10:09:36] ------- 10489 2685604 16916 37.9 growth: 0 k
03/17-[10:09:46] ------- 10489 2685604 16928 37.8 growth: 0 k
03/17-[10:09:56] ------- 10489 2685604 16944 38.1 growth: 0 k
03/17-[10:10:06] ------- 10489 2685604 16952 38.3 growth: 0 k
03/17-[10:10:16] ------- 10489 2685604 16956 38.5 growth: 0 k
03/17-[10:10:26] ------- 10489 2685604 16956 38.7 growth: 0 k
03/17-[10:10:36] ------- 10489 2685604 16960 38.9 growth: 0 k
03/17-[10:10:46] ------- 10489 2685604 16972 39.0 growth: 0 k
03/17-[10:10:56] ------- 10489 2685604 16976 38.7 growth: 0 k
03/17-[10:11:06] ------- 10489 2685604 16988 38.7 growth: 0 k
03/17-[10:11:16] ------- 10489 2685604 16988 38.7 growth: 0 k
03/17-[10:11:26] ------- 10489 2685604 16988 38.9 growth: 0 k
03/17-[10:11:36] ------- 10489 2685604 16992 38.9 growth: 0 k
03/17-[10:11:46] ------- 10489 2685604 16992 39.0 growth: 0 k
03/17-[10:11:56] ------- 10489 2685604 16996 38.9 growth: 0 k
03/17-[10:12:06] ------- 10489 2685604 17000 39.0 growth: 0 k
03/17-[10:12:16] ------- 10489 2685604 17000 38.8 growth: 0 k
03/17-[10:12:26] ------- 10489 2685604 17000 38.7 growth: 0 k
03/17-[10:12:36] ------- 10489 2685604 17000 38.5 growth: 0 k
03/17-[10:12:46] ------- 10489 2685604 17004 38.3 growth: 0 k
03/17-[10:12:56] ------- 10489 2685604 17004 38.2 growth: 0 k
03/17-[10:13:06] ------- 10489 2685604 17004 38.3 growth: 0 k

Every now and then the virtual size does increase, but it looks like memory fragmentation, and not linear growth.

These are the searches that are being performed:

ldapsearch -D "uid=auser,o=test.com" -w password -b "o=test.com" -LLL -e !authzid="dn:uid=puser,o=test.com" uid=suser

This is with a standard install of 389. No extra configuration has been performed, except for creating the backend, and adding those 3 entries.

Is there anything I am missing that you might have done differently on your system?

Access and memory log during load test
logs.tar.gz

Hi,

I've just uploaded some logs from 2 runs of our test. On the first run (after a recent restart) the VmSize went from 2693552K to 3284192K. On the second run shortly after it increased again to 3489336K. The first run's growth is done in a more "chunky" way, whereas the second run is a little smoother. I cannot show it growing linearly (as I said in the mailing list message).

I guess the main difference between my test and yours is that each test run is sharing a single connection. I don't know whether the threading is important - I found it was the fastest way to get a visible result. The 2 runs above are each performing roughly 25000 searches in 3 minutes (shared between the threads).

I will shortly retest on a pristine installation to ensure that config changes I made are not affecting things (e.g. logs, cache size etc).

My test just finished, it ran for over 30 minutes (300K searches), and there was no growth at all:

03/17-[10:07:23] ------- 10489 2685604 16904 43.5 growth: 0 k
...
03/17-[10:40:42] ------- 10489 2685604 17180 21.8 growth: 0 k

I'll be looking into your logs next...

Any chance you can attach your test client for me to test with?

What other features of 389 are you using? Are you using multi-master replication? memberOf? DNA? Perhaps the memory growth happens in conjunction with one of these other features and proxy auth?

I will see if I can rewrite the test in a way that can be made public - currently it is using our wrappers and helper functions. If it makes any difference, this test client is built on Windows using the Mozilla C LDAP SDK (6.07, built from source). I could upload a Wireshark/pcap trace if that helps.

I created a fresh install of CentOS 6.5 64-bit on a VM, and on this installed the latest 389DS from EPEL. I created a new server using setup-ds-admin.pl and imported the backup ldif from the ticket. Running the test still shows the growth, i.e. the simplest, most default installation I could create is still problematic.

If you look at the access logs it should show that these are fairly standard searches. Hopefully it would be possible to find another way for you to see the problem in your environment. I'll look for a way for you to recreate it.

Ok, I've recreated it in Java, without requiring threads. Here is the code - note it's hardwired to my VM (172.16.3.85:489):

{{{

import com.sun.jndi.ldap.ctl.ProxiedAuthorizationControl;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.SearchControls;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;

public class LoadTest
{
    private SearchControls getSimpleSearchControls()
    {
        final SearchControls searchControls = new SearchControls();
        searchControls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        searchControls.setTimeLimit(30000);
        return searchControls;
    }

    public LdapContext getLdapContext() throws NamingException
    {
        final Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "uid=AUser,o=Test.com");
        env.put(Context.SECURITY_CREDENTIALS, "password");
        env.put(Context.PROVIDER_URL, "ldap://172.16.3.85:489/o=test.com");
        return new InitialLdapContext(env, null);
    }

    public void search(final LdapContext ctx) throws NamingException
    {
        final NamingEnumeration<?> namingEnum = ctx.search("", "(uid=SUser)", getSimpleSearchControls());
        while (namingEnum.hasMore())
        {
            namingEnum.next();
        }
        namingEnum.close();
    }

    private Control[] getRequestControls() throws IOException
    {
        final ArrayList<Control> requestControls = new ArrayList<Control>();
        requestControls.add(new ProxiedAuthorizationControl("dn:uid=PUser,o=Test.com"));
        return requestControls.toArray(new Control[0]);
    }

    public void run() throws NamingException, IOException
    {
        // One connection, one proxied-authorization control, many searches
        final LdapContext ldapContext = getLdapContext().newInstance(getRequestControls());
        for (int i = 0; i < 10000; i++)
        {
            search(ldapContext);
        }
    }

    public static void main(final String[] args)
    {
        final LoadTest loadTest = new LoadTest();
        try
        {
            loadTest.run();
        }
        catch (final Exception e)
        {
            e.printStackTrace();
        }
    }
}

}}}

I've just run the above through Valgrind, and there are some relevant-looking stack traces - I hope that helps:

==14241== 85,260 bytes in 21 blocks are possibly lost in loss record 1,575 of 1,582
==14241== at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==14241== by 0x5EDCFCB: ber_memalloc_x (memory.c:226)
==14241== by 0x5EDC35C: ber_realloc (io.c:169)
==14241== by 0x5EDC477: ber_write (io.c:119)
==14241== by 0x5EDC555: ber_init (io.c:362)
==14241== by 0x4EC7D4A: proxyauth_get_dn (proxyauth.c:113)
==14241== by 0x4EB59D7: op_shared_search (opshared.c:313)
==14241== by 0x426590: do_search (search.c:355)
==14241== by 0x414319: connection_threadmain (connection.c:622)
==14241== by 0x6A929A5: _pt_root (ptthread.c:204)
==14241== by 0x70CB9D0: start_thread (pthread_create.c:301)
==14241== by 0x73C9B6C: clone (clone.S:115)
==14241==
==14241== 40,514,740 bytes in 9,979 blocks are definitely lost in loss record 1,581 of 1,582
==14241== at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==14241== by 0x5EDCFCB: ber_memalloc_x (memory.c:226)
==14241== by 0x5EDC35C: ber_realloc (io.c:169)
==14241== by 0x5EDC477: ber_write (io.c:119)
==14241== by 0x5EDC555: ber_init (io.c:362)
==14241== by 0x4EC7D4A: proxyauth_get_dn (proxyauth.c:113)
==14241== by 0x4EB59D7: op_shared_search (opshared.c:313)
==14241== by 0x426590: do_search (search.c:355)
==14241== by 0x414319: connection_threadmain (connection.c:622)
==14241== by 0x6A929A5: _pt_root (ptthread.c:204)
==14241== by 0x70CB9D0: start_thread (pthread_create.c:301)
==14241== by 0x73C9B6C: clone (clone.S:115)
==14241==
==14241== 40,571,580 bytes in 9,993 blocks are definitely lost in loss record 1,582 of 1,582
==14241== at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==14241== by 0x5EDCFCB: ber_memalloc_x (memory.c:226)
==14241== by 0x5EDC35C: ber_realloc (io.c:169)
==14241== by 0x5EDC477: ber_write (io.c:119)
==14241== by 0x5EDC555: ber_init (io.c:362)
==14241== by 0x4EC7D4A: proxyauth_get_dn (proxyauth.c:113)
==14241== by 0xEFD8625: aclplugin_preop_common (aclplugin.c:225)
==14241== by 0xEFD87CD: aclplugin_preop_search (aclplugin.c:165)
==14241== by 0x4EBF819: plugin_call_func (plugin.c:1453)
==14241== by 0x4EBFA4E: plugin_call_plugins (plugin.c:1415)
==14241== by 0x4EB6104: op_shared_search (opshared.c:565)
==14241== by 0x426590: do_search (search.c:355)
==14241== by 0x414319: connection_threadmain (connection.c:622)
==14241== by 0x6A929A5: _pt_root (ptthread.c:204)
==14241== by 0x70CB9D0: start_thread (pthread_create.c:301)
==14241== by 0x73C9B6C: clone (clone.S:115)

Note that parse_LDAPProxyAuth calls ber_free(ber, 0 /* do not free ber->ber_buf */), but almost every other place in the server code where ber_free is called, it is called like this:
ber_free(ber, 1 /* free ber->ber_buf */)

The only places where ber_free(ber, 0) is used are where the ber is used to parse LDAP attribute/value structures - there the ldap code has access to the internal ber structures, and the ber is just a wrapper for parsing purposes.

If this turns out to be the source of the memory leaks, we should check other places where ber_free(ber, 0) is used.
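Based on the analysis above and the commit stat below (proxyauth.c, 1 insertion, 1 deletion), the fix amounts to flipping the freebuf flag at the one call site; a sketch of the change, with surrounding code elided:

```diff
 /* ldap/servers/slapd/proxyauth.c: parse_LDAPProxyAuth() */
-    ber_free(ber, 0);
+    ber_free(ber, 1); /* also free ber->ber_buf allocated by ber_init() */
```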

I just tried changing ber_free(ber, 0) to ber_free(ber, 1) in parse_LDAPProxyAuth (in 1.3.2.6 built from source - which was also showing the problem).

That is the answer - the load test now shows a completely flat memory pattern. Great find!

I was able to reproduce the leak with your script, thank you very much, and I was able to verify the fix. All the other ber_free(ber, 0) are valid. It was only the one in proxyauth.c that was incorrect.

git merge ticket47743
Updating dc17d63..e4a1de6
Fast-forward
ldap/servers/slapd/proxyauth.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

git push origin master
dc17d63..e4a1de6 master -> master

commit e4a1de6
Author: Mark Reynolds mreynolds@redhat.com
Date: Tue Mar 18 16:08:19 2014 -0400

7b464ad..cdffb52 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

c2f1be4..97ed029 389-ds-base-1.3.1 -> 389-ds-base-1.3.1

09e70fc..fd954a7 389-ds-base-1.3.0 -> 389-ds-base-1.3.0

b46e1fd..872b7d9 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

Metadata Update from @chatfield:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.2.11.29
