ldapcompare operations against the pwdpolicySubentry attribute frequently hang. The access log shows the CMP operation but no corresponding result. Once this has occurred, a second attempt at the same ldapcompare command causes the server to crash. Even if the server does not crash, it cannot be shut down cleanly.
This problem was first observed with 220.127.116.11, but also occurs with
18.104.22.168. The problem is most easily reproducible after reinitializing
the server, but can occur at any time.
The server is one of a pair of servers configured for multimaster
replication. Both servers are running on RHEL 6.2.
Detailed description of the initial problem
A simplified and redacted version of the class-of-service configuration
gdb analysis of 22.214.171.124 during ldapcompare hang
ns-slapd coredump from 126.96.36.199
it is blowing up during search operations
This looks like it is blowing up during search operations too
Bug appears to be directly related to my adding a second pwpolicy today. After deleting the policy, my crashes stopped.
Looking at both the hang and crash stack traces, we can see that there are two threads in cos_cache_query_attr(). Both the hang and the crash happen at line 2393, during a free operation (a double free). The old code allocated a new normalized DN for the targetTree, freed the current targetTree value (which is shared data), and reassigned the new value. With the fix, I just modify the existing pointer in place.
Previously I was able to crash the server with fewer than 10 concurrent searches. I have now run the new code through over 1.25 million searches without issue.
Sending fix out for review...
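The free-and-reassign pattern described in the analysis above can be sketched as follows. This is only an illustration, not the code in cos_cache.c: `shared_target`, `toy_normalize`, and `replace_target_locked` are hypothetical names, and the toy normalization merely lowercases ASCII. The mutex is added here to mark where the race sits; without it, two threads could read the same old pointer and both free it.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared cache entry; not the real CoS structure. */
struct shared_target {
    pthread_mutex_t lock;
    char *val; /* heap-allocated DN string shared by all threads */
};

/* Toy normalization (assumption): return a lowercased heap copy of dn. */
char *toy_normalize(const char *dn)
{
    size_t len = strlen(dn);
    char *out = malloc(len + 1);
    assert(out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[i] = (char)tolower((unsigned char)dn[i]);
    }
    out[len] = '\0';
    return out;
}

/*
 * Allocate a normalized copy, free the old value, and reassign.
 * The free and the reassignment happen under the lock. Without it, two
 * threads could both see the same old pointer and both call free() on it.
 */
void replace_target_locked(struct shared_target *t)
{
    pthread_mutex_lock(&t->lock);
    char *fresh = toy_normalize(t->val);
    free(t->val);
    t->val = fresh;
    pthread_mutex_unlock(&t->lock);
}
```

Note that a lock around the writer alone is not enough in general: any reader that holds on to `val` across the replacement would still see freed memory, so readers would need to take the same lock or copy the value.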
slapi_dn_normalize_original is deprecated - the problem with doing DN normalization is that you cannot be guaranteed that you can always do it in place - converting to certain escape sequences will cause the string to grow. It looks like the real problem here is locking - there should be no way that another thread can free or change pTargetTree->val out from under the current thread. If the real problem is locking, then you could still run into weird problems if one thread is normalizing the DN out from under another thread - there could be odd characters in the DN that would cause strange errors at runtime.
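To illustrate why in-place normalization cannot always be guaranteed, the sketch below uses a deliberately simple, hypothetical escaping rule (not the rules the server actually applies): each comma inside a value is rewritten as the three-character hex escape `\2C`. The names `escaped_length` and `escape_commas` are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of value after each ',' is rewritten as "\2C" (1 byte becomes 3). */
size_t escaped_length(const char *value)
{
    size_t len = 0;
    for (const char *p = value; *p != '\0'; p++) {
        len += (*p == ',') ? 3 : 1;
    }
    return len;
}

/*
 * Return a newly allocated escaped copy (caller frees), or NULL on
 * allocation failure. A new buffer is needed because the result can be
 * longer than the input, so the input buffer cannot simply be reused.
 */
char *escape_commas(const char *value)
{
    assert(value != NULL);
    char *out = malloc(escaped_length(value) + 1);
    if (out == NULL) {
        return NULL;
    }
    char *w = out;
    for (const char *p = value; *p != '\0'; p++) {
        if (*p == ',') {
            memcpy(w, "\\2C", 3);
            w += 3;
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    return out;
}
```

With this rule, an 11-character value containing one comma becomes 13 characters, which no longer fits in the original buffer.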
I wonder if we should even be normalizing at that point, as we are potentially normalizing the same DN string multiple times. There are also the thread-safety issues Rich points out, where the DN can be modified by one thread while another is reading it. It seems like we should only normalize once, when the DN is added to the CoS cache. Perhaps that portion of the code is already protected by locking as well.
I didn't see any errors or problems with the previous fix, but I did have concerns. I moved the DN normalization to the cache-building code, which runs under a lock. Sending new fix out for review...
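The revised approach (normalize once while the cache is built, under a lock, and only read afterwards) might look roughly like the sketch below. All names here (`toy_cache`, `cache_build`, `cache_matches`) are hypothetical, and lowercasing stands in for real DN normalization; the actual change is the one made in ldap/servers/plugins/cos/cos_cache.c. The sketch also assumes that queries only start after the build has completed.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache holding one target DN; not the real CoS cache layout. */
struct toy_cache {
    pthread_mutex_t build_lock;
    char *target; /* normalized once in cache_build(), read-only afterwards */
};

/*
 * Build the cache entry. Normalization (here: lowercasing, as a stand-in)
 * happens exactly once, and the stored pointer is swapped under the lock.
 * Returns 0 on success, -1 on allocation failure.
 */
int cache_build(struct toy_cache *c, const char *raw_dn)
{
    size_t len = strlen(raw_dn);
    char *norm = malloc(len + 1);
    if (norm == NULL) {
        return -1;
    }
    for (size_t i = 0; i <= len; i++) { /* i == len copies the terminator */
        norm[i] = (char)tolower((unsigned char)raw_dn[i]);
    }
    pthread_mutex_lock(&c->build_lock);
    free(c->target);
    c->target = norm;
    pthread_mutex_unlock(&c->build_lock);
    return 0;
}

/* Query path: takes a const cache and never writes to or frees target. */
int cache_matches(const struct toy_cache *c, const char *normalized_dn)
{
    return c->target != NULL && strcmp(c->target, normalized_dn) == 0;
}
```

Because the query path only reads `target`, concurrent searches have nothing to free or rewrite, which removes the opportunity for the double free.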
[mareynol@localhost plugins]$ git merge ticket305
ldap/servers/plugins/cos/cos_cache.c | 24 ++++++++++--------------
1 files changed, 10 insertions(+), 14 deletions(-)
[mareynol@localhost plugins]$ git push origin master
Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 934 bytes, done.
Total 7 (delta 5), reused 0 (delta 0)
6fd5d70..142c8f0 master -> master
stack trace of hung server with 188.8.131.52
Sorry, I found the compare in the "initial problem" attachment. Continuing investigation...
3f960dc..55135e3 master -> master
Author: Rich Megginson email@example.com
Date: Tue Mar 13 11:45:32 2012 -0600
Added initial screened field value.
Metadata Update from @imorgan:
- Issue assigned to rmeggins
- Issue set to the milestone: 184.108.40.206
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here:
If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)