#49780 acl_copyEval_context double free
Opened 6 months ago by lkrispen. Modified 5 months ago

Issue Description

slapd crashes with the following or similar stack trace:

 #0  acl_copyEval_context (aclpb=0x0, src=0x4fc5730, dest=0x7f2d00452bf0, copy_attr_only=0) at ldap/servers/plugins/acl/acl.c:3661
 #1  0x00007f2d3533df71 in acl_operation_ext_destructor (ext=0x4fc5650, object=<value optimized out>, parent=<value optimized out>)
at ldap/servers/plugins/acl/acl_ext.c:328
 #2  0x000000368a0637da in factory_destroy_extension (type=<value optimized out>, object=0x30fc7f0, parent=0x7f2d24b825d0, extension=0x30fc8a8)
at ldap/servers/slapd/factory.c:405
 #3  0x000000368a08c16e in operation_free (op=0x3009480, conn=0x7f2d24b825d0) at ldap/servers/slapd/operation.c:220
 #4  0x000000368a095ef8 in pblock_done (pb=0x3009470) at ldap/servers/slapd/pblock.c:114
 #5  0x000000368a095f33 in slapi_pblock_destroy (pb=0x3009470) at ldap/servers/slapd/pblock.c:125
 #6  0x00000000004140b3 in connection_threadmain () at ldap/servers/slapd/connection.c:2382
 #7  0x0000003693c29c53 in _pt_root (arg=0x32c7c90) at ../../../nspr/pr/src/pthreads/ptthread.c:216
 #8  0x0000003688007aa1 in start_thread (arg=0x7f2d0cbed700) at pthread_create.c:301
 #9  0x0000003687ce8bcd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:122
 #10 0x0000000000000000 in ?? ()

Package Version and Platform

crash was reported for 389-ds-base-1.2.11.15-91.el6_9.x86_64

but code is unchanged, so new versions could be affected as well

Steps to reproduce

No reproducer available

Actual results

crash

Expected results

no crash


checking the two available core files Ĺ•evealed that the last operations included getEffectivRights (GER) searches
and in GER we do replace the aclcb in the connection extension

In fact we do:

 _ger_new_gerpb()
        geraclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
       *aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
        acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)geraclcb);

which creates an connection extension and sets it to the connection
and later we do:

 _ger_release_gerpb()
       geraclcb = (struct aclcb *)acl_get_ext(ACL_EXT_CONNECTION, conn);
        acl_conn_ext_destructor(geraclcb, NULL, NULL);
        acl_set_ext(ACL_EXT_CONNECTION, conn, *aclcb);

which destroys the aclcb in the connection and then sets the saved aclcb back
Now, this is definitely the wrong order, but changing the order of restoring the original and destroying the temporary will not be sufficient, an other thread could still hold a reference to the aclcb which will be destroyed.

I see two options to fix this:
1] Investigate why GER needs its own specific aclcb in the connection and if this can be avoided, optimal solution
2] do not blindly get and set connection extensions, but always use the lock in aclcb during access to it. Then we could lock it, copy its context to a backup aclpb and when done, under the lock copy it back

Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 months ago

Metadata Update from @lkrispen:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1563539

6 months ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.2.11

6 months ago

To be honest this part of code is so complex that I am not sure I understand it correctly :(

First evaluation context looks to be a key element of right evaluation. Considering that the GER can evaluate the rights of a proxy user to an entry/attri, it looks normal we need an evaluation context different from the one used by the bound user. I do not understand the need of changing the aclcb but rather could it be done via acl_clean_aclEval_context (in _ger_new_gerpb and _ger_release_gerpb).

Now with asynchronous operations, the evaluation context could be cleaned for a GER operation, if an other operation is evaluation rights in parallel, I not sure which prevent the two evaluations to be merged.

I checked the use of ACL_EXT_CONNECTION.
As far as I can see it is only used in acl_init_aclpb() and only if the parameter "copy_from_aclcb" is set.

In _ger_new_gerpb() we call

 acl_init_aclpb(*gerpb, geraclpb, subjectndn, 0);

so it will get the conn exstension, but not use it. I don't see the need of the special gercb

ACL_EXT_CONNECTION is also used during operation desctructor. It looks to me that each time an operation completes, part of the evaluation context in the connection is cleared/reset.

In _ger_new_gerpb, it sets a new gercb (in the connection) but with the bind sdn ('subjectndn') specified in the GER control. So I believe it needs this specific gercb to evaluate rights of the specified DN not of the connection bound DN.

ACL_EXT_CONNECTION is also used during operation desctructor. It looks to me that each time an operation completes, part of the evaluation context in the connection is cleared/reset.

yes. it copies the context from the operation to the connection, but this would not be affected if we do not set the aclcb, since it should already be reset when the destructor is called

In _ger_new_gerpb, it sets a new gercb (in the connection) but with the bind sdn ('subjectndn') specified in the GER control. So I believe it needs this specific gercb to evaluate rights of the specified DN not of the connection bound DN.

but it is set, to the aclpb in acl_init_aclpb, the aclcb is not used as far as I see

I finally decided not to take the risk of just not using the eval context in GER, I still believe it is not used, but there are several places where it is touched, copied, ....

So the proposed solution is:
- always keep one acl connection extension, do not set/unset it with the GER extension
- when a GER extension is needed create one, lock the existing extension and swap the data
This has the same effect as setting and unsetting the GRER extension to the connection, but the main extension will never be freed in between, only locked and data modified.

Unfortunately testing this I did run into another ACL issue.
The connection extension has a lock, this lock is set when a new connection extension is created (and in GER it will be created frequently). But these locks are taken from a pre alloacted array of locks and this array is used in a circular mode in aclext_get_lock() and so it can happen that two extensions do use the same lock. The size of the lock array is 40.

This is really bad, and just to avoid the call to PR_NewLock() and PR_DestroyLock() for each connection extension. I would like to remove the use of the array and always create a lock wneh allocating the connection exdtension

here is my suggested patch

0001-Ticket-49780-acl_copyEval_context-double-free.patch

Well I tried to review this looking for the mishandling of a lock (as mention in a previous meeting). I don't fully understand all of the GER code, but I wondering if we need to reset the extension after the swap?

@@ -318,14 +354,17 @@ _ger_new_gerpb(
         slapi_pblock_set(*gerpb, SLAPI_CONNECTION, conn);

         /* Can't share the conn->aclcb because of different context */
-        geraclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
-        if (geraclcb == NULL) {
+        ger_aclcb = (struct acl_cblock *)acl_conn_ext_constructor(NULL, NULL);
+        if (ger_aclcb == NULL) {
             rc = LDAP_NO_MEMORY;
             goto bailout;
         }
-        slapi_sdn_set_ndn_byval(geraclcb->aclcb_sdn, subjectndn);
-        *aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
-        acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)geraclcb);
+        slapi_sdn_set_ndn_byval(ger_aclcb->aclcb_sdn, subjectndn);
+        conn_aclcb = acl_get_ext(ACL_EXT_CONNECTION, conn);
+        PR_Lock(conn_aclcb->aclcb_lock);
+        _ger_swap_aclcb(conn_aclcb, ger_aclcb);
+        *save_aclcb = ger_aclcb;
+        PR_Unlock(conn_aclcb->aclcb_lock);
     }

After the unlock should we do?

acl_set_ext(ACL_EXT_CONNECTION, conn, (void *)ger_aclcb);

In the old code it was always reset, but maybe that's not needed with your new design.

In the old code it was always reset, but maybe that's not needed with your new design.

well, that is what the new code also is doing, at least it is my intent. Just in a different way.

In the old code we had a start of GER
- allocate and init a new cblock: C_GER
- get the current cblock from the pblock/connection: C_CONN
- set C_GER to the connection

and at the finis of GER
- free C_GER
- set C_CONN to the connection.

But this had no control if another threadhad a reference to C_GER and could access freed data.

My approach is to not free anything that could be seen outside the actual GER:

at begin of GER

  • allocate and init C_GER
  • get the cblock from the connection: C_CONN
  • lock C_CONN
  • swap the data of C_GER and C_CONN (this should have tehsame effect as setting C_GER to teh connection)
  • unlock C_CONN

at end of ger
- get C_CONN
- lock C_CONN
- swap data of C_GER and C_CONN (restting the old data to teh pblock)
- destroy C_GER

This approach tries to keep the old behaviour and to avoid potential crashes.
It doe not address the problem of what concurrent operations should see and set.

That is an excellent fix. This ticket being to prevent the crash, the fix will address the ticket. The problem of concurrent operations sharing inappropriate value from the cblock existed before and with your fix. This problem would need a different ticket.

I have created ticket 49826 for the general investigation

ack from me for current fix

To ssh://pagure.io/389-ds-base.git
8fa838a..5537467 master -> master

To ssh://pagure.io/389-ds-base.git
8720370..df357f5 389-ds-base-1.3.8 -> 389-ds-base-1.3.8

To ssh://pagure.io/389-ds-base.git
855d78b..c907bca 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Can you also cherry-pick this to 1.2.11 (for 6.10)? Thanks!

backported to 1.3.6

To ssh://pagure.io/389-ds-base.git
b1c3707..b45afc2 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

To ssh://pagure.io/389-ds-base.git
f3f5d28..8519912 389-ds-base-1.2.11 -> 389-ds-base-1.2.11

Login to comment on this ticket.

Metadata