#51036 Ticket 51035 - Heavy StartTLS connection load can randomly fail with err=1
Closed 3 years ago by spichugi. Opened 4 years ago by tbordaz.
tbordaz/389-ds-base ticket_51035  into  master

@@ -150,8 +150,8 @@ 

          /* We can't get rid of this til we remove the operation stack. */

          memset(o, 0, sizeof(Slapi_Operation));

          o->o_ber = ber;

-         o->o_msgid = -1;

-         o->o_tag = LBER_DEFAULT;

+         o->o_msgid = -1;         /* if changed please update start-tls that test this value */

+         o->o_tag = LBER_DEFAULT; /* if changed please update start-tls that test this value */

          o->o_status = SLAPI_OP_STATUS_PROCESSING;

          slapi_sdn_init(&(o->o_sdn));

          o->o_authtype = NULL;

@@ -188,11 +188,31 @@ 

      /* Check whether the Start TLS request can be accepted. */

      if (connection_operations_pending(conn, pb_op,

                                        1 /* check for ops where result not yet sent */)) {

-         slapi_log_err(SLAPI_LOG_PLUGIN, "start_tls",

-                       "Other operations are still pending on the connection.\n");

-         ldaprc = LDAP_OPERATIONS_ERROR;

-         ldapmsg = "Other operations are still pending on the connection.";

-         goto unlock_and_return;

+         for (Operation *op = conn->c_ops; op != NULL; op = op->o_next) {

+             if (op == pb_op) {

+                 continue;

+             }

+             if ((op->o_msgid == -1) && (op->o_tag == LBER_DEFAULT)) {

+                 /* while processing start-tls extop we also received a new incoming operation

+                  * As this operation will not processed until start-tls completes.

+                  * Be fair do not consider this operation as a pending one

+                  */

+                 slapi_log_err(SLAPI_LOG_CONNS, "start_tls",

+                               "New incoming operation blocked by start-tls, Continue start-tls (conn=%"PRIu64").\n",

+                               conn->c_connid);

+                 continue;

+             } else {

+                 /* It is problematic, this pending operation is processed and

+                  * start-tls can push new network layer while the operation

+                  * send result. Safest to abort start-tls

+                  */

+                 slapi_log_err(SLAPI_LOG_CONNS, "start_tls",

+                               "Other operations are still pending on the connection.\n");

+                 ldaprc = LDAP_OPERATIONS_ERROR;

+                 ldapmsg = "Other operations are still pending on the connection.";

+                 goto unlock_and_return;

+             }

+         }

      }

  

  

Bug Description:
startTls pushes a network layer on top of the connection.
So when processing startTLS, there should not be a pending operation
else there is a risk that the operation sends back data on moving
network layer.
When startTls detects a pending operation it aborts startTls.
However if a new operation is received while processing startTls,
the operation is pending but can not be read because startTls
holds c_mutex.

Fix Description:
In case of unread pending operation, relax the control
and just log an information message.

https://pagure.io/389-ds-base/issue/51035

Reviewed by: ?

Platforms tested: F30

Flag Day: no

Doc impact: no

How often could this message be logged? I wonder if it should be SLAPI_LOG_CONN ? And I wonder if the logging call below should also be CONN and not PLUGIN. Thoughts?

Yes, it should be conn, not plugin IMO

rebased onto 5990de8ace882ad50954e11f22685e6c55d0a382

3 years ago

@mreynolds , @firstyear thanks for the reviews. I agree that it makes sense to use connection level rather than plugin. I initially used 'info' because it helps for test purpose and also the failure was quite rare. However, even if it is rare it would be erroneously alarming for administrator. I changed the PR accordingly.

What is up with this include? Looks wrong to me :-)

rebased onto e6023ca

3 years ago

ooops .. I have no idea why this include jumped into the patch :)
I removed it

Ack to your patch, but I wonder if we should change all the logging in ldap/servers/slapd/start_tls_extop.c from PLUGIN to CONN? Could do it in a separate issue, but then it might get lost in the mix :-)

Pull-Request has been merged by tbordaz

3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This pull request has been cloned to Github as issue and is available here:
- https://github.com/389ds/389-ds-base/issues/4089

If you want to continue to work on the PR, please navigate to the github issue,
download the patch from the attachments and file a new pull request.

Thank you for understanding. We apologize for all inconvenience.

Pull-Request has been closed by spichugi

3 years ago