#4079 Add TCP level timeout to LDAP services
Closed 4 years ago by pbrezina. Opened 5 years ago by simo.
SSSD/ simo/sssd tcp_timeout  into  master

file modified
+11
@@ -79,6 +79,7 @@ 

      int dummy = 1;

      int ret;

      struct timeval tv;

+     unsigned int milli;

  

      /* SO_KEEPALIVE and TCP_NODELAY are set by OpenLDAP client libraries but

       * failures are ignored.*/
@@ -117,6 +118,16 @@ 

                    "setsockopt SO_SNDTIMEO failed.[%d][%s].\n", ret,

                    strerror(ret));

          }

+ 

+         milli = timeout * 1000; /* timeout in milliseconds */

+         ret = setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, milli,

+                          sizeof(milli));

+         if (ret != 0) {

+             ret = errno;

+             DEBUG(SSSDBG_FUNC_DATA,

+                   "setsockopt TCP_USER_TIMEOUT failed.[%d][%s].\n", ret,

+                   strerror(ret));

+         }

      }

  

      return EOK;

In some cases the TCP connection may hang with data sent because
of network conditions, this may cause the socket to stall for much
longer than the timeout intended.
Set a TCP option to forcibly timeout a socket that sees its data not
ACKed within the ldap_network_timeout seconds.

Just an idea (haven't made a build yet) to address an issue discussed on IRC with dwmw2

Hi Simo,

thanks for the patch.

Do I understand correctly, that this patch fixes handles the case which are not handled by SO_RCVTIMEO and SO_SNDTIMEO namely timeouts when SSSD is waiting in select() or epoll()?

bye,
Sumit

From what I have read about those options, patch looks good. But I didn't test this yet.

From what I have read about those options, patch looks good. But I didn't test this yet.

Do you have an idea how to test it?

From what I have read about those options, patch looks good. But I didn't test this yet.

Do you have an idea how to test it?

Not yet, I am thinking about it. We need a connection with data sent but not acked (this will prevent SO_KEEPALIVE to play a role).

Btw, I am not sure I understand a reason behind SO_RCVTIMEO & SO_SNDTIMEO (those are already there) because those affects only read/write/connect (to some extent) but socket is set O_NONBLOCK anyway...

Btw, I am not sure I understand a reason behind SO_RCVTIMEO & SO_SNDTIMEO (those are already there)

The reason for those was: https://github.com/SSSD/sssd/pull/835/commits/4b44ecf64e34ace624267e2aaef775e745eb618b

For the reference: https://groups.google.com/forum/#!topic/mailing.openssl.users/6ECIBl-9U8w

why does the process get hung on __read_nocancel (), when the
connection is set to non-blocking

and

>> SO_RCVTIMEO and SO_SNDTIMEO to 60 seconds on the socket.
> Why? You're using non-blocking operations. Fortunately, this will do nothing
> because these timeouts only affect blocking operations

So it is not only me who wonders what SO_RCVTIMEO / SO_SNDTIMEO has to do with O_NONBLOCK sockets...

So... I am not sure about the case that TCP_USER_TIMEOUT is intended to cover,

but I really wonder:
1) why SO_RCVTIMEO/SO_SNDTIMEO were introduced in PR835 to handle #2878 instead of solution proposed in https://bugzilla.redhat.com/show_bug.cgi?id=1471039#c42
2) why does SO_RCVTIMEO/SO_SNDTIMEO affect O_NONBLOCK socket?

@thalman, do you remember by any chance?

(btw, I do not mean to blame anybody; after all I did review of PR835 myself :) )

Hi,

this patch was used together with the SO_RCVTIMEO/SO_SNDTIMEO in an environment with intermittent network issues. This helped to avoid that SSSD got stuck in a read() system call. Unfortunately it is currently not clear if one of those setting would be sufficient. Since there was no negative impact observed I give ACK to this patch.

In general I'd prefer to have a better understanding about the cases where each of the parameters become important. So it would be nice to do more investigations in this area, but this should not prevent us from including the patch which might help with issues which might be otherwise hard to debug.

bye,
Sumit

  • master
    • 7aa9645 - Add TCP level timeout to LDAP services
  • sssd-1-16
    • bad7c63 - Add TCP level timeout to LDAP services

Pull-Request has been closed by pbrezina

4 years ago
Metadata