#3386 sssd-kcm cannot handle big tickets
Opened 2 years ago by lslebodn. Modified 12 days ago

If there is a ticket stored in kcm with many service principals then it can easily reach hardcoded limit 2KiB.

a) the hardcoded limit does not make sense because It is configurable in sssd-secrets (man sssd-secrets -> max_payload_size)
b) limit is too low and therefore unusable in real environment. (it's 8 times lower then default limit in sssd-secrets

sh$ klist 
klist: Credentials cache I/O operation failed
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [sec_get_send] (0x2000): Retrieving ccache bd874c90-32c6-4e47-8b5a-d0935800d706-1000:80312
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/1000/ccache/bd874c90-32c6-4e47-8b5a-d0935800d706-1000:80312, at socket /var/run/secrets.socket
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data (nil)
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 5000
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [tcurl_write_data] (0x2000): ---> begin libcurl data
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [tcurl_write_data] (0x2000): {
    "version": 1,
    "kdc_offset": 0,
    "principal": {
        "type": 0,
        "realm": "REDHAT.COM",
        "components": [
            "lslebodn"
        ]
    },
    "creds": [
        {
            "uuid": "25d3b632-ee4f-4a0c-ab72-952861bf68b9",
            "payload": "AAAAAQAAAAEAAAKKUkVESEFULkNPTQAAAAhsc2xlYm9kbgAAAAMAAAACAAAAClJFREhBVC5DT00AAAAESFRUUAAAACVqaXJhLTAxLmhvc3QucHJvZC5lbmcucmR1Mi5yZWRoYXQuY29tABIAAAAgPXMXnJjF+r50Vbtr0fkFgwe4DPDnVYsYkgLym1GXqt1Y6yzaWOtnsVjruXoAAAAAAEApAAAAAAAAAAAAAAAAAW9hggFrMIIBZ6ADAgEFoQwbClJFREhBVC5DT02iODA2oAMCAQOhLzAtGwRIVFRQGyVqaXJhLTAxLmhvc3QucHJvZC5lbmcucmR1Mi5yZWRoYXQuY29to4IBFjCCARKgAwIBEqEDAgEDooIBBASCAQCxSd/sQO0J8B6c+4LrRxXltt+fSLsGqFfvOjgyHREQpODAREfaqDL5AgsIC7kmKaTE42Tn9T27TSnuLFYD3yUP8Z2uicaGbFRCcbMPH5DR464RHq6YsyID2Z2yPBnsJf6p05T6VuzL+orM2/TvbRF14MDA45yXmKitrv5ALV9PERE8RNuTiGbvBWmAWqkOjuvpvpSXaCBoBvUlA11QslllON/rCPYBqrWllowkZv6dCBaDuUsx9SJKcdijNjWsttJwEanN92ymUbOthGHDBVKriG6Hygjpe2lev0FvUCTaBx71yO5FvwtP4GBnVcp3wTQ1O3reDNa3bi6Ey0nB66GKAAAAAA=="
        },
        {
            "uuid": "de7084a4-5153-4f76-9b68-82f41090c130",
            "payload": "ABAAAAQAAAAEAAAAKUkVESEFULkNPTQAAAAhsc2xlYm9kbgAAAAMAAAACAAAAClJFREhBVC5DT00AAAAESFRUUAAAACplcnJhdGEtd2ViLTAxLmhvc3QucHJvZC5lbmcuYm9zLnJlZGhhdC5jb20AAgAAACDiBq1YuOGzE2wkvy4MadkP+buzQYoNYWmpDLtkJ3CffFjrLNpY6y3sWOu5egAAAAAAQCkAAAAAAAAAAAAAAAABdGGCAXAwggFsoAMCAQWhDBsKUkVESEFULkNPTaI9MDugAwIBA6E0MDIbBEhUVFAbKmVycmF0YS13ZWItMDEuaG9zdC5wcm9kLmVuZy5ib3MucmVkaGF0LmNvbaOCARYwggESoAMCARKhAwIBBKKCAQQEggEATQXGnOJbTfEChzl1UBnA4UJJFWQAoRNFVPr7TCGZVxy+uCP606flsk6rLAlWKCgb9PFMycl+YYigHsrhppHUsXGgXLz/8QyAbxheHHhCwgaRVrNNGf2OvPUZwy/WiPdqZsVSvBcd/PCeS2kt39yFfwvwUrfxueVP6I3keViHTzivEOJI6m8dvmhM+h+Md4pAFP7g2+ZrUMuN2KI4iorsqio8XVm0lKAPuWfaqVTO2lkhGVni2L1eZj6KT6PSQeHbinbMDyI7+7p1lOKJERfZRbWcysN31+YcoSXUw473GZvxYzEjqeI7sEOuACxfIfhnhgHJj2WsvVuGr1/FPXUB7AAAAAA="
        },
        {
            "uuid": "a075810a-8211-45d1-bfc9-44eba01dad1d",
            "payload": "ACAAAAAAAACAEAAAAKUkVESEFULkNPTQAAAAhsc2xlYm9kbgAAAAAAAAADAAAADFgtQ0FDSEVDT05GOgAAABVrcmI1X2NjYWNoZV9jb25mX2RhdGEAAAAHcGFfdHlwZQAAABxrcmJ0Z3QvUkVESEFULkNPTUBSRURIQVQuQ09NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABMgAAAAA="
        },
        {
            "uuid": "94033c35-6aa6-4af7-b81c-085e8a475aae",
            "payload": "AACAAAQAAAAEAAAAKUCkVESCEFULkNPTQAAAAhsc2xlYm9kbgAAAAIAAAACAAAAClJFREhBVC5DT00AAAAGa3JidGd0AAAAClJFREhBVC5DT00AEgAAACBiOdJZoKwcCxiDCeC9pe/cw6Wz/axuoxwQJqa8tsxbgVjrLNpY6yzaWOu5egAAAAAAQGEAAAAAAAAAAAAAAAABQGGCATwwggE4oAMCAQWhDBsKUkVESEFULkNPTaIfMB2gAwIBAqEWMBQbBmtyYnRndBsKUkVESEFULkNPTaOCAQAwgf2gAwIBEqEDAgECooHwBIHt5dW+TGBvQ8jrUfVprar8f2bkgwFJc/rmYs+YtrOSZeoq/82GNB34LsCoWUqt1XPDms85xSmj+C1amK7rwgxsvHddGA5ocln9RQFLwkMQuTRbLh5qjgrkJ8lfUkUPaeDXkHJgKpoDC+Vwjxlb+CtTjieeCE/QPe7ha+h7tRWqa2Tp7IpTbMuiQrymVyRwydSt9veP2bp0FXhwK/NI3q2lgWBM6cchRFMh92gORYJCOO89Td48os+wwalgwR33llQOlg+eJT6nUjir/FEYz5x43jwbsu5Ujyv702DiexleaXcvImEhd3/iBt73r0VrAAAAAA=="
        },
        {
            "uuid": "475e9bec-9dc7-44e8-bcc8-dd65ceeda61f",
            "payload": "AAAAAACAAAAEAAAAKUkVESCEFULkNPTQAAAAhsc2xlYm9kbgAAAAAAAAADAAAADFgtQ0FDSEVDT05GOgAAABVrcmI1X2NjYWNoZV9jb25mX2RhdGEAAAAKZmFzdF9hdmFpbAAAABxrcmJ0Z3QvUkVESEFULkNPTUBSRURIQVQuQ09NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADeWVzAAAAAA=="
        },
        {
            "uuid": "6871fd43-1588-4673-8bdf-5054240af2b0",
            "payload": "AAAACAQAAAAEAAAAKCUkVESEFULkNPTQAAAAhsc2xlYm9kbgAAAAMAAAACAAAAClJFREhBVC5DT00AAAAEaW1hcAAAABRtYWlsLmNvcnAucmVkaGF0LmNvbQASAAAAIAKDi/YxzECI4zIgzcjswH6d8loUyZrZWPfPglzj5L+KWOss2ljrLOZY67l6AAAAAABAKQAAAAAAAAAAAAAAAAFeYYIBWjCCAVagAwIBBaEMGwpSRURIQVQuQ09NoicwJaADAgEDoR4wHBsEaW1hcBsUbWFpbC5jb3JwLnJlZGhhdC5jb22jggEWMIIBEqADAgESoQMCAQWiggEEBIIBABH+e46XiXyhudhNUUzRwqTRHjQ38dnNBhp+8vSUeVfIPI6TmneGIN1/SSv1dy9Fx0ezLiliBoupnWcDGcd9YlKvuiubO6b7OJtbx61RKtGjYIrGEs+16s3VbRzp8FrtSxLgP9FKyY0ZzBbWHMpJbchrYY9CqLWycNrzsw3pODioDgQKBMg3Xvko9fQfnzilhzI4DHLjTlUBhmtfpHnbyvN8yvdFurlQwyvSJorbejivKgPZo4CypgPoh2SLIV6UnAyoVzXox1zppPcACgzHcjAS24Qb3sUVGwOR9oEQkT9ALtSUZZJU0fttjN/0lIAJ1wc6/9vFiv+Fu1Q9N/PrzaYAAAAA"
        },
        {
            "uuid": "749f83e7-775d-47cf-89e9-2cd34ec71340",
            "payload": "AAAACAQAAAAEAAAACKUkVECSEFULkNPTQAAAAhsc2xlYm9kbgAAAAMAAAACAAAAClJFREhBVC5DT00AAAAESFRUUAAAAA9zYW1sLnJlZGhhdC5jb20AEgAAACBrQ2j3VL+lgWXgPgabgbY+jucrQmpCbSrKutfGiIaGaVjrLNpY6zQRWOu5egAAAAAAQCkAAAAAAAAAAAAAAAABWWGCAVUwggFRoAMCAQWhDBsKUkVESEFULkNPTaIiMCCgAwIBA6EZMBcbBEhUVFAbD3NhbWwucmVkaGF0LmNvbaOCARYwggESoAMCARKhAwIBBKKCAQQEggEALisqz2OQaRp1K497OOQDr50kJ4NGslyiQo1MfFBBB7FnOXXILaN3UkyN88mX+PxIcbMzVgTWf7BJBItZLMdi1t0WAkto4NiAGLoqfC0Hz3SlNrgD3jE9Al5f0kxVbq9IhuXV3ENZ05l/Ml8fPR3RKLXFa+LMLM0DQal3R8jQAmnGaSl/4iGNntpd1+XQNfUUcemPDWZjqJk0HSjYtx/xj3DStnHlP9pZTUBeCX25nvikZBq9F50dUebVIuwh7I7r9UQk8lPgqy0+bOCeYjlmEkowPJts91QHb1kxqV+wmVCMTSKQmtAGPsU6gz4tR9nb32zxthfrphyn1cA+YWocQwAAAAA="
        }
    ]
}
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [tcurl_write_data] (0x2000): <--- end libcurl data
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [tcurl_write_data] (0x0020): Failed to write data to buffer [105]: No buffer space available
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 13 socket data 0x55a058100100
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/1000/ccache/bd874c90-32c6-4e47-8b5a-d0935800d706-1000:80312
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0040): CURL operation failed [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [sec_get_done] (0x0040): GET HTTP request failed [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [sec_get_ccache_done] (0x0040): Cannot resolve key to ccache [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [ccdb_sec_getbyuuid_done] (0x0040): Cannot retrieve the ccache [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_ccdb_getbyuuid_done] (0x0040): Failed to get cache by UUID [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_op_get_cache_by_uuid_done] (0x0040): Cannot get ccahe by UUID [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_cmd_done] (0x0040): op receive function failed [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_cmd_request_done] (0x0040): KCM operation failed [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_reply_error] (0x0040): KCM operation returs failure [5]: Input/output error
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_failbuf_construct] (0x1000): Sent reply with error -1765328191
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55a0580fd4c0][14]
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [kcm_recv] (0x4000): Client closed connection.
(Mon Apr 10 13:25:16 2017) [sssd[kcm]] [client_close_fn] (0x2000): Terminated client [0x55a0580f4030][12]

Hmm, probably there should not be any hardcoded limit in tcurl but pass it from the caller as parameter.

Agree, It is hardcoded on few places

src/responder/kcm/kcmsrv_ops.c:110:    state->reply = sss_iobuf_init_empty(state,
src/responder/kcm/kcmsrv_ops.c-111-                                        KCM_REPLY_MAX,
src/responder/kcm/kcmsrv_ops.c-112-                                        KCM_REPLY_MAX);
src/responder/kcm/kcmsrv_ops.c-113-    if (state->reply == NULL) {
src/responder/kcm/kcmsrv_ops.c-114-        ret = ENOMEM;

src/responder/kcm/kcmsrv_ops.c:153:    state->op_ctx->reply = sss_iobuf_init_empty(
src/responder/kcm/kcmsrv_ops.c-154-                                        state,
src/responder/kcm/kcmsrv_ops.c-155-                                        KCM_REPLY_MAX - 2*sizeof(uint32_t),
src/responder/kcm/kcmsrv_ops.c-156-                                        KCM_REPLY_MAX - 2*sizeof(uint32_t));
src/responder/kcm/kcmsrv_ops.c-157-    if (state->reply == NULL) {

src/util/tev_curl.c:655:    state->response = sss_iobuf_init_empty(state, TCURL_IOBUF_CHUNK, TCURL_IOBUF_MAX);
src/util/tev_curl.c-656-    if (state->response == NULL) {
src/util/tev_curl.c-657-        ret = ENOMEM;
src/util/tev_curl.c-658-        goto done;
src/util/tev_curl.c-659-    }

I use sssd-kcm with remporary workaround:

diff --git a/src/responder/kcm/kcmsrv_ops.c b/src/responder/kcm/kcmsrv_ops.c
index 2feaf51f2..7a78e9d6b 100644
--- a/src/responder/kcm/kcmsrv_ops.c
+++ b/src/responder/kcm/kcmsrv_ops.c
@@ -31,7 +31,7 @@
 #include "responder/kcm/kcmsrv_ops.h"
 #include "responder/kcm/kcmsrv_ccache.h"

-#define KCM_REPLY_MAX   2048
+#define KCM_REPLY_MAX 16384

 struct kcm_op_ctx {
     struct kcm_resp_ctx *kcm_data;
diff --git a/src/util/tev_curl.c b/src/util/tev_curl.c
index 8faf07c71..52c86adde 100644
--- a/src/util/tev_curl.c
+++ b/src/util/tev_curl.c
@@ -35,7 +35,7 @@
 #include "util/tev_curl.h"

 #define TCURL_IOBUF_CHUNK   1024
-#define TCURL_IOBUF_MAX     4096
+#define TCURL_IOBUF_MAX    16384

 static bool global_is_curl_initialized;

The max buf should be much larger, a single ticket can be 65K in size, and you need to take in account encoding and such.
I would use something that is at least triple that size because some tickets may contain even more AD data technically.

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.15.3

2 years ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1448094

2 years ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1448094

2 years ago

Metadata Update from @jhrozek:
- Issue priority set to: critical

2 years ago

Metadata Update from @jhrozek:
- Issue assigned to jhrozek

2 years ago

The first patchset is part of https://github.com/SSSD/sssd/pull/225

But since the patches still need to be reviewed and RHEL is no longer blocked waiting for this bug, I think it's OK if upstream moves the ticket to 1.15.4

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.15.4 (was: SSSD 1.15.3)

2 years ago

The proper fix will really come in 1.15.4

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.16.1 (was: SSSD 1.15.4)

2 years ago

I just found that this is trivial to hit in F27. The machine stays unusable for the user who overflows their ccache until I wipe the ccache manually (with curl -XDELETE) or reboot, it seems.

I tend to have a whole lot of service principals active at one time. Over 200 is not abnormal. This is doable using the kernel keyring (though a bug in keyutils will occasionally cause ssh segfaults) but of course that just leads to aggressive recycling of entries in the keyring.

It appears that the kernel defaults will allow me to store about 80 keys. I would suggest that the KCM by default at least have a similar limit so there's no functionality loss.

Edit: I rebuilt sssd with KCM_REPLY_MAX and TCURL_IOBUF_MAX raised to 2**18. The behavior is changed, but not really much better. Instead of klist: Credentials cache I/O operation failed I get klist: No credentials cache found and by poking at sssd_secrets with curl I can see that the credentials cache was indeed completely destroyed.

Edit 2: Adding a bunch of keys to my ccache slowly shows that after I get to some small number over 60 keys, the ccache is simply destroyed. It doesn't seem to matter how long I wait in between acquiring keys. The count is so far either 61 or 62; I guess I'm hitting a limit somewhere but I don't know what it is. The json data retrieved with curl just before this happened is 48544 bytes in size, so it's not hitting an obvious power of two limit and certainly not hitting the 256K limit I patched in.

Metadata Update from @lslebodn:
- Issue tagged with: KCM

2 years ago

Metadata Update from @jhrozek:
- Issue tagged with: postpone-to-1-16-2

2 years ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.16.2 (was: SSSD 1.16.1)

a year ago

Metadata Update from @jhrozek:
- Issue untagged with: postpone-to-1-16-2

a year ago

Some more patches that improve the situation:
b09cd30
786c400
2f11cf2
bfc6d9d

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.0 (was: SSSD 1.16.2)

a year ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.1 (was: SSSD 2.0)

10 months ago

Metadata Update from @jhrozek:
- Issue tagged with: PR

6 months ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.2 (was: SSSD 2.1)

4 months ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.3 (was: SSSD 2.2)

12 days ago

Login to comment on this ticket.

Metadata