From 42f7f8f2d924eb8abe52b1c118ee89871d9112f1 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Mar 10 2020 17:27:50 +0000 Subject: sanlock: fix closing wrong client fd The symptoms of this bug were inq_lockspace returning ECONNRESET. It was caused by a previous client closing the fd of a newer client doing inq_lockspace (when both clients were running at roughly the same time.) First client ci1, second client ci2. ci1 in call_cmd_daemon() is finished, and close(fd) is called (and client[ci].fd is *not* set to -1). ci2 is a new client at about the same time and gets the same fd that had been used by ci1. ci1 being finished triggers a poll error, which results in client_free(ci1). client_free looks at client[ci1].fd and finds it is not -1, so it calls close() on it, but this fd is now being used by ci2. This breaks the sanlock daemon connection for ci2 and the client gets ECONNRESET. --- diff --git a/src/cmd.c b/src/cmd.c index d816fd7..b79e224 100644 --- a/src/cmd.c +++ b/src/cmd.c @@ -3072,7 +3072,24 @@ void call_cmd_daemon(int ci, struct sm_header *h_recv, int client_maxi) break; }; + /* + * Previously just called close(fd) and did not set client[ci].fd = -1. + * This meant that a new client ci could get this fd and use it. + * + * When a poll error occurs because this ci was finished, then + * client_free(ci) would be called for this ci. client_free would + * see cl->fd was still set and call close() on it, even though that + * fd was now in use by another ci. + * + * We could probably get by with just doing this here: + * client[ci].fd = -1; + * close(fd); + * + * and then handling the full client_free in response to + * the poll error (as done previously), but I see no reason + * to avoid the full client_free here. + */ if (auto_close) - close(fd); + client_free(ci); }