PR#43: [backend] handle paramiko.SSHException in _run_ssh_cmd - copr/copr

copr / copr

#43 [backend] handle paramiko.SSHException in _run_ssh_cmd

Merged 7 years ago by clime. Opened 7 years ago by praiskup.

Unknown source paramiko-exception into master

[backend] handle paramiko.SSHException in _run_ssh_cmd

Pavel Raiskup • 7 years ago

d5b4217

backend/backend/mockremote/builder.py

file modified

+6 -1

		`@@ -109,7 +109,12 @@`

		`self.log.info("BUILDER CMD: "+cmd)`

		`- stdin, stdout, stderr = conn.exec_command(cmd)`
		`+ try:`
		`+ stdin, stdout, stderr = conn.exec_command(cmd)`
		`+ except paramiko.SSHException as err:`
		`+ raise RemoteCmdError("Paramiko failure.",`
		`+ cmd, -1, as_root, str(err), "(none)")`
		`+`
		`rc = stdout.channel.recv_exit_status() # blocks`
		`out, err = stdout.read(), stderr.read()`

praiskup commented 7 years ago

Warning: Untested, I'm not running from the latest-greatest copr
release (I'm still using obsoleted ansible's python API).

Sometimes even when the VM allocation succeeds (spawn_playbook),
it might happen that ssh fails to respond a few moments later.

Not handling this properly means that the exception even leaks out
from do_job() call -> which means that (a) frontend is not
informed about build failure and (b) the worker/builder might not
be deallocated.

I observed similar issues when we were using ansible python API;
previously we did:

self.run_ansible_with_check("/bin/rpm -q mock rsync")

.. with unhandled VmError from check_for_ans_error(). This was
replaced by:

self._run_ssh_cmd("/bin/rpm -q mock rsync")

Without handling paramiko.SSHException however -- so I suppose the
problem is still there and thus fixing it.

praiskup commented 7 years ago

@msuchy, fyi -> this might be the issue we chatted about in kitchen :)

clime commented 7 years ago

I agree with handling this but note that it's quite strange for this error to occur because banner_timeout is None by default when creating paramiko connection in builder.py using paramiko.SSHClient. That means it should really wait for the remote sshd to start communicating. Anyway, logging this might at least give us some debugging info.

praiskup commented 7 years ago

Ah, yes ... It looks like this is not yet complete and I should edit the patch so also ssh exceptions during _create_ssh_conn() are properly handled.

It looks like with Ansible, the code always failed with the first command over ssh ... not during "connection creation"; but it might be that ansible implemented some lazy algorithm.

msuchy commented 7 years ago

clime commented 7 years ago

Have you actually tested that the exception raised really is paramiko.SSHException? There are other exceptions raised by http://docs.paramiko.org/en/2.1/api/client.html#paramiko.client.SSHClient.connect.

praiskup commented 7 years ago

Yeah, you are right - the connect() thus _create_ssh_conn() method is not handled by this patch at all (that's what I talked about in previous message). I'll have a look at the patch asap ...

But yes, I was able to reproduce the paramiko.SSHException exception with (pseudocode) this sequence:

connect()
sleep(10) # kill the server
run_command()

Edited 7 years ago by praiskup