#43 [backend] handle paramiko.SSHException in _run_ssh_cmd
Merged 7 years ago by clime. Opened 7 years ago by praiskup.
Unknown source paramiko-exception  into  master

@@ -109,7 +109,12 @@

  

          self.log.info("BUILDER CMD: "+cmd)

  

-         stdin, stdout, stderr = conn.exec_command(cmd)

+         try:

+             stdin, stdout, stderr = conn.exec_command(cmd)

+         except paramiko.SSHException as err:

+             raise RemoteCmdError("Paramiko failure.",

+                                  cmd, -1, as_root, str(err), "(none)")

+ 

          rc = stdout.channel.recv_exit_status() # blocks

          out, err = stdout.read(), stderr.read()

  

Warning: Untested, I'm not running from the latest-greatest copr
release (I'm still using obsoleted ansible's python API).

Sometimes even when the VM allocation succeeds (spawn_playbook),
it might happen that ssh fails to respond a few moments later.

Not handling this properly means that the exception even leaks out
from do_job() call -> which means that (a) frontend is not
informed about build failure and (b) the worker/builder might not
be deallocated.

I observed similar issues when we were using ansible python API;
previously we did:

self.run_ansible_with_check("/bin/rpm -q mock rsync")

.. with unhandled VmError from check_for_ans_error(). This was
replaced by:

self._run_ssh_cmd("/bin/rpm -q mock rsync")

Without handling paramiko.SSHException however -- so I suppose the
problem is still there and thus fixing it.

@msuchy, fyi -> this might be the issue we chatted about in kitchen :)

I agree with handling this but note that it's quite strange for this error to occur because banner_timeout is None by default when creating paramiko connection in builder.py using paramiko.SSHClient. That means it should really wait for the remote sshd to start communicating. Anyway, logging this might at least give us some debugging info.

Ah, yes ... It looks like this is not yet complete and I should edit the patch so also ssh exceptions during _create_ssh_conn() are properly handled.

It looks like with Ansible, the code always failed with the first command over ssh ... not during "connection creation"; but it might be that ansible implemented some lazy algorithm.

Have you actually tested that the exception raised really is paramiko.SSHException? There are other exceptions raised by http://docs.paramiko.org/en/2.1/api/client.html#paramiko.client.SSHClient.connect.

Yeah, you are right - the connect() thus _create_ssh_conn() method is not handled by this patch at all (that's what I talked about in previous message). I'll have a look at the patch asap ...

But yes, I was able to reproduce the paramiko.SSHException exception with (pseudocode) this sequence:

connect()
sleep(10) # kill the server
run_command()

Pull-Request has been merged by clime

7 years ago
Metadata