#405 exit with non-zero code when some playbook crashes
Closed 6 years ago by kparal. Opened 6 years ago by kparal.

file modified
+24 -14
@@ -11,16 +11,8 @@ 

  to the specification described below.

  

  

- YAML and reporting

- ==================

- 

- First up, we will provide you with two examples of the valid YAML output - the

- minimal version, which just covers requirements, and then the extended version,

- which uses the format to its full potential Taskotron-vise.

- 

- 

  Minimal version

- ---------------

+ ===============

  

  If you don't need anything fancy, then this is the minimalistic version of the

  output which, at the same time, is valid for results reporting in Taskotron::
@@ -64,12 +56,11 @@ 

  .. Note::

  

    ``item``, ``type``, ``outcome`` and ``checkname`` are just some of the

-   "reserved" key-names  in the YAML block. Refer to the next section for the

-   full list.

+   "reserved" key-names  in the YAML block. See :ref:`reserved_fields`.

  

  

  Full version

- ------------

+ ============

  

  Although the minimal version of the output is sufficient, you can use

  additional fields to provide more information::
@@ -112,8 +103,10 @@ 

  results as such.

  

  

- Note on the reserved field names

- --------------------------------

+ .. _reserved_fields:

+ 

+ Note on reserved field names

+ ----------------------------

  

  The full list of fields with a pre-defined meaning is available at

  :py:attr:`libtaskotron.check.RESERVED_KEYS`.
@@ -122,3 +115,20 @@ 

  ``package`` in the above example), which will get stored in ResultsDB, and can

  be then used for searching. Those fields will be specific for your task and you

  can give them any meaning and use them for any purpose that you need.

+ 

+ 

+ Special cases

+ =============

+ 

+ Empty results

+ -------------

+ 

+ Sometimes you may need to indicate that your task finished fine, but there are

+ no results that should be posted. A ResultYAML file that contains no results is

+ a dictionary with a ``results`` key and either no value::

+ 

+   results:

+ 

+ or an empty list value::

+ 

+   results: []
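
Both spellings parse to an empty result set with a standard YAML parser. A
quick way to check this, assuming PyYAML is installed (an illustrative
snippet only, not part of the patch)::

    import yaml

    for text in ('results:', 'results: []'):
        data = yaml.safe_load(text)
        # 'results:' parses the key to None, 'results: []' to an empty
        # list; both mean "no results to report"
        assert (data.get('results') or []) == []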

@@ -34,7 +34,7 @@ 

  .. note:: Taskotron also includes support for plain STI tests (i.e. specific,

     not generic tasks). However, this support is experimental and not maintained

     at the moment. There are other systems which handle this area, at least in

-    Fedora.

+    Fedora. See CI_.

  

  

  .. _taskotron-sti:
@@ -56,11 +56,12 @@ 

    STI.

  

  * The playbook's exit code is not used for generating PASS/FAIL result

-   automatically (and in our case, submitting the result to ResultsDB_).

-   Instead, the task is expected to create

-   ``{{artifacts}}/taskotron/results.yml`` file in :doc:`resultyaml` by itself.

+   automatically (and in our case, for submitting the result to ResultsDB_).

+   Instead, the task must create the ``{{artifacts}}/taskotron/results.yml``

+   file in :doc:`resultyaml` on its own.

  

  Read :doc:`writingtasks` to see some examples of the format described here.

  

  

  .. _ResultsDB: https://fedoraproject.org/wiki/ResultsDB

+ .. _CI: https://fedoraproject.org/wiki/CI
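
As an illustration of the requirement above (a sketch only, not taken from
libtaskotron or any existing task), a generic task could create that file
from Python roughly as follows; the ``item``, ``type``, ``outcome`` and
``checkname`` keys are the reserved fields from :doc:`resultyaml`, and the
way the artifacts directory is obtained here is an assumption::

    import os
    import yaml

    # assumption: the artifacts dir is handed to the task via an env variable
    artifacts = os.environ['ARTIFACTS']
    outdir = os.path.join(artifacts, 'taskotron')
    if not os.path.isdir(outdir):
        os.makedirs(outdir)

    results = {'results': [{
        'item': 'foo-1.0-1.fc28',
        'type': 'koji_build',
        'outcome': 'PASSED',
        'checkname': 'mytask',
    }]}

    with open(os.path.join(outdir, 'results.yml'), 'w') as f:
        yaml.safe_dump(results, f, default_flow_style=False)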

@@ -14,7 +14,6 @@ 

  from libtaskotron import config

  from libtaskotron.exceptions import TaskotronDirectiveError, TaskotronValueError

  from libtaskotron.logger import log

- from libtaskotron.ext.fedora import rpm_utils

  import resultsdb_api

  

  DOCUMENTATION = """
@@ -256,7 +255,7 @@ 

              check_details = check.import_YAML(params['results'])

              log.debug("YAML output parsed OK.")

          except (TaskotronValueError, IOError) as e:

-             raise TaskotronDirectiveError("Failed to load results: %s" % e.message)

+             raise TaskotronDirectiveError("Failed to load results: %s" % e)

  

          for detail in check_details:

              if not (detail.item and detail.report_type and detail.checkname):

file modified
+38 -23
@@ -145,9 +145,11 @@ 

              directory

          :param str ipaddr: IP address of the machine the task will be run on

          :param bool root: whether to run as ``root`` for local execution mode

-         :return: stream output of the ansible-playbook command (stdout and

-             stderr merged together)

-         :rtype: str

+         :return: a tuple of ``(str, bool)``. The first item is stream output of

+             the ansible-playbook command (stdout and stderr merged together).

+             The second item indicates whether this playbook is a Taskotron generic

+             task (``True``) or a plain STI task (``False``).

+         :rtype: tuple

          :raise TaskotronPlaybookError: when the playbook is not syntactically

              correct

          '''
@@ -181,9 +183,11 @@ 

              # not just in the first?

              playbook = yaml.load(playbook_file.read())[0]

              if playbook.get('vars',{}).get('taskotron_generic_task', False):

+                 generic = True

                  exec_tasks = 'tasks_generic.yml'

                  log.debug('Playbook %s is a generic task', test_playbook)

              else:

+                 generic = False

                  exec_tasks = 'tasks_sti.yml'

                  log.debug('Playbook %s is an STI task', test_playbook)

              keepalive = playbook.get('vars', {}).get(
@@ -267,7 +271,7 @@ 

              signal.signal(signal.SIGTERM, self._interrupt_handler)

  

              output, _ = os_utils.popen_rt(cmd, cwd=ansible_dir)

-             return output

+             return (output, generic)

          except subprocess.CalledProcessError as e:

              log.error('ansible-playbook ended with %d return code',

                        e.returncode)
@@ -302,22 +306,25 @@ 

          raise exc.TaskotronInterruptError(signum, signame)

  

      def _report_results(self, test_playbook):

+         '''Report results from the playbook's ``results.yml`` (stored in

+         artifactsdir) into ResultsDB.

+ 

+         :param str test_playbook: base name of the playbook file

+         :raise TaskotronDirectiveError: when there's a problem processing

+             the ``results.yml`` file

+         '''

          results_file = os.path.join(self.arg_data['artifactsdir'],

              test_playbook, 'taskotron', 'results.yml')

-         if os.path.exists(results_file):

-             log.info('Reporting results from: %s', results_file)

-             rdb = resultsdb_directive.ResultsdbDirective()

-             rdb.process(params={"file": results_file}, arg_data=self.arg_data)

-         else:

-             #FIXME change to exception?

-             log.warn("Results file %s doesn't exist" % results_file)

+         log.info('Reporting results from: %s', results_file)

+         rdb = resultsdb_directive.ResultsdbDirective()

+         rdb.process(params={"file": results_file}, arg_data=self.arg_data)

  

      def execute(self):

          '''Execute all the tasks in the taskdir

  

-         :return: ``True`` if execution finished (that doesn't mean all

-                  playbooks finished successfully). ``False`` if the execution

-                  was interrupted (e.g. a system signal).

+         :return: ``True`` if execution finished successfully for all playbooks

+             present. ``False`` if some of them crashed, didn't produce any

+             results, or the execution was interrupted (e.g. by a system signal).

          :rtype: bool

          '''

          test_playbooks = fnmatch.filter(os.listdir(self.arg_data['taskdir']),
@@ -332,24 +339,32 @@ 

  

          log.info('Running task on machine: %s', ipaddr)

  

-         interrupted = False

+         failed = []

          for test_playbook in test_playbooks:

-             log.info('Running testsuite: %s' % test_playbook)

+             log.info('Running playbook: %s' % test_playbook)

              try:

-                 self._run_playbook(test_playbook, ipaddr)

-                 self._report_results(test_playbook)

+                 _, generic = self._run_playbook(test_playbook, ipaddr)

+                 if generic:

+                     self._report_results(test_playbook)

              except exc.TaskotronInterruptError as e:

                  log.error('Caught system interrupt during execution of '

-                     'testuite %s: %s. Not executing any other playbooks.',

+                     'playbook %s: %s. Not executing any other playbooks.',

                      test_playbook, e)

-                 interrupted = True

+                 failed.append(test_playbook)

                  break

              except exc.TaskotronError as e:

-                 log.error('Error during execution of testsuite %s: %s',

+                 log.error('Error during execution of playbook %s: %s',

                      test_playbook, e)

+                 failed.append(test_playbook)

              finally:

                  if self.task_vm is not None:

                      self.task_vm.teardown()

-                 log.info('Testsuite execution finished: %s', test_playbook)

+                 log.info('Playbook execution finished: %s', test_playbook)

+ 

+         if failed:

+             log.error('Some playbooks failed during execution: %s',

+                 ', '.join(failed))

+         else:

+             log.info('All playbooks finished successfully')

  

-         return not interrupted

+         return not failed

file modified
+5 -4
@@ -197,11 +197,12 @@ 

  

      # start execution

      executor = Executor(arg_data)

-     finished = executor.execute()

+     success = executor.execute()

  

      # finalize

-     log.info('Execution %s at: %s. Task artifacts were saved in: %s',

-              'finished' if finished else 'interrupted',

+     log.info('Execution finished at: %s. Task artifacts were saved in: %s',

               datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC'),

               arg_data['artifactsdir'])

-     sys.exit(0 if finished else 1)

+     if not success:

+         log.error('Some playbooks failed. Exiting with non-zero exit code.')

+     sys.exit(0 if success else 1)

file modified
+2 -2
@@ -70,8 +70,8 @@ 

  

          # the tradeoff here is that we need to run with root=False

          # (non-default)

-         output = self.executor._run_playbook(self.playbook_name, self.ipaddr,

-             root=False)

+         output, _ = self.executor._run_playbook(self.playbook_name,

+             self.ipaddr, root=False)

          # FIXME: We currently have no idea whether the test playbook passed

          # or failed, because we ignore the inner playbook's exit status. So

          # we can't really verify here whether everything worked ok.

@@ -11,6 +11,7 @@ 

  

  from libtaskotron import check

  from libtaskotron import config

+ import libtaskotron.exceptions as exc

  

  

  class StubConfigParser(object):
@@ -121,3 +122,13 @@ 

  

          for key in self.cd.keyvals.keys():

              assert call_data[key] == self.cd.keyvals[key]

+ 

+     def test_file_missing(self, tmpdir):

+         del self.ref_input['results']

+         input_file = tmpdir.join('results.yaml')

+         self.ref_input['file'] = input_file.strpath

+ 

+         with pytest.raises(exc.TaskotronDirectiveError):

+             self.test_rdb.process(self.ref_input, self.ref_arg_data)

+ 

+         assert len(self.stub_rdb.calls()) == 0

file modified
+79 -26
@@ -28,7 +28,15 @@ 

      taskotron_generic_task: true

    tasks:

      - debug:

-         msg: This is a sample debug printout

+         msg: This is a sample debug printout from a Taskotron generic task

+ '''

+ PLAYBOOK_STI = '''

+ - hosts: localhost

+   # this saves a lot of time when running in mock (without network)

+   gather_facts: no

+   tasks:

+     - debug:

+         msg: This is a sample debug printout from an STI task

  '''

  

  
@@ -104,13 +112,9 @@ 

              self.arg_data

  

      def test_report_results_missing_file(self, monkeypatch):

-         '''Don't try to report results when no results file exists'''

-         mock_rdb = mock.Mock()

-         monkeypatch.setattr(resultsdb_directive, 'ResultsdbDirective',

-             mock_rdb)

- 

-         self.executor._report_results('missing-playbook-dir')

-         mock_rdb.assert_not_called()

+         '''Raise when no results file exists'''

+         with pytest.raises(exc.TaskotronDirectiveError):

+             self.executor._report_results('missing-playbook-dir')

  

      def test_run_playbook(self, monkeypatch):

          '''A standard invocation of ansible-playbook'''
@@ -122,7 +126,8 @@ 

          mock_popen = mock.Mock(return_value=('fake output', None))

          monkeypatch.setattr(os_utils, 'popen_rt', mock_popen)

  

-         output = self.executor._run_playbook(self.playbook_name, self.ipaddr)

+         output, _ = self.executor._run_playbook(self.playbook_name,

+             self.ipaddr)

  

          # must return playbook output

          assert output == 'fake output'
@@ -214,7 +219,7 @@ 

          '''Execution using local mode'''

          mock_spawn_vm = mock.Mock()

          monkeypatch.setattr(executor.Executor, '_spawn_vm', mock_spawn_vm)

-         mock_run_playbook = mock.Mock()

+         mock_run_playbook = mock.Mock(return_value=(None, True))

          monkeypatch.setattr(executor.Executor, '_run_playbook',

              mock_run_playbook)

          mock_report_results = mock.Mock()
@@ -222,9 +227,9 @@ 

              mock_report_results)

          assert self.executor.arg_data['local'] == True

  

-         finished = self.executor.execute()

+         success = self.executor.execute()

  

-         assert finished == True

+         assert success == True

          mock_spawn_vm.assert_not_called()

          mock_run_playbook.assert_called_once()

          assert mock_run_playbook.call_args[0][1] == '127.0.0.1'
@@ -234,7 +239,7 @@ 

          '''Execution using ssh mode'''

          mock_spawn_vm = mock.Mock()

          monkeypatch.setattr(executor.Executor, '_spawn_vm', mock_spawn_vm)

-         mock_run_playbook = mock.Mock()

+         mock_run_playbook = mock.Mock(return_value=(None, True))

          monkeypatch.setattr(executor.Executor, '_run_playbook',

              mock_run_playbook)

          mock_report_results = mock.Mock()
@@ -244,9 +249,9 @@ 

          self.executor.arg_data['ssh'] = True

          self.executor.arg_data['machine'] = '127.0.0.2'

  

-         finished = self.executor.execute()

+         success = self.executor.execute()

  

-         assert finished == True

+         assert success == True

          mock_spawn_vm.assert_not_called()

          mock_run_playbook.assert_called_once()

          assert mock_run_playbook.call_args[0][1] == '127.0.0.2'
@@ -256,7 +261,7 @@ 

          '''Execution using libvirt mode'''

          mock_spawn_vm = mock.Mock(return_value='127.0.0.3')

          monkeypatch.setattr(executor.Executor, '_spawn_vm', mock_spawn_vm)

-         mock_run_playbook = mock.Mock()

+         mock_run_playbook = mock.Mock(return_value=(None, True))

          monkeypatch.setattr(executor.Executor, '_run_playbook',

              mock_run_playbook)

          mock_report_results = mock.Mock()
@@ -267,9 +272,9 @@ 

          self.executor.arg_data['local'] = False

          self.executor.arg_data['libvirt'] = True

  

-         finished = self.executor.execute()

+         success = self.executor.execute()

  

-         assert finished == True

+         assert success == True

          mock_spawn_vm.assert_called_once()

          mock_run_playbook.assert_called_once()

          assert mock_run_playbook.call_args[0][1] == '127.0.0.3'
@@ -294,7 +299,7 @@ 

  

      def test_execute_more_playbooks(self, monkeypatch):

          '''Should execute all found playbooks'''

-         mock_run_playbook = mock.Mock()

+         mock_run_playbook = mock.Mock(return_value=(None, True))

          monkeypatch.setattr(executor.Executor, '_run_playbook',

              mock_run_playbook)

          mock_report_results = mock.Mock()
@@ -302,9 +307,9 @@ 

              mock_report_results)

          self.playbook.copy(self.taskdir.join('tests_copy.yml'))

  

-         finished = self.executor.execute()

+         success = self.executor.execute()

  

-         assert finished == True

+         assert success == True

          assert mock_run_playbook.call_count == 2

          playbooks = [

              mock_run_playbook.call_args_list[0][0][0],
@@ -321,7 +326,7 @@ 

          assert 'tests_copy.yml' in playbooks

  

      def test_execute_error(self, monkeypatch):

-         '''Should ignore playbook errors'''

+         '''Should raise on playbook errors'''

          mock_run_playbook = mock.Mock(side_effect=exc.TaskotronError)

          monkeypatch.setattr(executor.Executor, '_run_playbook',

              mock_run_playbook)
@@ -329,10 +334,58 @@ 

          monkeypatch.setattr(executor.Executor, '_report_results',

              mock_report_results)

  

-         finished = self.executor.execute()

+         success = self.executor.execute()

+ 

+         assert success == False

+         mock_run_playbook.assert_called_once()

+         mock_report_results.assert_not_called()

+ 

+     def test_execute_more_playbooks_error(self, monkeypatch):

+         '''Should execute all found playbooks even if some of them fail'''

+         mock_run_playbook = mock.Mock(side_effect=[

+             exc.TaskotronError,

+             (None, True)])

+         monkeypatch.setattr(executor.Executor, '_run_playbook',

+             mock_run_playbook)

+         mock_report_results = mock.Mock()

+         monkeypatch.setattr(executor.Executor, '_report_results',

+             mock_report_results)

+         self.playbook.copy(self.taskdir.join('tests_copy.yml'))

+ 

+         success = self.executor.execute()

+ 

+         assert success == False

+         assert mock_run_playbook.call_count == 2

+         playbooks = [

+             mock_run_playbook.call_args_list[0][0][0],

+             mock_run_playbook.call_args_list[1][0][0]

+         ]

+         assert 'tests.yml' in playbooks

+         assert 'tests_copy.yml' in playbooks

+         assert mock_report_results.call_count == 1

+         playbooks = [

+             mock_report_results.call_args_list[0][0][0]

+         ]

+         assert 'tests.yml' in playbooks or 'tests_copy.yml' in playbooks

  

-         assert finished == True

+     def test_execute_sti_no_report(self, monkeypatch):

+         '''Shouldn't try to report results for STI tasks'''

+         self.playbook.write(PLAYBOOK_STI)

+         mock_spawn_vm = mock.Mock()

+         monkeypatch.setattr(executor.Executor, '_spawn_vm', mock_spawn_vm)

+         mock_run_playbook = mock.Mock(return_value=(None, False))

+         monkeypatch.setattr(executor.Executor, '_run_playbook',

+             mock_run_playbook)

+         mock_report_results = mock.Mock()

+         monkeypatch.setattr(executor.Executor, '_report_results',

+             mock_report_results)

+ 

+         success = self.executor.execute()

+ 

+         assert success == True

+         mock_spawn_vm.assert_not_called()

          mock_run_playbook.assert_called_once()

+         assert mock_run_playbook.call_args[0][1] == '127.0.0.1'

          mock_report_results.assert_not_called()

  

      def test_execute_interrupted(self, monkeypatch):
@@ -346,9 +399,9 @@ 

              mock_report_results)

          self.playbook.copy(self.taskdir.join('tests_copy.yml'))

  

-         finished = self.executor.execute()

+         success = self.executor.execute()

  

-         assert finished == False

+         assert success == False

          mock_run_playbook.assert_called_once()

          mock_report_results.assert_not_called()

  

@@ -184,6 +184,23 @@ 

          for key in self.cd.keyvals.keys():

              assert call_data[key] == self.cd.keyvals[key]

  

+     def test_empty_results(self):

+         """When there are no results, it shouldn't crash, just do nothing"""

+         yaml1 = 'results:'

+         yaml2 = 'results: []'

+ 

+         for ref_yaml in [yaml1, yaml2]:

+             self.test_rdb.process({'results': ref_yaml}, self.ref_arg_data)

+             assert len(self.stub_rdb.calls()) == 0

+ 

+     def test_empty_file(self):

+         """Should raise for a completely empty file"""

+ 

+         with pytest.raises(TaskotronDirectiveError):

+             self.test_rdb.process({'results': ''}, self.ref_arg_data)

+ 

+         assert len(self.stub_rdb.calls()) == 0

+ 

      def test_both_file_and_results(self, monkeypatch):

          ref_input = {'results': 'foobar', 'file': 'foo.bar'}

          with pytest.raises(TaskotronDirectiveError):

We need to detect when a task hasn't finished properly. It's impossible
to do for STI, but for Taskotron generic tasks, we can check whether
results.yml exists. If it doesn't, the task clearly hasn't finished
successfully. In that case, we exit with a non-zero exit code, so that
our tools (Buildbot) can inform us about a crashing task.
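
The detection itself is simple. As a minimal sketch (hypothetical names,
not the actual libtaskotron code):

    import os
    import sys

    def playbook_succeeded(artifactsdir, playbook_name):
        # a generic task must create results.yml itself, so a missing
        # file means the playbook crashed before it could report anything
        results_file = os.path.join(artifactsdir, playbook_name,
                                    'taskotron', 'results.yml')
        return os.path.exists(results_file)

    if not playbook_succeeded('/var/tmp/artifacts', 'tests.yml'):
        sys.exit(1)  # non-zero exit code, so Buildbot can alert us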

This code was written after task-rpmdeplint had been silently crashing
on every single execution for an extended period of time. This should
help us detect such cases.

This will also detect cases where our own playbooks are not working
properly, or where task playbooks are not syntactically correct.

Documentation was adjusted to explain that the results.yml file must be
created, and how to create that file if you want no results to be
posted.

Metadata Update from @kparal:
- Request assigned

6 years ago

Pull-Request has been closed by kparal

6 years ago