#233 implement --make-fatal and return non-zero exit code if task fails
Closed: Fixed None Opened 8 years ago by kparal.

This has been discussed here:
https://lists.fedoraproject.org/pipermail/qa-devel/2015-March/001171.html

It is handy sometimes if runtask can return an exit code based on task outcome (0 for PASSED or INFO, non-zero otherwise). Let's implement it as a cmdline option, something like --make-fatal (or think of a better name).

When this option is specified, return an exit code based on the last TAP entry - this means that for multi-item checks the author will need to add one new "overall" TAP entry which will then be returned. For failed outcomes, we want the return code to be a particular number, e.g. 100, so that we can still distinguish failed checks from other errors (e.g. python exceptions). Think about which exit code number to use for what purpose.

When this option is not specified, we will continue using our default approach - exit code 0 if the task completed (regardless of outcome), non-zero for execution failures (exceptions, crashes).


This ticket had assigned some Differential requests:
D442
D447

It seems to me that D442 uncovered a flaw in our design - we don't have a good way to find out what the task result is, and relying on the export to be TAP and the intended TAP is quite unclean and error-prone. Exported variables are used for communication between directives in general, not just for TAP, and several different TAPs can be present and used inside the formula (see depcheck). I'd like to avoid doing best guesses here. Here are some ideas how to do it differently. All of them are explicit, so the --fatal argument would not be needed (we might still require it to confirm this behavior, depending on which behavior we need as default in our infrastructure).

== exitcode directive ==

One alternative I was able to come up with is a directive for setting the exit code. Something like this:

- name: set runtask exit code
  exitcode:
      tap_worst: ${tap}

and

- name: set runtask exit code
  exitcode:
      tap_last: ${tap}

This could also be handy:

- name: set runtask exit code
  exitcode:
      code: ${number}

We would have to think whether we allow arbitrary codes to be used (it could override some of our return codes), or convert it to just 0/100, or maybe allow everything in the range of {0,100-255}.

Running this directive multiple times would override the previous result only if the current result was worse (i.e. this would allow it to be used even for tasks like depcheck with multiple TAPs).

In runner.py:LocalRunner.do_single_action(), we would check which directive is executed, and if it was exitcode, we would store the exitcode returned, and exit with it after the execution was complete.

== honor_exitcode for python directive ==

A simpler alternative, yet not so generic, would be to add something like honor_exitcode for the existing python directive:

    - name: run upgradepath
      python:
          file: upgradepath.py
          callable: main
          custom_args: [--debug, --storedir, "${artifactsdir}" , "${koji_tag}"]
          honor_exitcode: True
      export: upgradepath_output

This way, it would be up to the python script/wrapper to return an appropriate exit code, and we would save it again in LocalRunner.

== exitcode for resultsdb directive ==

And another simple alternative is to implement this for resultsdb directive, like this:

    - name: report results to resultsdb
      resultsdb:
          results: ${upgradepath_output}
          set_exitcode: last_result
      export: upgradepath_output

where set_exitcode would be last_result or worst_result.

All of these have some advantages and disadvantages. Thoughts?

Out of the three options, I like the first one (exitcode directive) the best - the other two are IMHO situational - we do use python directive for most (all?) our tasks at the moment, but that won't probably be true forever. Using resultsdb directive as a "side-effect producer" is IMHO the least obvious, and you might not want to report to resultsdb, but use the exitcode at the same time. Yes, you can disable the reporting in config, but then you effectively can't decide on formula-per-formula basis.

Not disputing, just adding some remarks:

we do use python directive for most (all?) our tasks at the moment, but that won't probably be true forever

My thought was that we would add the same option for all script-execution directives.

Using resultsdb directive as a "side-effect producer" is IMHO the least obvious, and you might not want to report to resultsdb, but use the exitcode at the same time. Yes, you can disable the reporting in config, but then you effectively can't decide on formula-per-formula basis.

That's true. But we already bundle some additional functionality with resultsdb directive, like checking that TAP is OK and logging the details. Maybe we could decouple that from resultsdb, and join it with the possibility to set exit code. So we would have check_tap and resultsdb directives. Just a wild thought.

Out of these three, I must say I don't have a strong preference, I don't see any of those as being clearly cleaner/simpler/superior to the others (maybe someone can come up with something even better?). If we decide to go with the exitcode directive, I'd implement just the simplest variant with tap_worst and tap_last. I wouldn't implement code: ${number} until we really need it, it seems to add unnecessary complexity (my original thought was that we could forward a python directive exit code to it, but we have no way of doing it at the moment anyway).

Login to comment on this ticket.

Metadata