#168 Improvement of result semantics and execution status handling in libtaskotron/resultsdb/execdb
Opened 8 years ago by jskladan. Modified 6 years ago

== Definitions ==

  • Taskotron Developer - person responsible for developing/maintaining taskotron/libtaskotron, and/or the whole deployed stack
  • Task Developer - person responsible for writing a task (formula, python wrapper, ...) that executes a check. They are interested in the execution status, not the actual results of said check - i.e. they care that rpmlint is crashing, but not that rpmlint reports FAILED for [item under test]
  • Maintainer - person responsible for consuming the results of Tasks/Tests - i.e. they care that rpmlint reports FAILED for [item under test]. This also covers the automated tools for gating, Bodhi, ...
  • Global Execution Status - what the Taskotron Developer cares about - basically represented by the Buildbot steps, and what is in ExecDB at the moment. When something crashes on the "top level" - i.e. in the buildbot/taskotron code - the Taskotron Developer wants to know. This also covers errors like network timeouts, and errors that originate outside of taskotron/libtaskotron's "core" but are handled by taskotron/libtaskotron
  • Task Execution Status - what the Task Developer (and to some extent possibly also the Taskotron Developer) cares about - when something crashes/goes wrong in the Task's execution/code, the Task Developer wants to know.
  • Task Outcome - what the Maintainer cares about - when the task fails for [item under test], the Maintainer wants to know.

Multiple roles can, of course, be held by one real-life person, but that is not important for the overall concept.

== Task Outcome ==

While discussing this ticket (adding CRASHED as a test result) with @kparal, we came to realize that although we do have semantics for the individual Outcome variants (currently 'PASSED', 'INFO', 'FAILED', 'NEEDS_INSPECTION', 'ABORTED', 'CRASHED'), the set unnecessarily combines the actual outcome with the execution status of the task.

After intense consideration, we think that Maintainers should do fine with these variants of Task Outcome:
* Pass - everything went OK, no need to worry: PASSED
* Conditional pass - some not-that-important parts of the test failed (e.g. a subcheck in rpmgrill), so the overall result is considered to be PASSED, but someone might want to have a look at it - INFO (considered as PASSED by Bodhi/gating/whatever)
* Fail - it all went sideways, we should not let this go through - FAILED
* Unknown - maybe the test provided only partial results, or there is some special case in Depcheck that cannot be automatically decided between PASSED and FAILED; any time a real-life person's intervention is needed to determine the outcome, or any time the maintainer should be notified but none of the previous variants really fits - NEEDS_INSPECTION (considered as "no result present" by Bodhi/gating/whatever tool)

This reduces the Outcome variants to PASSED, INFO, FAILED, and NEEDS_INSPECTION.
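To make the gating interpretation concrete, here is a minimal sketch of how a consuming tool (Bodhi, gating, ...) might map the reduced variant set; the function name and return values are made up for illustration and do not describe any existing API:

```python
# Hypothetical helper showing the proposed semantics of the reduced Outcome set
# from a gating tool's point of view - not an existing Bodhi/ResultsDB API.
OUTCOMES = ('PASSED', 'INFO', 'FAILED', 'NEEDS_INSPECTION')

def gating_decision(outcome):
    """Map a Task Outcome to a gating decision."""
    if outcome in ('PASSED', 'INFO'):
        return 'pass'        # INFO is treated as PASSED for gating purposes
    if outcome == 'FAILED':
        return 'fail'
    if outcome == 'NEEDS_INSPECTION':
        return None          # treated as "no result present"
    raise ValueError('unknown outcome: %r' % outcome)
```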

== Execution Status ==

At the moment, we do not store any information about the Task Execution Status. The ResultsDB directive just marks all Jobs as Completed, and we do not provide any way to change a Job's status. I'm also quite sure that the Job (as implemented now) is unnecessary - it was originally there to represent the Global Execution Status, but never really got used for that. More on what to change in ResultsDB in the Proposed changes section.

The Global Execution Status is represented by Buildbot's steps - specifically the runtask step - and can be seen (and, in the future, searched for) in ExecDB. On the other hand, it is not very granular at the moment - all we can distinguish is OK or NOT OK, and we cannot easily say whether a crash was caused by libtaskotron's code or by the task's (userspace) code.

For NOT OK, we would want to recognize at least these categories:
* error in taskotron - an exception in libtaskotron, a disposable minion not started, ... - most probably will not be "fixed" by a rerun
* error in resource - Koji timed out, repo not synced, ... - fair chance that a re-run will produce a non-error result
* error in userspace/task - the task's code throws errors, or fails to run

That way, the relevant party can be notified.
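A minimal sketch of how these categories could look as Global Execution Status codes (they reappear as TASKOTRON_ERROR/RESOURCE_ERROR/TASK_ERROR in the ExecDB proposal below); this is a proposal only, not an existing ExecDB API:

```python
# Sketch of the proposed Global Execution Status codes - names and structure
# are a proposal, not an existing ExecDB API.
class GlobalExecutionStatus(object):
    OK = 'OK'                            # execution finished without errors
    TASKOTRON_ERROR = 'TASKOTRON_ERROR'  # exception in libtaskotron, minion not started, ...
    RESOURCE_ERROR = 'RESOURCE_ERROR'    # Koji timed out, repo not synced, ...
    TASK_ERROR = 'TASK_ERROR'            # the task's (userspace) code threw errors or failed to run
```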

== Proposed changes ==

=== ResultsDB ===

  • Remove the unnecessary Outcome variants
  • Change the behaviour of the Job->Result coupling - having a way to group multiple results is definitely something we want to keep, but the way the Job->Result interaction is handled now is too tight: a Job needs to be explicitly created, set to the right state, and so on. I'd keep the concept, but change the behaviour. All in all, I'd like to make the Job/Grouping a little bit more "stealth" - available, but not enforced, and most importantly without the semantics of the Task Execution Status (a hypothetical sketch follows below).
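As a purely hypothetical illustration of the "stealth" grouping, a result could simply carry an optional group reference instead of requiring a Job to be created and state-managed first; the payload shape below is invented for this sketch and does not describe the current ResultsDB API:

```python
# Invented result payload: the group is optional metadata attached to the
# result itself and carries no execution-status semantics of its own.
result = {
    'testcase': 'rpmlint',
    'item': 'foo-1.0-1.fc25',
    'outcome': 'FAILED',
    'groups': ['<some group identifier>'],  # optional; omit it and the result is simply ungrouped
}
```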

=== ExecDB ===

  • Add a key that holds the Global Execution Status. OK by default, and possible to set to a range of error codes - something along the lines of TASKOTRON_ERROR, RESOURCE_ERROR, TASK_ERROR for starters - thus making it possible to tell which category of failure occurred.
  • Add search - by item, job name, step execution status, global execution status, ... - so the Task Maintainers can check that their tasks are not crashing.
  • Add fedmsgs for jobs that crashed - either on the Global or the Task level. Emitting notifications about successful runs does not, IMHO, make much sense, as long as we don't feed them back into some other tool. A sketch of such a message follows below.
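A sketch of what such a crash notification could carry, assuming a fedmsg-style topic and body; the topic name and fields are placeholders invented for this example, not an existing fedmsg schema:

```python
# Invented example of a crash notification - the topic and field names are
# placeholders, not an existing fedmsg schema.
crash_message = {
    'topic': 'org.fedoraproject.prod.taskotron.job.crashed',  # hypothetical topic
    'msg': {
        'job_uuid': '<execdb job identifier>',
        'task': 'rpmlint',
        'item': 'foo-1.0-1.fc25',
        'status': 'TASK_ERROR',  # or TASKOTRON_ERROR / RESOURCE_ERROR
    },
}
```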

=== Libtaskotron ===

  • Add a try/except block around 'userspace' directives like the python directive - anywhere we execute code provided by the Task Developer - so we can gracefully handle errors and report to ExecDB
  • Add ExecDB reporting/global status changes to the relevant directives (koji directive, ...) where RESOURCE_ERROR can happen
  • Add an exception that can be used from inside the userspace/wrapper code to notify the upper layers of libtaskotron about a crash in that code. This would be handled automagically by the error handler, but making it possible to raise it explicitly is IMO good (a sketch follows below).
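A minimal sketch of the error handling and exception proposed above; TaskCrashedError, the wrapper function, and the process(params, arg_data) call are all assumptions made for this example, not existing libtaskotron API:

```python
# Hypothetical exception for userspace crashes and a wrapper around a
# 'userspace' directive - names are placeholders, not existing libtaskotron API.
class TaskCrashedError(Exception):
    """Raised (explicitly, or by the error handler) when the task's own code crashes."""

def run_userspace_directive(directive, params, arg_data):
    try:
        return directive.process(params, arg_data)
    except TaskCrashedError:
        raise                       # explicit signal from the task/wrapper code, pass it on
    except Exception as exc:
        # any other crash in the Task Developer's code becomes the same signal,
        # so the runner can set the Task Execution Status to TASK_ERROR in ExecDB
        raise TaskCrashedError(str(exc))
```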

=== Docs ===

  • Make sure the semantics are well described.
  • Describe different situations, and what Outcomes make sense there. One good example might be the Docker testsuite - when some of its tests crash, it might make sense to either set the Outcome to NEEDS_INSPECTION, or to just raise the userspace exception and thus not report anything to ResultsDB.
  • Supported scenarios from the Task Developer's POV (see the sketch after this list):
    • Task Execution OK; results reported to ResultsDB; Task Execution Status set to OK
    • Task Execution fails; not handled by the wrapper; no results reported; Task Execution Status set to TASK_ERROR
    • Task Execution fails; handled by the wrapper:
      • Raise the userspace exception; no results reported; Task Execution Status set to TASK_ERROR
      • Handle gracefully; report NEEDS_INSPECTION or whatever else makes sense to ResultsDB; Task Execution Status set to OK
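To make the last two scenarios concrete, here is a sketch of a python-wrapper task handling a crash in its own code either way; check.CheckDetail and check.export_YAML are assumed to be the usual libtaskotron reporting helpers (exact signatures may differ), while TaskCrashedError and run_the_actual_check() are hypothetical:

```python
# Illustrative only - shows both ways a wrapper can handle a crash in its own code.
from libtaskotron import check

class TaskCrashedError(Exception):
    """Hypothetical exception from the proposal above."""

def run_the_actual_check(koji_build):
    """Placeholder for the real work the task does; pretend the tool crashed."""
    raise RuntimeError('the underlying tool crashed')

def run(koji_build, workdir='.', artifactsdir='artifacts'):
    try:
        outcome = run_the_actual_check(koji_build)
    except RuntimeError as exc:
        # option A: raise the userspace exception - no results are reported and
        # the Task Execution Status ends up as TASK_ERROR
        raise TaskCrashedError(str(exc))
        # option B (instead of raising): handle gracefully, report
        # NEEDS_INSPECTION, and leave the Task Execution Status at OK:
        # detail = check.CheckDetail(item=koji_build, outcome='NEEDS_INSPECTION',
        #                            note=str(exc))
        # return check.export_YAML(detail)
    detail = check.CheckDetail(item=koji_build, outcome=outcome)
    return check.export_YAML(detail)
```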

Yesterday we discussed an extension of this with @jskladan - allowing task developers to set a flag //"alert me about this task run"//, for whatever reason (e.g. debugging random and infrequent issues, like the one we experienced in depcheck). Since the task developer is often not the same person as the package maintainer (esp. for generic checks like rpmlint and depcheck), we don't want task developers to misuse the task outcome for this - they should not set NEEDS_INSPECTION or INFO when something bad happens inside their task, as long as they can still decide whether the overall outcome is pass or fail. Package maintainers will not want to debug the task, they will not understand it, and we should not bother them with it. But we still want an easy way for task developers to mark such executions as //"alert me about this"//.

There are multiple ways to deal with this. We can recommend that task devs use a specific result keyval and then allow ResultsDB to find such results easily (a checkbox //"developer alerts"// in the search UI). Another approach is to use the upcoming namespaces and, instead of keyvals, simply report the result to a dev namespace (the namespace would mirror the other namespaces like qa and pkg, just prefixed with dev, so dev.qa, dev.pkg, etc.). This could be supported directly in CheckDetail, so that you just set CheckDetail.dev = True or similar, and such a result would be exported to ResultsYAML using the dev namespace. If you wanted to report both to qa.depcheck and dev.qa.depcheck, you would clone the CheckDetail object and set the flag on one of them. You can do this just for selected CheckDetails, which is nice - your alert can be very targeted. This also nicely solves the use case where you can't decide whether the result is pass or fail at all: you simply don't report it to the standard namespaces, just to the dev namespace. The package maintainer won't receive any outcome, but the task developer will, and can start debugging the issue.
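A sketch of the CheckDetail-level support described above; CheckDetail and export_YAML are the existing libtaskotron reporting helpers (exact signatures may differ), while the dev attribute is only the flag proposed here and does not exist today:

```python
import copy
from libtaskotron import check

# the normal result, reported into the standard namespace (e.g. qa.depcheck)
detail = check.CheckDetail(item='foo-1.0-1.fc25', outcome='FAILED',
                           note='depcheck hit the weird intermittent issue again')

# clone it and flag the copy for the dev namespace (dev.qa.depcheck), so the
# task developer gets alerted without bothering the package maintainer
dev_detail = copy.deepcopy(detail)
dev_detail.dev = True  # proposed flag - does not exist in libtaskotron today

output = check.export_YAML([detail, dev_detail])
```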

We could also use this approach for reporting task crashes (those not caused by libtaskotron or transient network errors) to ResultsDB, into the dev namespace (only).
