#18 Clarify test result reporting in Standard Test Interface
Opened 5 years ago by psss. Modified 5 years ago

Currently the Standard Test Interface does not define how to distinguish a test failure from an infrastructure error. When the ansible playbook is run, it usually does some environment preparation and only then executes the tests. Upon a problem encountered during provisioning or configuration of the test environment, ansible returns a non-zero exit code, but there is no easy way to tell such an error apart from a successful execution of a test which failed (and thus returns a non-zero exit code as well).

Let's make this part of the specification clearer so that tools can detect infrastructure errors and report them as such to the end user. This will help to direct problems (errors versus failures) to the right audience (infrastructure staff versus developer/tester).

One of the possible solutions could be to use the test.log file to define test PASS/FAIL and use the ansible return code to denote infrastructure problems:

ansible playbook failed ---> ERROR (infrastructure problem)
ansible playbook passed and:
    there is no test.log file ---> ERROR (no results, something went wrong)
    test.log contains a failed test ---> FAIL (test or tests failed)
    test.log contains a passed test ---> PASS (test or tests passed)
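
For illustration only, the decision flow could be scripted roughly like this (it assumes the per-line RESULT format for test.log suggested below; file names and paths are just an assumption):

# illustrative sketch only, not part of the specification
if ! ansible-playbook tests.yml; then
    echo "ERROR (infrastructure problem)"
elif [ ! -f artifacts/test.log ]; then
    echo "ERROR (no results, something went wrong)"
elif grep -q '^FAIL' artifacts/test.log; then
    echo "FAIL (test or tests failed)"
else
    echo "PASS (test or tests passed)"
fi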

As far as the test.log file is concerned, the current specification defines it in a very general way:

 MUST treat the file test.log in the artifacts folder as the
 main readable output of the test.

We could define the format of the test.log file in more detail to make the result reporting easier. For example, we could recommend the simple RESULT test-identifier format currently implemented by the Standard Test Roles:

PASS shell/login
PASS shell/smoke
PASS shell/func

Another option could be introducing a new file for the result summary, for example test.results, and keeping test.log for free-form test output. What are your thoughts on this?


@stefw, @pingou, could you please review?

Metadata Update from @psss:
- Issue assigned to psss

5 years ago

Very good move. Do you think TAP is a good candidate format for the test.log, rather than inventing yet another one?

https://testanything.org/
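
Just to illustrate (using the three example tests above; this is not a proposal for concrete content), a TAP stream would look roughly like this:

1..3
ok 1 - shell/login
ok 2 - shell/smoke
not ok 3 - shell/func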

Another option could be introducing a new file for the result summary, for example test.results, and keeping test.log for free-form test output. What are your thoughts on this?

Agree with this ^^

The original idea of the test.log file was to record any and all output from the tests so that the packager could quickly see what went wrong. Changing this would basically break the API we defined, so I'd push for the test.results file as proposed.

Re-using existing tools is definitely a bonus, though work on TAP seems to be slow at the moment: there is a branch in their specification repository for version 14 of the specification, but it has not changed in the last 4 years: https://github.com/TestAnything/Specification/tree/tap-14-specification :sob:
(Amusingly, they even have a ticket about return codes: https://github.com/TestAnything/Specification/issues/22 )

We used TAP in Taskotron for some time, and we abandoned it. IIRC it was way too simple. TAP13 allows you to include arbitrary YAML inside, but we had issues with TAP13 libraries at the time, and we figured: why specify a custom YAML section (and possibly write our own TAP13 library) when we can have the whole file as YAML, making it simpler? It's easy to produce with any library and easy to validate.

Our current Taskotron results file looks like this:

results:
  - item: xchat-0.5-1.fc20
    type: koji_build
    checkname: mytest
    outcome: PASSED

See the full documentation at:
https://qa.fedoraproject.org/docs/libtaskotron/latest/resultyaml.html

The important thing is that it allows us to define multiple results, possibly even with different item (in your terminology subject) types. This is not a problem for current Fedora CI, because you only execute package-specific tests (a single package -> a single test suite). If you intend to execute generic tasks in the future, you'll need to submit multiple results. Here's an example from rpmdeplint, generating package dependency results for each package in the testing repository:
https://taskotron.fedoraproject.org/artifacts/all/efeda40a-02db-11e9-b69f-525400fc9f92/tests.yml/taskotron/results.yml.gz
(currently all results are type: koji_build, but in the past we also had type: bodhi_update results mixed in, as an overall result for the whole Bodhi update - I'm explaining just to illustrate the concept)
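
Purely to illustrate the concept (the bodhi_update entry below is made up), such a mixed file could look like:

results:
  - item: xchat-0.5-1.fc20
    type: koji_build
    checkname: mytest
    outcome: PASSED
  - item: FEDORA-2019-0123abcdef
    type: bodhi_update
    checkname: mytest
    outcome: FAILED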

In Taskotron, we consider any task execution that hasn't generated a results.yml file to be a crashed job. It could then possibly be re-executed automatically (not implemented on our side). We can therefore easily distinguish finished jobs (even if the result is FAILED) from crashed jobs.

We can't yet distinguish jobs crashed due to a test code error from those crashed due to some system/infra error, even though we have some proposals in our archives that I could try to dig up if there's interest. But the very first step could be to detect errors that happen even before you run ansible-playbook tests.yml (before this point, it's definitely a test system failure), and for that you don't need any cooperation from test writers.

Also, here's an example of python-versions specifying an overall result (checkname: dist.python-versions) and subresults (checkname: dist.python-versions.*):
https://taskotron.fedoraproject.org/artifacts/all/6ddde2be-0258-11e9-b69f-525400fc9f92/tests.yml/taskotron/results.yml.gz
(note: unsorted)

This way the test case can pinpoint exactly where the problem occurred. It can also decide whether the overall result is FAILED or something else (it can even be PASSED, if the subresult is deemed non-important).
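
As a sketch of that idea (the item and subcheck names here are hypothetical), such a file could contain:

results:
  - item: python-foo-1.0-1.fc29
    type: koji_build
    checkname: dist.python-versions
    outcome: PASSED
  - item: python-foo-1.0-1.fc29
    type: koji_build
    checkname: dist.python-versions.two_three
    outcome: PASSED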

Thanks to all for your feedback and ideas. To summarize: We want to keep the test.log file as it is (used for arbitrary text output of the test) and introduce a new file which would contain machine-readable results of the testing and could be used to clearly identify infrastructure errors.

From the feedback above, the TAP format does not seem to be a good candidate. Using the simple yaml format proposed by @kparal seems fine to me, although we probably do not have to include all fields unless we have a clear use case for them. So what about calling the file results.yaml and including results as a list of dictionaries with the following keys:

  • test ... unique name of the test
  • result ... test result (PASS or FAIL)
  • log ... link to a log file with details (optional)

Example output of the bash tests could look like this:

- test: func
  result: PASS
  log: /tmp/artifacts/PASS_shell-func.log
- test: login
  result: PASS
  log: /tmp/artifacts/PASS_shell-login.log
- test: smoke
  result: PASS
  log: /tmp/artifacts/PASS_shell-smoke.log

Does that sound ok to you? Or does anyone know about a standardized test result format we could use for this purpose?

I like simple things, but maybe in this case I would opt for xunit. We already use it downstream, but it needs standardization. I believe in the future we will want to easily extend the format with various attributes related to the test case.

Here are some current examples (sorry internal RH links only):
installability test - https://url.corp.redhat.com/xunit-installability
functional test - https://url.corp.redhat.com/functional-xunit

Also, xunit is already supported by our CI dashboard, which will sooner or later come to Fedora as well.

https://url.corp.redhat.com/ci-dashboard-systemd-example
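
For comparison, a minimal xUnit-style (JUnit XML) report for the three example tests from the ticket could look roughly like this (exact attributes vary between tools):

<testsuite name="shell" tests="3" failures="1">
  <testcase name="smoke"/>
  <testcase name="login"/>
  <testcase name="func">
    <failure message="test failed"/>
  </testcase>
</testsuite>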

Yesterday we discussed this in more detail at the Best Practices meeting with @mvadkert, @pcahyna and @jheger, and these were the most important points we agreed on:

  • Let's introduce a new file which will become part of the specification
  • The format should be simple so that it can be easily generated from any script
  • Although xunit is a standard format, it is not simple enough (it would need additional tools)
  • Yaml seems to be the easiest way; it allows minimal one-line entries as well as extensibility
  • There should be two mandatory fields: result and test
  • An optional log field can be used to point to a log file with details

For example, the following format could be used to denote the results of the three example tests from the ticket description:

- {result: pass, test: shell/smoke}
- {result: pass, test: shell/login}
- {result: fail, test: shell/func}

Appending a new line with this format is easily doable even from a shell script (see the sketch after the examples below). If desired, additional fields can be used:

- {result: pass, test: shell/smoke, log: /tmp/artifacts/PASS_shell-smoke.log}
- {result: pass, test: shell/login, log: /tmp/artifacts/PASS_shell-login.log}
- {result: fail, test: shell/func, log: /tmp/artifacts/FAIL_shell-func.log}

Of course, the same data can be stored in the multiline format as well:

- test: shell/smoke
  result: pass
  log: /tmp/artifacts/PASS_shell-smoke.log
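
As a sketch of the shell-script case mentioned above (the results.yaml name and location are just an assumption), appending a result could be as simple as:

echo "- {result: pass, test: shell/smoke}" >> artifacts/results.yaml
echo "- {result: fail, test: shell/func, log: /tmp/artifacts/FAIL_shell-func.log}" >> artifacts/results.yaml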

Any other thoughts? Comments? Suggestions?

I suggest you wrap this list in a dictionary with e.g. the results key as the top level element (as shown in the Taskotron example). That will make the format a bit more future-proof, e.g. you can add a top-level key version or a global log or something similar later.

I believe you'll definitely need something like item or subject key to specify what got tested in the future, but that can be added later.

I would be in favor of adding an extra status ERROR to denote an infrastructure or environment error. This way we could keep the same format for the result file:

FAIL shell/login
ERROR shell/smoke
PASS shell/func

Also, supporting an alternate output format like xUnit would be ideal because a lot of tools can analyse those files.

I suggest you wrap this list in a dictionary with e.g. the results key as the top level element (as shown in the Taskotron example). That will make the format a bit more future-proof, e.g. you can add a top-level key version or a global log or something similar later.

+1 sounds like we might need this anyway in the future

I believe you'll definitely need something like item or subject key to specify what got tested in the future, but that can be added later.

Yep, thanks for the suggestions @kparal !

Adding the results key seems reasonable to make the format a bit clearer, but I believe it goes against one of our agreed requirements:

  • The format should be simple so that it can be easily generated from any script

I'm not sure we would like to force all users (e.g. with a basic shell script test) to add a yaml parser dependency to their test. Or do we?

There's one more question: There might be multiple ansible playbooks stored in the tests directory. These could be executed in parallel to save time. Then we should make sure they do not write results to the same file at the same time.

This also relates to the artifacts directory not being cleaned up before a new run (which confuses users). Shall we have a separate directory/file for each playbook/run? The same problem applies to test.log as well. Thoughts?

The format should be simple so that it can be easily generated from any script

I'm not sure we would like to force all users (e.g. with a basic shell script test) to add a yaml parser dependency to their test. Or do we?

They'll be producing the file, not consuming it, right? They can create the file with just echo & cat, if they really want. results: is just a single extra line at the top.
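
For instance, a sketch with plain echo (the file name is illustrative only):

echo "results:" > results.yaml
echo "- {result: pass, test: shell/smoke}" >> results.yaml
echo "- {result: fail, test: shell/func}" >> results.yaml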

They can also produce a json file, if that's easier for them (and there are json-related tools for the command line), because json is valid yaml.

In Taskotron, we've even provided a taskotron_result command line tool that could create the file for them:
https://qa.fedoraproject.org/docs/libtaskotron/latest/writingtasks.html#saving-task-results-in-resultyaml-format

There's one more question: There might be multiple ansible playbooks stored in the tests directory. These could be executed in parallel to save time. Then we should make sure they do not write results to the same file at the same time.
This also relates to the artifacts directory not being cleaned up before a new run (which confuses users). Shall we have a separate directory/file for each playbook/run? The same problem applies to test.log as well. Thoughts?

In Taskotron we've created a separate artifact directory for each playbook (so for tests[123].yml there would be artifacts/tests[123].yml/ directories) and the TEST_ARTIFACTS envvar would be specific to that particular playbook (for tests1.yml it would point to artifacts/tests1.yml/). That made the whole scheme completely transparent for the test suite; it didn't need to bother with different results.yml/test.log names, care about race conditions, etc. But the test system must be able to handle multiple test results files.

This also allows us to have artifacts/taskotron/ where we can put test system-specific logs for easy retrieval. Here's an example:
https://taskotron.fedoraproject.org/artifacts/all/3782f02a-1e36-11e9-9451-525400fc9f92/
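
Schematically (a sketch of the layout just described, with illustrative playbook names):

artifacts/
    tests1.yml/
        results.yml
        test.log
    tests2.yml/
        results.yml
        test.log
    taskotron/
        (test system specific logs)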

Thanks much for your detailed feedback, @kparal. I think it makes sense. Extending the file with the results: key does not add that much complexity, and if we are to handle parallel jobs as well, we could apply the same approach used in Taskotron. So, to sum up, the proposed format is this:

results:
- {result: pass, test: shell/smoke}
- {result: fail, test: shell/login}
- {result: error, test: shell/func}

Extended version with links to logs:

results:
- {result: pass, test: shell/smoke, log: /tmp/artifacts/PASS_shell-smoke.log}
- {result: fail, test: shell/login, log: /tmp/artifacts/FAIL_shell-login.log}
- {result: error, test: shell/func, log: /tmp/artifacts/ERROR_shell-func.log}

And, for completeness, the expanded multiline version:

results:
- result: pass
  test: shell/smoke
  log: /tmp/artifacts/PASS_shell-smoke.log

The examples above include the error status proposed by @flepied as well. Thoughts? Likes? Disagreements?

Thanks all for the review and feedback. It seems we have a consensus about the result format. Here are the steps needed to make this happen:

  • Update Standard Test Interface documentation (wiki)
  • Add support to Standard Test Roles
    • Generate additional file results.yml
    • Do not return non-zero exit code upon test failure
  • Update pipeline to handle multiple playbooks correctly
    • Run each present playbook separately
    • Create a separate artifacts directory for each
  • Adjust error/failure reporting in the pipeline (see the sketch after this list)
    • Report error when ansible returns non-zero (infra error)
    • Report error if error appears in results.yml files
    • Report test failure if fail appears in results.yml files
    • Report pass otherwise
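
A rough sketch of that reporting logic (it greps the one-line flow format for simplicity; a real implementation would rather use a proper yaml parser, and the paths are just an assumption):

status=pass
for playbook in tests*.yml; do
    if ! ansible-playbook "$playbook"; then
        status=error; break                          # infra error
    fi
    results="artifacts/$playbook/results.yml"
    if [ ! -f "$results" ] || grep -q 'result: error' "$results"; then
        status=error; break                          # missing or errored results
    elif grep -q 'result: fail' "$results"; then
        status=fail                                  # at least one test failed
    fi
done
echo "$status"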

I'm going to file individual issues for these, but there's one more thing: during our discussion at the Best Practices meeting it was proposed to add a version to the STI, as this change introduces backward-incompatible behaviour. That means updating the STI spec and STR as well. Also, all (future) tests should include the format version in tests.yml, and the corresponding documentation should be adjusted.

I am a bit hesitant about whether this step is absolutely necessary at this time, as (and this is just my estimation) the vast majority of existing tests currently use STR, so no adjustment of their implementation is needed to make the new results reporting work.

I've filed individual issues to implement the agreed changes:

Let me know if I missed anything.

Regarding the spec versioning, I suggest using 1.1.0 as the initial version:
https://pagure.io/fedora-ci/general/issue/28

Metadata Update from @psss:
- Issue tagged with: Standard Test Interface

5 years ago
