Currently the Standard Test Interface does not define how to distinguish a test failure from an infrastructure error. When the ansible playbook is run, it usually does some environment preparation and only then executes the tests. When a problem is encountered during provisioning or configuration of the test environment, ansible returns a non-zero exit code, but there is no easy way to tell such an error apart from a successful execution of a test which failed (that returns a non-zero exit code as well).
Let's make this part of the specification more clear so that tools can detect infrastructure errors and report them as such to the end user. This will help to direct problems (errors versus failures) to the right audience (infrastructure staff versus developer/tester).
One possible solution could be to use the test.log file to define test PASS/FAIL and the ansible return code to denote infrastructure problems (see the sketch after the list):

- ansible playbook failed ---> ERROR (infrastructure problem)
- ansible playbook passed and:
  - there is no test.log file ---> ERROR (no results, something went wrong)
  - test.log contains a failed test ---> FAIL (test or tests failed)
  - test.log contains a passed test ---> PASS (test or tests passed)
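To illustrate how a pipeline could implement this mapping, here is a minimal Python sketch. It is not part of the proposal itself; the playbook name, the artifacts path, and the assumption that results show up in test.log as lines starting with PASS/FAIL are mine.

```python
import re
import subprocess
from pathlib import Path

def overall_status(ansible_rc, artifacts_dir="artifacts"):
    """Map the ansible exit code plus test.log content to ERROR/FAIL/PASS."""
    if ansible_rc != 0:
        return "ERROR"  # infrastructure problem

    log = Path(artifacts_dir) / "test.log"
    if not log.is_file():
        return "ERROR"  # no results, something went wrong

    content = log.read_text()
    # Assumption: each test shows up as a line starting with PASS or FAIL,
    # as in the simple "RESULT test" format mentioned further below.
    if re.search(r"^FAIL\b", content, re.MULTILINE):
        return "FAIL"   # test or tests failed
    if re.search(r"^PASS\b", content, re.MULTILINE):
        return "PASS"   # test or tests passed
    return "ERROR"      # log present, but no recognizable results

# Usage: run the playbook and feed its exit code into the mapping.
rc = subprocess.run(["ansible-playbook", "tests.yml"]).returncode
print(overall_status(rc))
```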
As far as the test.log file is concerned, the current specification defines it in a very general way:
> MUST treat the file test.log in the artifacts folder as the main readable output of the test.
We could define the format of the test.log file in more detail to make the result reporting easier. For example, we could recommend the simple "RESULT test identifier" format currently implemented by the Standard Test Roles:

PASS shell/login
PASS shell/smoke
PASS shell/func
Another option could be introducing a new file for the result summary, for example test.results, and keeping test.log for free-form test output. What are your thoughts on this?
@stefw, @pingou, could you please review?
Metadata Update from @psss: - Issue assigned to psss
Very good move. Do you think TAP would be a good candidate format for the test.log, rather than inventing yet another one?
https://testanything.org/
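For reference, the three example tests from the description could look roughly like this in TAP (just a sketch of the format for illustration, nothing agreed here):

```
1..3
ok 1 - shell/login
ok 2 - shell/smoke
not ok 3 - shell/func
```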
Agree with this ^^
The original idea of the test.log file was to record any and all output from the tests so that the packager could quickly see what went wrong. Changing this would basically break the API we defined, so I'd push for the test.results file as proposed.
Re-using existing tools is definitely a bonus, though activity on TAP seems to be low at the moment: there is a branch in their specification repository for version 14 of the specification, but there was no change to it in the last 4 years: https://github.com/TestAnything/Specification/tree/tap-14-specification :sob: (Amusingly, they even have a ticket about the return code: https://github.com/TestAnything/Specification/issues/22 )
We've used TAP in Taskotron for some time, and we abandoned it. IIRC it was way too simple. TAP13 allows you to include arbitrary YAML inside, but we had issues with TAP13 libraries at that time, and we figured: why specify a custom YAML section (and possibly write our own TAP13 library) when we can have the whole file as YAML, therefore making it simpler? It's easy to produce by any library, easy to validate.
Our current Taskotron results file looks like this:
results:
  - item: xchat-0.5-1.fc20
    type: koji_build
    checkname: mytest
    outcome: PASSED
See the full documentation at: https://qa.fedoraproject.org/docs/libtaskotron/latest/resultyaml.html
The important thing is that it allows us to define multiple results, possibly even with different item (in your terminology subject) types. This is not a problem for current Fedora CI, because you only execute package-specific tests (a single package -> a single test suite). If you intend to execute generic tasks in the future, you'll need to submit multiple results. Here's an example from rpmdeplint, generating package dependency results for each package in the testing repository: https://taskotron.fedoraproject.org/artifacts/all/efeda40a-02db-11e9-b69f-525400fc9f92/tests.yml/taskotron/results.yml.gz (currently all results are type: koji_build, but in the past we also had type: bodhi_update results mixed in, as an overall result for the whole Bodhi update - I'm explaining just to illustrate the concept)
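Purely as an illustration of the idea (the item, checkname and outcome values below are made up; see the linked results.yml for real data), a file mixing result types could look like this:

```yaml
results:
  - item: xchat-0.5-1.fc20
    type: koji_build
    checkname: mycheck
    outcome: PASSED
  - item: FEDORA-2019-0123456789
    type: bodhi_update
    checkname: mycheck
    outcome: FAILED
```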
In Taskotron, we consider any task execution that hasn't generated a results.yml file to be a crashed job. It could possibly be re-executed automatically (not implemented on our side). We can therefore easily distinguish finished jobs (even if the result is FAILED) from crashed jobs.
We can't yet distinguish jobs crashed due to a test code error from jobs crashed due to some system/infra error, even though we have some proposals in our archives that I could try to dig up if there's interest. But the very first step could be to detect errors that happen even before you run ansible-playbook tests.yml (before this point, it's definitely a test system failure), and for that you don't need any cooperation from test writers.
Also, here's an example of python-versions specifying an overall result (checkname: dist.python-versions) and subresults (checkname: dist.python-versions.*): https://taskotron.fedoraproject.org/artifacts/all/6ddde2be-0258-11e9-b69f-525400fc9f92/tests.yml/taskotron/results.yml.gz (note: unsorted)
This way the test case can pinpoint exactly where the problem occurred. It can also decide whether the overall result is FAILED or something else (it can even be PASSED, if the failing subresult is deemed non-important).
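Schematically (with placeholder subcheck names; the real data is behind the link above), such an overall result with subresults could look like this:

```yaml
results:
  - checkname: dist.python-versions
    outcome: FAILED
  - checkname: dist.python-versions.subcheck_one
    outcome: FAILED
  - checkname: dist.python-versions.subcheck_two
    outcome: PASSED
```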
Thanks to all for your feedback and ideas. To summarize: We want to keep the test.log file as it is (used for arbitrary text output of the test) and introduce a new file which would contain a machine-readable result of the testing which could be used to clearly identify infrastructure errors.
From the feedback above, the TAP format does not seem to be a good candidate. Using the simple yaml format proposed by @kparal seems fine to me, although we probably do not have to include all fields unless we have a clear use case for them. So what about calling the file results.yaml and including results as a list of dictionaries with the following keys:

- test ... the test name
- result ... the test result
- log ... path to the test log
Example output of the bash tests could look like this:
- test: func
  result: PASS
  log: /tmp/artifacts/PASS_shell-func.log
- test: login
  result: PASS
  log: /tmp/artifacts/PASS_shell-login.log
- test: smoke
  result: PASS
  log: /tmp/artifacts/PASS_shell-smoke.log
Does that sound ok to you? Or does anyone know about a standardized test result format we could use for this purpose?
I like simple things, but maybe in this case I would opt for xUnit. We already use it downstream, but it needs standardization. I believe in the future we will want to easily extend the format with various attributes related to the test case.
Here are some current examples (sorry, internal RH links only):

- installability test: https://url.corp.redhat.com/xunit-installability
- functional test: https://url.corp.redhat.com/functional-xunit
Also, xUnit is already supported by our CI dashboard, which will sooner or later get to Fedora as well.
https://url.corp.redhat.com/ci-dashboard-systemd-example
Yesterday we discussed this in more detail on the Best Practices meeting with @mvadkert, @pcahyna and @jheger and these were the most important points we agreed on:

- The format should be simple so that it may be easily generated with any scripts.
- Each result should provide the key result for the test outcome and the key test for the test name.
- An optional log key can point to the individual test log; additional fields can be used if desired.
For example, the following format could be used to denote the results of the three example tests from the ticket description:
- {result: pass, test: shell/smoke}
- {result: pass, test: shell/login}
- {result: fail, test: shell/func}
Appending a new line with this format is easily doable even from a shell script. If desired, additional fields can be used.
- {result: pass, test: shell/smoke, log: /tmp/artifacts/PASS_shell-smoke.log}
- {result: pass, test: shell/login, log: /tmp/artifacts/PASS_shell-login.log}
- {result: fail, test: shell/func, log: /tmp/artifacts/FAIL_shell-func.log}
Of course, the same data can be stored in the multiline format as well:
- test: shell/smoke
  result: pass
  log: /tmp/artifacts/PASS_shell-smoke.log
Any other thoughts? Comments? Suggestions?
I suggest you wrap this list in a dictionary with e.g. the results key as the top level element (as shown in the Taskotron example). That will make the format a bit more future-proof, e.g. you can add a top-level key version or a global log or something similar later.
I believe you'll definitely need something like item or subject key to specify what got tested in the future, but that can be added later.
I would be in favor of adding an extra status ERROR to denote an infrastructure or environment error. This way we could keep the same format for the result file:
FAIL shell/login
ERROR shell/smoke
PASS shell/func
Also, supporting an alternate output format like xUnit would be ideal because a lot of tools can analyse those files.
+1 sounds like we might need this anyway in the future
Yep, thanks for the suggestions @kparal !
Adding the results key seems reasonable to make the format a bit more clear, but I believe it goes against one of our agreed requirements:

> The format should be simple so that it may be easily generated with any scripts

Not sure we would like to force all users (e.g. with a basic shell script test) to add a yaml parser dependency to their test. Or do we?
There's one more question: There might be multiple ansible playbooks stored in the tests directory. These could be executed in parallel to save time. Then we should make sure they do not write results at the same time or to the same file.
This also relates to the artifacts directory not being cleaned up before a new run (which confuses users). Shall we have a separate directory/file for each playbook/run? The same problem applies to the test.log as well. Thoughts?
> The format should be simple so that it may be easily generated with any scripts
>
> Not sure we would like to force all users (e.g. with a basic shell script test) to add a yaml parser dependency to their test. Or do we?
They'll be producing the file, not consuming it, right? They can create the file with just echo & cat, if they really want. results: is just a single extra line at the top.
They can also produce a json file, if that's easier for them (and there are json-related tools for the command line) because json is a valid yaml.
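For instance, a results file written as JSON (purely illustrative) would look like this and still be readable by any YAML parser:

```json
{"results": [{"result": "pass", "test": "shell/smoke"},
             {"result": "fail", "test": "shell/func"}]}
```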
In Taskotron, we've even provided a taskotron_result command line tool that could create the file for them: https://qa.fedoraproject.org/docs/libtaskotron/latest/writingtasks.html#saving-task-results-in-resultyaml-format
> There's one more question: There might be multiple ansible playbooks stored in the tests directory. These could be executed in parallel to save time. Then we should make sure they do not write results at the same time or to the same file. This also relates to artifacts directory not being cleaned up before new run (which confuses users). Shall we have a separate directory/file for each playbook/run? The same problem applies to the test.log as well. Thoughts?
In Taskotron we've created a separate artifact directory for each playbook (so for tests[123].yml there would be artifacts/tests[123].yml/ directories) and the TEST_ARTIFACTS envvar would be specific to that particular playbook (for tests1.yml it would point to artifacts/tests1.yml/). That made the approach completely transparent for the test suite: it didn't need to bother with different results.yml/test.log names, care about race conditions, etc. But the test system must be able to handle multiple test results files.
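The resulting layout could look something like this (a sketch only; which files appear inside each directory depends on what the test suite produces):

```
artifacts/
├── tests1.yml/
│   ├── results.yml
│   └── test.log
└── tests2.yml/
    ├── results.yml
    └── test.log
```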
This also allows us to have artifacts/taskotron/ where we can put test system-specific logs for easy retrieval. Here's an example: https://taskotron.fedoraproject.org/artifacts/all/3782f02a-1e36-11e9-9451-525400fc9f92/
Thanks much for your detailed feedback, @kparal. I think it makes sense. Extending the file with the results: key does not add that much complexity, and if we are to handle parallel jobs as well, we could apply the same approach used in Taskotron. So, to sum up, the proposed format is this:
results:
  - {result: pass, test: shell/smoke}
  - {result: fail, test: shell/login}
  - {result: error, test: shell/func}
Extended version with links to logs:
results:
  - {result: pass, test: shell/smoke, log: /tmp/artifacts/PASS_shell-smoke.log}
  - {result: fail, test: shell/login, log: /tmp/artifacts/PASS_shell-login.log}
  - {result: error, test: shell/func, log: /tmp/artifacts/FAIL_shell-func.log}
And, for completeness, the expanded multiline version:
results:
  - result: pass
    test: shell/smoke
    log: /tmp/artifacts/PASS_shell-smoke.log
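For illustration, a tool consuming such a file could reduce it to a single overall outcome for example like this (a sketch only; the file name results.yaml, its location in the artifacts directory, and treating a missing file as an error are assumptions on my side, not part of the agreed format):

```python
import yaml  # PyYAML

def overall_outcome(path="artifacts/results.yaml"):
    """Reduce the per-test results to a single pass/fail/error outcome."""
    try:
        with open(path) as f:
            data = yaml.safe_load(f) or {}
    except FileNotFoundError:
        # No results file at all: the run crashed before reporting anything,
        # so treat it as an infrastructure error.
        return "error"

    outcomes = [str(r.get("result", "")).lower() for r in data.get("results", [])]
    if not outcomes or "error" in outcomes:
        return "error"
    if "fail" in outcomes:
        return "fail"
    return "pass"

print(overall_outcome())
```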
Examples above include the status error proposed by @flepied as well. Thoughts? Likes? Disagreements?
looks good to me
looks good to me +1
Thanks all for the review and feedback. Seems we have a consensus about the result format. Here are the steps needed to make this happen:
I'm going to file individual issues for these, but there's one more thing: During our discussion on the Best Practices meeting it was proposed to add a version to the STI, as this change introduces backward-incompatible behaviour. That means updating the STI spec and STR as well. Also, all (future) tests should include the format version in tests.yml, and the corresponding documentation should be adjusted.
I am a bit hesitant about whether this step is absolutely necessary at this time as (and this is just my estimation) the vast majority of existing tests currently use STR, so no adjustment in their implementation is needed to make the new results reporting work.
I've filed individual issues to implement the agreed changes:
Let me know if I missed anything.
Standard Test Interface update ready for review: https://pagure.io/fedora-ci/docs/pull-request/5
Regarding the spec versioning, I suggest using 1.1.0 as the initial version: https://pagure.io/fedora-ci/general/issue/28
Metadata Update from @psss: - Issue tagged with: Standard Test Interface