#16 bodhi comment directive module
Closed: Fixed. Opened 10 years ago by tflink.

For phase 1, we need to replace AutoQA. This means that we need the ability to report results directly to bodhi - at least for the short term.

The directive will take in TAP output and submit comments to bodhi using FAS credentials stored in a configuration file or injected by the runner (as in the examples below). If the implementation uses injection, make sure that the actual credentials are not logged anywhere public.

report:
    bodhi: action=comment baseurl={{bodhiurl}} username={{fasuser}} password={{faspassword}} doreport=all
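
As a rough illustration only (this is not the actual libtaskotron API), the directive might look something like the sketch below. The class name, method names, and the exact TAP line layout are assumptions; the real Bodhi call would go through python-fedora's BodhiClient rather than the placeholder shown here.

```python
class BodhiCommentDirective(object):
    """Turn TAP output into Bodhi comments, one per tested update."""

    def __init__(self, baseurl, username, password):
        self.baseurl = baseurl
        self.username = username
        self.password = password  # never write this value into any public log

    def process(self, tap_output, doreport='all'):
        """Report every result when doreport=all."""
        for update, outcome in self._parse_tap(tap_output):
            if doreport == 'all':
                self._post_comment(update, outcome)

    def _parse_tap(self, tap_output):
        """Tiny TAP reader; assumes lines like 'ok 1 - <update>' / 'not ok 2 - <update>'."""
        results = []
        for line in tap_output.splitlines():
            if line.startswith('not ok'):
                results.append((line.split('- ', 1)[-1], 'FAILED'))
            elif line.startswith('ok'):
                results.append((line.split('- ', 1)[-1], 'PASSED'))
        return results

    def _post_comment(self, update, outcome):
        # Placeholder: a real implementation would authenticate against
        # self.baseurl with the FAS credentials (e.g. via python-fedora's
        # BodhiClient) and post the comment text for this update.
        print('would comment on %s: %s' % (update, outcome))
```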

The main corner case of note is depcheck. The current paradigm is that depcheck will add reporting comments on state change (PASS to FAIL or FAIL to PASS) or repeat comments if the update in question has been in FAIL for a configurable number of days. Example directive for depcheck:

report:
    bodhi: action=comment baseurl={{bodhiurl}} username={{fasuser}} password={{faspassword}} doreport=onchange
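
A hedged sketch of the doreport=onchange decision for depcheck, assuming we can look up the outcome and timestamp of the last comment posted for an update. The function name and its inputs are illustrative, not an existing interface:

```python
from datetime import datetime, timedelta

REPEAT_FAIL_AFTER = timedelta(days=3)  # the "configurable number of days"

def should_comment(new_outcome, last_outcome, last_reported, now=None):
    """Decide whether depcheck should post (or repeat) a Bodhi comment."""
    now = now or datetime.utcnow()
    if last_outcome is None:
        # the update has never been commented on
        return True
    if new_outcome != last_outcome:
        # state change: PASSED -> FAILED or FAILED -> PASSED
        return True
    if new_outcome == 'FAILED' and now - last_reported >= REPEAT_FAIL_AFTER:
        # still failing and the repeat interval has elapsed, remind again
        return True
    return False
```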

The code for this should be mostly complete in AutoQA (both in lib and depcheck) and hopefully will require little modification.

This is somewhat related to #47 and will likely require some coordination.


This ticket had the following Differential requests assigned:
D31
D67
Will the Bodhi reporting section be executed from the test client, or from the server? With our new architecture, I'd like all reporting to be done from a central place, so that we have just a single place to watch for errors and a single place to search for logs. The test client only executes the test and sends back the logs, and some reporter daemon on the server side takes over and reports the results. What do you think?

I'm honestly not sure I see an advantage to doing all reporting from a central place.

I agree about having a single place to watch for errors and search for logs, but I don't see how routing all reporting through a central point helps us. I was thinking of something more like logstash (http://logstash.net/) or leveraging more of the advanced features of logbook (http://pythonhosted.org/Logbook/).

I was very unhappy when the AutoQA test clients did the reporting themselves. You might remember our struggle with sending out emails. Instead of having a single environment to configure and debug (e.g. sendmail/procmail configuration), we had dozens of clients, some of which worked and some of which did not. I'd like to avoid this in the future. There are a lot of corner cases we can hit - for example, let's say we use python-bodhi to post our test results to Bodhi, and coincidentally the test case executed is a functional test for a new version of the python-bodhi package. The test case overwrites our system package, and our test reporting fails.

I also see some advantages in the decoupling of test execution and result reporting itself. Our tests would no longer need to fail if resultsdb/bodhi/etc. is down, and we would no longer need to re-execute the whole test to fix it. If the test clients just return their logs (and TAP files etc.) and there's a separate process for reporting, then only the reporting process needs to be re-executed in the event of a resultsdb/bodhi/network problem. This daemon can easily have a queue of unreported results and re-try every hour or so if there are intermittent problems with some service.
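
Purely to illustrate the idea (nothing here is part of the proposal), such a reporter daemon could be a small loop over a queue of unreported results; submit() stands in for whatever backend (Bodhi, ResultsDB, ...) ends up being used:

```python
import time
from collections import deque

RETRY_INTERVAL = 3600  # seconds; "re-try every hour or so"

def report_loop(pending, submit):
    """pending: deque of result payloads; submit: callable that raises on failure."""
    while True:
        for _ in range(len(pending)):
            result = pending.popleft()
            try:
                submit(result)
            except Exception:
                # Bodhi/ResultsDB/network problem: keep the result for the next round
                pending.append(result)
        time.sleep(RETRY_INTERVAL)
```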

I believe we could make the whole process more reliable and better structured if we decouple these two things. (It might also allow us to lock down the test clients even more. Do we really want to allow them to send out emails? Or maybe only our reporting daemon, running on a particular machine under our control, should be able to do that?)

> I was very unhappy when the AutoQA test clients did the reporting themselves. You might remember our struggle with sending out emails. Instead of having a single environment to configure and debug (e.g. sendmail/procmail configuration), we had dozens of clients, some of which worked and some of which did not. I'd like to avoid this in the future. There are a lot of corner cases we can hit - for example, let's say we use python-bodhi to post our test results to Bodhi, and coincidentally the test case executed is a functional test for a new version of the python-bodhi package. The test case overwrites our system package, and our test reporting fails.

I hadn't thought about the case where python-fedora might be updated on the client. Given the rate at which new python-fedora releases come out and how careful the devs are about breaking stuff, I don't think this is a huge case to be worrying about for the moment, though. I also want to make reporting-by-bodhi-comment go away - it's a sub-optimal method for reporting status, and while we need to do the same as AutoQA for now, I don't see bodhi comments being something that we do long term.

I'm not trying to dismiss your concerns - I don't have any numbers on hand to back this up, but I suspect that bodhi errors are one of the most common (if not the most common) causes of test failure in AutoQA. That and the stupid texlive updates overwhelming/breaking the clients. Even in the initial deployment, there are going to be a lot of moving parts. My concern is that by adding yet another component, we're going to have that much more code to write, another potential failure point to triage when something breaks, and another integration point to deal with when testing.

I think that a centralized logging system (i.e. logstash) will help us a lot in terms of being able to detect and fix problems as they happen without having to dig through the system logs on 20 systems. I also think that designing our reporting systems to acknowledge the fact that we have at least 3 audiences (user, task author/maintainer, system author/maintainer) will mitigate many of the administration and triage pain points that we've hit with AutoQA.

> I also see some advantages in the decoupling of test execution and result reporting itself. Our tests would no longer need to fail if resultsdb/bodhi/etc. is down, and we would no longer need to re-execute the whole test to fix it. If the test clients just return their logs (and TAP files etc.) and there's a separate process for reporting, then only the reporting process needs to be re-executed in the event of a resultsdb/bodhi/network problem. This daemon can easily have a queue of unreported results and re-try every hour or so if there are intermittent problems with some service.

I do like this idea but I wonder if it would be better implemented as a tool to go through task logs. If the tasks store results in a file with the other output, we could go through and re-report them in case of failure without needing another system for reporting.

If we did implement a reporting service as you suggest, we'd have the same problems when that service went down. By doing the reporting in the client, sanely reporting execution status and storing the results on disk, I think we could have most of the advantages that you mention with less complication.
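
For what it's worth, the re-report-from-logs idea above could be a small standalone tool along these lines; the per-task result file naming and the 'reported' flag are assumptions made only for the sake of the sketch:

```python
import json
import os

def rereport(logdir, submit):
    """Resubmit any stored result that has not been reported successfully yet."""
    for name in os.listdir(logdir):
        if not name.endswith('.result.json'):
            continue
        path = os.path.join(logdir, name)
        with open(path) as f:
            result = json.load(f)
        if result.get('reported'):
            continue
        submit(result)  # raises on failure, leaving the file unchanged
        result['reported'] = True
        with open(path, 'w') as f:
            json.dump(result, f)
```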

> I believe we could make the whole process more reliable and better structured if we decouple these two things. (It might also allow us to lock down the test clients even more. Do we really want to allow them to send out emails? Or maybe only our reporting daemon, running on a particular machine under our control, should be able to do that?)

Agreed 100% that the clients should never send out emails. They'll need to interface with bodhi anyway, so that access needs to be there regardless of whether we do comments from the client or not. I realize that I haven't finished writing the notification/reporting design proposal yet, but my thought was to only send emails from the master in case of job execution failure (execution status and reporting would be more complicated than just emails, though - we've both seen how well that's worked for AutoQA). All user-facing notifications would be done via fedmsg; we'd rely on the fedmsg notifier for emails, IRC pings etc., and users would configure their notifications through fmn instead of us worrying about notification code.

rLTRNe74c632765d2d6525b047cbcdc89ea4fa62765ef
