Issue #291: Missing decision change message when multiple results and waivers are created - greenwave

greenwave

#291 Missing decision change message when multiple results and waivers are created

Closed: Fixed 4 years ago by gnaponie. Opened 5 years ago by lholecek.

Making decision allows to pass lists of result and waiver IDs to ignore - ignore_result, ignore_waiver. These are used to compare new decision with older one based on newly created result or waiver so that GW knows when to publish decision change message on UMB.

The problem is that there could be multiple new results and waivers so ignoring just single result or waiver ID is not sufficient.

E.g. if there are new results with IDs 100 and 101, GW gets UMB message about new result 100 first and uses /decision?ignore_result=[100]&... to fetch previous decision incorrectly based on result with ID 101.

It gets more complicated when new result is created with a waiver.

Very simple solution would be to publish decision change message always, disregarding any previous decision.

No matter the solution, I also suggest to remove ignore_result and ignore_waiver from documentation and prefix them with underscore if they're used only internally.

csomh commented 5 years ago

@lholecek: is it known who are the ones relying on the decision change message and what their expectation/interpretation is of this message being sent or not?

Edited 5 years ago by csomh

lholecek commented 5 years ago

is it known who are the ones relying on the decision change message and what their expectation/interpretation is of this message being sent or not?

I thought one is Bodhi, but it doesn't seem to be the case (it just requests decision from GW again when needed).

Someone is definitely using it internally (see results for "greenwave decision update" search query on JIRA).

@mvadkert Would it be OK for Greenwave to send the decision update messages even if the decision doesn't change?

mvadkert commented 5 years ago

@lholecek hmm, I think our gating bot will need to handle this situation somehow. Is it possible to detect somehow from the message that nothing changed? If there is a decision update of a relevant component/decision context, our bot will react on it and send out an email unfortunately

lholecek commented 5 years ago

@mvadkert When do you expect to get decision update message? Even when only satisfied_requirements change? (Because that doesn't work currently.) When result IDs change in unsatisfied_requirements? (This questions came up in PR#290.)

mvadkert commented 5 years ago

@lholecek I think about it when I should inform the user about the change. Currently (we gate really only on one test result) we react on both, where the former means gating is go (passed, no actiion needed), and the latter is no-go (test failed). Now if we will be gating on multiple results, I think I will want to inform the user just after all results came in, so I mitigate spaming of users.

So I try to recap, what I really need is to be able to identify from the decision update messages is, if all tests had been run and what is the outcome. I do not much care about how many times I really get the decision update, I can ignore, if it is not an "end" state.

@lholecek sorry for replying so late, I was on PTO/2, catch me on IRC, if I need to ellaborate more

lucarval commented 5 years ago

@mvadkert, thank you for the input.

It does seem like publishing a decision update, even if it has not changed, would be simplest approach here. Receiving users could still ignore that update if the "end" state hasn't been reached yet. This should be possible to determine from each message sent by Greenwave.

@gnaponie, I think we should consider pulling this story soon to increase reliability of Greenwave decision messages.

gnaponie commented 5 years ago

Ack.
Just one doubt: start publishing the final decision after every message and stop sending a decision update... how that will influence other stakeholders?
I think we should ask that also to @bowlofeggs and @adamwill

Metadata Update from @gnaponie:
- Issue assigned to gnaponie

5 years ago

adamwill commented 5 years ago

It seems fine to me. At least on the back of an envelope you can write a 'stateless' decision consumer easily anyway: "get message; get message subject $SUBJ; get message decision; if decision is GO and we have not previous shipped $SUBJ, think about shipping it; if decision is NO GO and we are in the middle of shipping $SUBJ, think about cancelling it; if decision is NO GO and we already shipped $SUBJ...panic?"

but, as you say, if you want your consumer to trigger only on decision changes, you can just keep a local record of the most recent decision for the $SUBJ or whatever.

To clarify, there are AFAIK no existing consumers for greenwave decision messages, as they've just never been correct enough to use, I don't think.

Edited 5 years ago by adamwill

gnaponie commented 5 years ago

And what about the fact that we are publishing the new messages with this topic: greenwave.decision.update?
If we change this behaviour it doesn't really make sense to keep using "decision.update", should we change that in just something "greenwave.update"? Should we keep on publishing the "old" topic? Should we use both?
@lucarval thoughts on that?

Edited 5 years ago by gnaponie

mprahl commented 5 years ago

@gnaponie I like the idea of publishing on both topics (current and new).

lucarval commented 5 years ago

I'd argue that greenwave.decision.update is still relevant. Something caused Greenwave to re-evaluate the decision. The decision outcome may not have changed but it was certainly updated.

gnaponie commented 5 years ago

@lucarval so would you leave just the "greenwave.decision.update" topic, but change the meaning of it?
(That's fine by me, I'm just checking I got it right :D)

Edited 5 years ago by gnaponie

lucarval commented 5 years ago

@gnaponie, yes.

gnaponie commented 5 years ago

We had some discussion about it inside the team.
One idea to resolve the bug without changing the messaging behaviour is to retrieve the old decision putting a list of "ignored_results" instead of just an ID and this list would have all the results send out with the same timestamp (that greenwave will retrieve).
We would like also to put the timestamp in the decision, so that if there is some lag the user could notice that.

The same for waivers.

Another idea is instead to follow the previous idea and to introduce a new topic so that we can give time to get used to the new behaviour since this won't be backward compatible.

What people think?

cc. @mprahl @mikeb @lucarval @adamwill @mvadkert

adamwill commented 5 years ago

Nothing on the Fedora side has ever actually used the fedmsgs yet (as they have never really been correct for any Fedora case), so as far as Fedora goes it really doesn't matter much. I don't know about RH internal uses.

lucarval commented 5 years ago

@mprahl has mentioned using timestamps to address this issue. To elaborate, when Greenwave receives message about a new result being available, this message contains the timestamp in which the result was created in ResultsDB. Greenwave could use this timestamp to query ResultsDB. The /results/latest API endpoint in particular accepts the since parameter:

Extract timestamp from result.
Fetch results from before timestamp to determine "old decision": since=1900-01-01,timestamp
Fetch results without since to determine new decision. We can't use since=timestamp because that may exclude results for other test cases. However, a range might be useful here.

Something similar would have to be done with waivers.

I think this would solve the timing issue. We may need to tweak the ranges. Perhaps to gather old decision, the end date should be timestamp - 1 second. Likewise, to gather new decision, use a range but with end data as timestamp + 1 second.

mprahl commented 5 years ago

@mprahl has mentioned using timestamps to address this issue. To elaborate, when Greenwave receives message about a new result being available, this message contains the timestamp in which the result was created in ResultsDB. Greenwave could use this timestamp to query ResultsDB. The /results/latest API endpoint in particular accepts the since parameter:

Extract timestamp from result.
Fetch results from before timestamp to determine "old decision": since=1900-01-01,timestamp
Fetch results without since to determine new decision. We can't use since=timestamp because that may exclude results for other test cases. However, a range might be useful here.

Something similar would have to be done with waivers.
I think this would solve the timing issue. We may need to tweak the ranges. Perhaps to gather old decision, the end date should be timestamp - 1 second. Likewise, to gather new decision, use a range but with end data as timestamp + 1 second.

Greenwave should also output the timestamp of the results it used to determine the decision. That way if there is a race condition causing Greenwave to output an older decision after a newer decision (e.g. if there are multiple results received one after the other), the client can use that to determine what the actual latest decision is.

mvadkert commented 5 years ago

Another idea is instead to follow the previous idea and to introduce a new topic so that we can give time to get used to the new behaviour since this won't be backward compatible.

I would prefer a new topic, so we can adapt to it without the need to change an in-production process at once :)

lucarval commented 5 years ago

@mvadkert, the idea suggested in my previous comment would fix the existing behavior. It should be fully backwards compatible. We'll do further exploration to see if that does solve the issue though.

mvadkert commented 5 years ago

@lucarval agreed, thanks! That is prefered even more :)

gnaponie commented 5 years ago

PR: https://pagure.io/greenwave/pull-request/403

Metadata Update from @gnaponie:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago