Issue #1410: Updates Policy should try harder to prevent updates that break future updates - fesco

Closed. Opened 9 years ago by catanzaro.

= phenomenon =

Since Fedora 21 was released two months ago, we have released two updates that prevented the user from installing further updates with GNOME Software. To some extent this was bad luck, and I don't expect similar breakage every month in the future, but we should still assess whether the update policy should be changed to reduce the likelihood of future incidents.

= background analysis =

FEDORA-2014-17388: The update https://admin.fedoraproject.org/updates/FEDORA-2014-17388/librepo-1.7.11-1.fc21,libhif-0.1.7-1.fc21 caused updates to fail during the reboot, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1181501. This bug was caused by a change in libhif. The update was released to updates-testing on 2014-12-21 and submitted for stable on 2015-01-06 after receiving 2 karma and spending 14 days in updates-testing. The bug was filed on 2015-01-13, seven days after the update was pushed to stable. This was a reasonably cautious approach to testing, but evidently it was not enough.

FEDORA-2015-1057: The update https://admin.fedoraproject.org/updates/FEDORA-2015-1057/librepo-1.7.13-1.fc21 has caused package downloads to fail, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1188600. This bug was caused by a change in librepo (which appears to have uncovered a bug in libhif; is that correct?). This update was released to updates-testing on 2015-01-26 and reached stable on 2015-01-28 with autokarma after meeting the stable karma threshold of 3 selected by the packager. The bug was filed on 2015-02-03, six days after the update was pushed to stable. This is a classic example of not ''nearly'' enough time spent in updates-testing, like the selinux-policy update that broke updates last year. librepo is in our critical path, but, like that selinux-policy update, this update was actually done in accordance with our critical path policy, which requires only 2 karma for updates to critical path packages.

In both of these cases, the only way for the user to fix the problem would be to run 'yum update' or 'dnf update' from a terminal.

= implementation recommendation =

We should change our policy to reduce the likelihood of this happening again. I recommend the Updates Policy be amended to something along the lines of the following:

* Under the section "Updates to 'critical path' packages," change the bullet "Package updating frameworks (gnome-packagekit, apper)" to read "Graphical package updaters (gnome-software, apper)"
* Add a new section "Updates to 'updates path' packages" with this description: "Updates that constitute a part of the 'updates path' package set (defined below) are highly discouraged due to the potential for a regression to break future updates. These packages may never be upgraded to a new version in a stable Fedora release, so all updates must be in the form of individual patches. Patches are permitted only to fix bugs that the package maintainer considers highly critical (for example, a bug that frequently breaks updating for a substantial portion of users, but not a bug that only occasionally breaks updating or only breaks updating for a small portion of users). At the time of the request to stable, the update needs to have a Bodhi karma sum of 3 AND it must spend at least 21 days in updates-testing. An exception is made for critical security updates, which may be pushed according to the normal critpath package guidelines, and for updates that merely revert the effects of a previous update, which may be released immediately. The updates path package set consists of the following packages: PackageKit, libhif, hawkey, rpm, librepo, and libsolv."
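
To make the proposed rule concrete, here is a minimal Python sketch of the stable-push check it describes. The `Update` record, its field names, and the `may_push_to_stable` helper are hypothetical illustrations, not Bodhi's API. The critical-security branch assumes the critpath rule is simply "karma >= 2", as described in the background analysis above, and the "highly critical bug" judgment is left to the maintainer and not modelled.

```python
from dataclasses import dataclass

# Package set listed in the proposal ("updates path" packages).
UPDATES_PATH = frozenset({"PackageKit", "libhif", "hawkey", "rpm", "librepo", "libsolv"})


@dataclass
class Update:
    """Hypothetical summary of a Bodhi update; field names are illustrative only."""
    packages: frozenset
    karma: int
    days_in_testing: int
    new_upstream_version: bool = False     # True if any package moves to a new version
    critical_security: bool = False
    reverts_previous_update: bool = False


def may_push_to_stable(update: Update) -> bool:
    """Return True if the update satisfies the proposed 'updates path' rules."""
    if not update.packages & UPDATES_PATH:
        return True  # proposal does not apply; the normal policy (not modelled) governs
    if update.reverts_previous_update:
        return True  # a pure revert of a previous update may be released immediately
    if update.new_upstream_version:
        return False  # only individual patches are allowed, never new versions
    if update.critical_security:
        # Assumption: the normal critpath rules reduced to "karma >= 2".
        return update.karma >= 2
    # Normal case: karma sum of 3 AND at least 21 days in updates-testing.
    return update.karma >= 3 and update.days_in_testing >= 21


# FEDORA-2015-1057 reached stable with karma 3 after two days in updates-testing:
print(may_push_to_stable(Update(frozenset({"librepo"}), karma=3, days_in_testing=2)))  # False
```

Under this sketch, FEDORA-2015-1057 (karma 3 after two days) would have been held back, as would FEDORA-2014-17388 (karma 2 after 14 days).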

The goal of a policy along these lines should be to dramatically reduce the risk of regressions, while leaving enough flexibility for maintainers to address serious issues (e.g. database corruption) when necessary. My suggested 21 day rule is based on the reality that it takes a week for GNOME Software to propose an update. Say a bad update goes out: an updates-testing user will receive that update within the next week, and then it takes another week before the next update reveals that the previous update was bad. I first thought that 14 days would be reasonable, but that clearly wasn't long enough for the problem with FEDORA-2014-17388 to be found, so let's go with 21 perhaps?

I also suggest a "hardcoded" list of packages (anything I missed?) rather than trying to be smart by computing it from dependencies or somesuch. Since the suggested policy is fairly strict, it should only apply to the highest-risk packages.

rhughes commented 9 years ago

This 100% has my vote. I'm sick and tired of offline updates breaking.

mattdm commented 9 years ago

I think as of today I am no longer a voting FESCo member, but FTR I am in favor. :)

mitr commented 9 years ago

Replying to [ticket:1410 catanzaro]:

> Since Fedora 21 was released two months ago, we have released two updates that prevented the user from installing further updates with GNOME Software. To some extent this was bad luck, and I don't expect similar breakage every month in the future, but we should still assess whether the update policy should be changed to reduce the likelihood of future incidents.

First things first: is there a consensus on why the problems weren’t caught during development, and then during testing? MOAR TESTING would not help if we never tested the kinds of things that broke.

> Add a new section "Updates to 'updates path' packages" with this description: "Updates that constitute a part of the 'updates path' package set (defined below) are highly discouraged due to the potential for a regression to break future updates. These packages may never be upgraded to a new version in a stable Fedora release, so all updates must be in the form of individual patches.

Veering into micromanagement territory but perhaps reasonable.

> Patches are permitted only to fix bugs that the package maintainer considers highly critical (for example, a bug that frequently breaks updating for a substantial portion of users, but not a bug that only occasionally breaks updating or only breaks updating for a small portion of users).

This seems very much like an overreaction. Keeping a known “small portion” (i.e. thousands or more) of users unable to update is a very high price to pay.

> At the time of the request to stable, the update needs to have a Bodhi karma sum of 3 AND it must spend at least 21 days in updates-testing.

> My suggested 21 day rule is based on the reality that it takes a week for GNOME Software to propose an update. Say a bad update goes out, then updates-testing user will receive that update within the next week, then it takes another week before the next update that reveals the previous update was bad.

This could surely be done another way, e.g. by an automated testing process that does two off-line updates in a row. I could be convinced that just waiting 21 days is by far the easiest way to get this done.

> I first thought that 14 days would be reasonable, but that clearly wasn't long enough for the problem with FEDORA-2014-17388 to be found, so let's go with 21 perhaps?

See above: do we in fact know that waiting ''any'' longer would actually reveal the problem, given the way updates are currently being tested?

(Note: There was an earlier conversation about broken updates, and about automatically testing that, after a package update, it is still possible to update to a newer version of the same package. We should review that earlier conversation and see what the conclusions were at the time.)

rhughes commented 9 years ago

Replying to [comment:4 mitr]:

> First things first, is there a consensus for why the problems weren’t caught during development

In the first case, it appears that people who "tested" the libhif and librepo packages didn't actually do any offline updates. In the second (the ABI-breaking librepo update), it appears the librepo maintainer didn't test whether PackageKit actually ran with the new release, and in the third case it appears nobody actually tested the update before giving karma. People seem to like clicking and getting a little smiley face; perhaps we should require a description of what exactly they tested...

I also think that the PackageKit, librepo, hawkey, libhif and libsolv libraries should be delivered (and QAd) as one unit; it's so confusing to have such a huge matrix of possible versions in a stable release, and Bodhi just can't cope when updates depend on other updates. If the projects are not one monolithic blob (and I think it's crazy how many projects we have to involve just to update), then we have to co-ordinate much better than we do now.

catanzaro commented 9 years ago

Richard, I only presented two cases, but you responded to three; do you know about another?

Replying to [comment:4 mitr]:

> Veering into micromanagement territory but perhaps reasonable.

> This seems like very much of an overreaction. Keeping a known “small portion” (i.e. thousands or more) users unable to update is a very high price to pay.

I think the overall proposal can stand if details like these are changed. I wanted to err on the side of avoiding regressions.

> See above, do we in fact know that waiting ''any'' longer time would actually reveal the problem with the way updates are currently being tested?

Absent somebody to write the tests (unfortunately I am not volunteering), I think that the longer we wait, the more likely it is that people running updates-testing will perform an offline update and notice breakage. The only way that increasing the time spent in updates-testing would not have found these bugs is if no bug reporter running updates-testing performs offline updates. I hope that's not the case, but I have no data.

rhughes commented 9 years ago

Replying to [comment:6 catanzaro]:

> Richard, I only presented two cases, but you responded to three; do you know about another?

https://admin.fedoraproject.org/updates/FEDORA-2015-0896/librepo-1.7.12-1.fc21

Three people give positive karma although there's no way in the world gnome-software/PackageKit/libhif could possibly have worked for them.

jwboyer commented 9 years ago

I'm not really opposed to this suggestion, but I don't think it's an actual solution. A couple of comments.

* Why doesn't 'updates path packages' include rpm, dnf/yum, and deltarpm? They aren't GUI tools, but they're the underlying tools, and people will continue to use them on the command line to update for quite a long time.
* In the absence of test cases and reliable data on what testers are testing, I think extending the time requirement is at best just delaying the next failure. We should instead focus on getting automated test cases in place that can be run by Taskotron. (We can do both, of course.)
* Until the test cases are available, perhaps it would be helpful to write up some scenarios testers can test for coverage purposes. These would include online install/update, offline update, command-line install/update, etc. A link to a wiki page with these scenarios and how to test them could be put directly in the update description text.

adamwill commented 9 years ago

Replying to [comment:7 rhughes]:

> Replying to [comment:6 catanzaro]:

>> Richard, I only presented two cases, but you responded to three; do you know about another?

> https://admin.fedoraproject.org/updates/FEDORA-2015-0896/librepo-1.7.12-1.fc21

> Three people give positive karma although there's no way in the world gnome-software/PackageKit/libhif could possibly have worked for them.

But dnf did. Note the text of the first comment:

"installed on x86_64. dnf still works and still installs software."

The other two +1s have no comment, but it's reasonable to infer, from the fact that the update was issued to fix three DNF bugs, that they came from people who had those DNF issues and tested that the update fixed them.

This is a not-uncommon scenario with certain updates - the generic case is 'the thing that's being updated is used in lots of ways'. It's not unusual for an update to such a Thing to be issued to fix a bug in one of its use cases, and for it to get positive karma based solely on the fact that it fixes that use case. This is one of many things we really need Bodhi 2.0 to fully fix, as it allows us to distinguish different types of feedback. (We do also need proper test case coverage for more packages, though - see next comment).

adamwill commented 9 years ago

Replying to [comment:8 jwboyer]:

> Until the testcases are available, perhaps it would be helpful to writeup some scenarios testers can test for coverage purposes. These would include online install/update, offline update, cmdline install/update, etc. A link to a wiki page with these scenarios and how to test them could be put directly in the update description text.

We have a better mechanism than this already, the 'package test plan' system that I always have to keep reminding people about for some reason :P https://fedoraproject.org/wiki/QA:SOP_package_test_plan_creation

Basically, any test in the category [[Category:Package_librepo_test_cases]] (for instance) will be linked from the update text of any update that includes librepo. For a quick, small improvement, we can put the existing update test case in the relevant categories. At present, though, the update test case is a sort of hybrid that covers CLI updates, GUI updates, and GUI notifications, so it could probably stand to be split out a bit: https://fedoraproject.org/wiki/QA:Testcase_desktop_updates

catanzaro commented 9 years ago

So not just twice, but ''three times'' in the past two months we have released an update that broke future updates? Holy heck.

tmraz commented 9 years ago

I am no longer a FESCo member either, but I want to add my 2 cents to the voices saying that such an overreactive restriction does not solve the real problem we have: proper testing of updates often does not happen. A policy as strictly restrictive as the one you propose might minimize regressions in these packages, but the price would be too high.

mitr commented 9 years ago

This ticket will be discussed in the FESCo meeting on Wednesday at 18:00UTC in #fedora-meeting on irc.freenode.net.

kevin commented 9 years ago

I'm not going to make the meeting tomorrow, but am -1 to the proposal here. I don't think simply adding more time will solve the issue at hand.

kparal commented 9 years ago

Here are a few thoughts from my QA point of view:

* The fact that we have completely broken graphical update tools twice (and almost 3 times) in a single month suggests we should do something about it. The question is what.
* Those packages should definitely have linked test cases in Bodhi; that slightly increases the chances of someone testing the important workflows. OTOH, I don't think it would have helped too much.
* The total karma number is a tricky thing. There are a lot of testers out there who give +1 to everything, just because they don't see any regressions in their daily usage, without any actual testing. That is not a bad thing, really; it helps to verify that our software is not completely broken. But we have to be cautious about the conclusions we draw from it.
* A lot of testers don't provide any explanation of what they actually tested; they just say a generic "works". QA can try to educate them better (and I assume Bodhi 2 will try to guide them better), but don't expect miracles. Again, we need to be careful with conclusions drawn from this form of feedback.
* In my experience, many critical path packages like kernel, mesa, or yum often get pushed to stable repositories just 2-3 days after the update is created in Bodhi. Sometimes they have just a few karma votes, sometimes 10 or more. The problem is that Fedora has many more users than 10. Even with quite high karma, there is still a considerable chance that with some hardware combination or some workflow we're going to break the system for a large fraction of our user base. I have experienced that myself several times (both kernel and mesa updates broke my computer even though they had very high karma). I believe karma in itself is helpful, but it should not be relied on as much as it is now.
* In this particular case, offline updates and PackageKit are probably something that most users with updates-testing enabled don't use. Technically savvy users tend to prefer tools that give them more control, like yum or dnf.
* I believe that quite a few users run with updates-testing enabled, but only a fraction of them report feedback to Bodhi regularly. The rest will comment on an update only if it breaks something for them.
* Many of the folks running with updates-testing don't update every day, but e.g. once per week (my case as well).
* Sometimes a high-profile update sits in testing for 2-3 weeks and still not many people test it, e.g. during Christmas. That's what happened in the first case described by the OP. That's an unfortunate and sad coincidence, but hopefully it should not happen too often.

Combining all of that, this is what I think makes sense to adjust:
* For the most critical packages (i.e. the critical path set), don't rely on the karma value as the only means of deciding whether to push to stable, especially not automatically. That means karma autopush should really be discouraged for these kinds of packages, maybe even made unavailable. I believe we always want a human being to read through the feedback before we push a critical package to stable.
* We should make sure critical path packages stay in updates-testing for at least a certain minimum amount of time. If they are pushed in 2-3 days, we miss most of the audience. If we require at least a week in updates-testing, regardless of karma value, we expand the audience considerably. Of course, this would not apply to security updates.
* If we combine these two approaches above, maybe it's still reasonable to allow karma autopush in certain cases. E.g. require a human push in the first two weeks, and allow autopush after two weeks in testing.
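
The suggestions above can be read as a single gate for critical path updates. Below is a hypothetical Python sketch, assuming the thresholds mentioned (a one-week minimum, autopush only after two weeks) and a per-update stable karma threshold; the names `critpath_decision` and `PushDecision` are invented for illustration, and this is not how Bodhi actually decides.

```python
from enum import Enum

# Thresholds taken from the suggestions above; treat them as illustrative values.
MIN_DAYS_IN_TESTING = 7      # "at least a week in updates-testing, regardless of karma"
AUTOPUSH_AFTER_DAYS = 14     # "allow autopush after two weeks in testing"


class PushDecision(Enum):
    WAIT = "stay in updates-testing"
    MANUAL_PUSH_ONLY = "a human must read the feedback and push"
    AUTOPUSH_ALLOWED = "karma autopush may push it"


def critpath_decision(days_in_testing: int, karma: int,
                      stable_karma: int, security: bool = False) -> PushDecision:
    """Hypothetical gate for critical path updates following the suggestions above."""
    # The one-week minimum is waived for security updates.
    if not security and days_in_testing < MIN_DAYS_IN_TESTING:
        return PushDecision.WAIT
    # Only after two weeks may reaching the karma threshold trigger an automatic push.
    if days_in_testing >= AUTOPUSH_AFTER_DAYS and karma >= stable_karma:
        return PushDecision.AUTOPUSH_ALLOWED
    # Otherwise no automatic push: the maintainer decides after reading the feedback.
    return PushDecision.MANUAL_PUSH_ONLY


print(critpath_decision(days_in_testing=2, karma=10, stable_karma=3))  # PushDecision.WAIT
```

A `MANUAL_PUSH_ONLY` result does not block the update; it only means that a human, not the karma counter, makes the final call, which is the point of the first suggestion.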

If we put these or similar policies into effect, I believe it would go a long way toward preventing issues like the ones mentioned, while still being reasonable and not restricting or annoying our maintainers too much. I also think this would work well enough not just for critpath packages in general but also for the software updater stack (PackageKit, yum, dnf), so there would be no need for specific requirements for it, as proposed by the OP.

mitr commented 9 years ago

From today’s FESCo meeting:

* Increasing the time spent in testing was rejected (-5)
* Replacing gnome-packagekit with gnome-software in the critical path definition has been approved (+5)
* 1) Ask librepo maintainer to test off-line updates before pushing any updates. 2) Recommend that the gnome-software maintainers write automated tests for their update mechanism (+6)

tmlcoch commented 9 years ago

> 1) Ask librepo maintainer to test off-line updates before pushing any updates.

ACK (I'm a Librepo maintainer)
