#270 FESCO topic proposal - preupgrade and F-12

Created 7 years ago by jlaska
Modified 7 years ago

= Topic Proposal =

= Overview =

Debugging and content below provided by Will Woods.

== Problem Space ==

Using preupgrade to move to F12 will not work for anyone with a default-sized /boot partition
unless they apply significant manual workarounds, because there is not enough free disk space
on /boot.

== Solution Overview ==

I think we may need to talk to hughsie and/or the desktop team
about removing the preupgrade integration in PackageKit for F10/F11 and
how to do preupgrade right for F13 and higher.

== Scope ==

Users of F-11 upgrading to F-12 using preupgrade

== Active Ingredients ==

Reaching consensus on a solution before F-12 and identifying task owners.

=== Sub Component ===

= Discussion Points =

{{{
Here are the details.
The default /boot partition is 200MB, but there's some overhead:
Ext3/Ext4 overhead: 7MB
Reserved space: 10MB
F11 kernel: 8MB (at least - usually 3 kernels = 24MB)
GRUB/EFI files: 1MB
Total overhead: 26MB

So there's 174MB of usable space maximum, and usually 158MB available.

preupgrade now requires at least 167MB free space on /boot:
F12 installer images: 143MB (8MB larger than F11!)
F12 kernel: 18MB (10MB larger than F11!)
RPM/anaconda tmpfiles: >=6MB (measured in stupid tests)
Total: 167MB (Was 149MB in F11 - no problem!)
}}}
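The arithmetic above can be checked with a short Python sketch. It uses only the per-item overheads and the stated totals from the quoted email (all in MB); the constant and function names are illustrative and are not part of preupgrade.

```python
# /boot space budget for preupgrading to F12, using the figures quoted above (all in MB).
BOOT_PARTITION_MB = 200
FS_OVERHEAD_MB = 7       # ext3/ext4 filesystem overhead
RESERVED_MB = 10         # reserved blocks (anaconda/rpm will not write into these)
GRUB_EFI_MB = 1          # GRUB/EFI files
KERNEL_F11_MB = 8        # each installed F11 kernel

REQUIRED_MB = {"F11": 149, "F12": 167}  # stated totals preupgrade needs free on /boot


def usable_mb(installed_kernels: int) -> int:
    """Free space left on a default 200MB /boot holding this many F11 kernels."""
    overhead = FS_OVERHEAD_MB + RESERVED_MB + GRUB_EFI_MB + KERNEL_F11_MB * installed_kernels
    return BOOT_PARTITION_MB - overhead


if __name__ == "__main__":
    for kernels in (1, 3):
        free = usable_mb(kernels)
        for release, needed in REQUIRED_MB.items():
            verdict = "fits" if free >= needed else f"short by {needed - free}MB"
            print(f"{kernels} kernel(s): {free}MB free; {release} needs {needed}MB -> {verdict}")
```

With a single F11 kernel the partition leaves 174MB, enough for F12 with only 7MB to spare; with the usual three kernels it leaves 158MB, which covered F11's 149MB but is 9MB short for F12.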

So, preupgrade isn't going to work for anyone upgrading to F12 unless:

a. they created a custom partition layout with /boot > 200MB, or
b. they do some ugly manual workarounds (tune2fs, remove kernels), or
c. we add some code to allow them to fetch install.img via http if /boot is low on space, or
d. we add some preupgrade code to allow the user to store install.img on a USB key or similar.

Unfortunately ...

'''c.''' - requires a wired network connection and seems to correlate with the GRUB Explode bug (bug 533545 or its suspected ancestor, bug 450143). We have a possible workaround for that (re-run grub-install) but that's not a very safe command and it's not necessary in a lot of cases - but we can't seem to predict when it is necessary.

'''d.''' - requires time and expertise that QA doesn't have, and it wouldn't help anyone trying to run preupgrade remotely (via the CLI version) or anyone who doesn't have a USB key handy.

== Owners ==

Needs an owner. If the fix is a PackageKit change, I believe this would fall to the Desktop SIG?

It seems to me that some of the workarounds could be scripted; at least the tune2fs change could be done safely. Removing old kernels is perhaps more problematic, but if there's a way for us to know which kernel you've booted, then we could remove ones older than the one you're running.
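On Linux, the booted kernel's release string is what `uname -r` prints, which Python exposes as `os.uname().release`. The following is a dry-run sketch, assuming that installed kernels can be enumerated from `/boot/vmlinuz-<release>` file names; it only lists candidates and deletes nothing. The version ordering here is a deliberately naive comparison of numeric fields, not rpm's real version comparison, and all function names are hypothetical.

```python
from __future__ import annotations

import os
import re


def version_key(release: str) -> tuple[int, ...]:
    # Naive ordering: compare the numeric fields of the release string in order.
    # This is NOT rpm's version comparison; it is only an illustration.
    return tuple(int(n) for n in re.findall(r"\d+", release))


def older_kernels(installed: list[str], running: str) -> list[str]:
    """Return installed kernel releases that sort strictly below the running one."""
    running_key = version_key(running)
    return [r for r in installed if r != running and version_key(r) < running_key]


def installed_releases(boot_dir: str = "/boot") -> list[str]:
    """List kernel releases taken from vmlinuz-<release> file names in boot_dir."""
    prefix = "vmlinuz-"
    if not os.path.isdir(boot_dir):
        return []
    return sorted(n[len(prefix):] for n in os.listdir(boot_dir) if n.startswith(prefix))


if __name__ == "__main__":
    running = os.uname().release  # same string as `uname -r`
    print("running:", running)
    print("candidates for removal:", older_kernels(installed_releases(), running))
```

For example, with the hypothetical releases `2.6.29.4-167.fc11.i586`, `2.6.30.5-43.fc11.i586` and `2.6.30.9-96.fc11.i586` installed and the middle one running, only the first is listed. Actual removal should presumably go through the package manager rather than deleting files from /boot.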

About the reserved space: Since the files added to /boot are written by root, there should not be an issue using it.

Replying to [comment:2 till]:

> About the reserved space: Since the files added to /boot are written by root, there should not be an issue using it.

Alas, that's not how it works.

Because anaconda/rpm honors the reserved space (even though it's running as root!), it will refuse to write files into that extra 10MB. preupgrade itself has no problem with this and will let you boot into anaconda, but anaconda's RPM transaction test will then not see enough free space to install the F12 kernel package, so it will show an error message and force you to exit the installer.
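The gap between "free" and "available" space is visible through `statvfs`: `f_bfree` counts every free block, while `f_bavail` counts only the free blocks available to non-privileged processes, so the reserve is excluded. Given the comment above that rpm honors the reserve even as root, the second figure is presumably the one its transaction test sees. A minimal Python sketch using `os.statvfs`; the function name is illustrative.

```python
from __future__ import annotations

import os

MB = 1024 * 1024


def free_space_mb(path: str) -> dict[str, float]:
    """Report free space on the filesystem holding `path`, split into reserved and usable parts."""
    st = os.statvfs(path)
    free = st.f_bfree * st.f_frsize / MB        # every free block, reserved ones included
    available = st.f_bavail * st.f_frsize / MB  # free blocks usable without privilege
    return {"free": free, "available": available, "reserved_free": free - available}


if __name__ == "__main__":
    target = "/boot" if os.path.isdir("/boot") else "/"
    for key, value in free_space_mb(target).items():
        print(f"{target} {key}: {value:.1f}MB")
```

On a default 200MB ext3/ext4 /boot, `reserved_free` should come out near the 10MB reserve quoted earlier, as long as the partition is not already nearly full.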

I don't think declaring preupgrade unsupported is an option at all. It's the only upgrade method we "support" (live yum upgrades are unsupported) that actually works. Upgrading from the DVD is completely broken because it won't use online repositories, not even updates, so you end up with bugs like the infamous "yum doesn't work anymore after upgrade to F11" bug.

IMHO, we should fix preupgrade to:
* remove all non-running kernels AND
* use tune2fs to set the reserved space on /boot to 0; that partition is not writable by non-root users anyway, so reserving space there serves no useful purpose.
That should leave us with some safety margin.
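The two proposed fixes can be sketched as a dry-run plan in Python. `tune2fs -m 0 <device>` sets the reserved-blocks percentage to zero; the rest is assumption: the helper names are hypothetical, /boot's device is read from `/proc/mounts`, and each kernel is assumed to belong to the plain `kernel` package whose version-release(.arch) matches `uname -r` (not true for variants such as kernel-PAE). Nothing is executed; the commands are only printed.

```python
from __future__ import annotations

import os
import shlex


def boot_device(mounts_file: str = "/proc/mounts") -> str | None:
    """Return the device mounted on /boot, or None if /boot is not a separate filesystem."""
    if not os.path.exists(mounts_file):
        return None
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == "/boot":
                return fields[0]
    return None


def planned_commands(device: str, running: str, installed: list[str]) -> list[list[str]]:
    """Dry-run plan for the two proposed fixes; nothing is executed here."""
    # 1. Drop the reserved-blocks percentage on /boot to 0.
    commands = [["tune2fs", "-m", "0", device]]
    # 2. Remove every kernel except the running one (assumes plain `kernel-<release>` packages).
    for release in sorted(installed):
        if release != running:
            commands.append(["rpm", "-e", f"kernel-{release}"])
    return commands


if __name__ == "__main__":
    device = boot_device()
    if device is None:
        print("no separate /boot mount found; nothing to do")
    else:
        releases = [n[len("vmlinuz-"):] for n in os.listdir("/boot") if n.startswith("vmlinuz-")]
        for cmd in planned_commands(device, os.uname().release, releases):
            print(shlex.join(cmd))
```

Keeping the plan as data separates deciding what to do from doing it, so a real implementation could show the plan to the user before touching /boot.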

If we need to delay the F12 release so that a fixed preupgrade can be pushed to F10 and F11 first, we should do it.

Replying to [ticket:270 jlaska]:

> c. we add some code to allow them to fetch install.img via http if /boot is low on space, or

Note, there shouldn't need to be any code added to do the fetch. I just preupgraded from F10 to rawhide with a 100MB /boot so the code is there.

> '''c.''' - requires a wired network connection and seems to correlate with the GRUB Explode bug (bug 533545 or its suspected ancestor, bug 450143). We have a possible workaround for that (re-run grub-install) but that's not a very safe command and it's not necessary in a lot of cases - but we can't seem to predict when it is necessary.

 * Wired connection -- That doesn't seem like a showstopper to me.
 * The grub bug -- I'm unclear of its scope. Is it just occurring when raid'd /boot is involved? Why shouldn't we consider the grub bug a blocker to the release instead of dropping support for preupgrade? How does this differ from this happening before (bug #450143 goes back to F9 and my anecdote is that this happened to me with the F10-F11 but not with a F10-F12 (rawhide))?

Replying to [comment:5 toshio]:

> Note, there shouldn't need to be any code added to do the fetch. I just preupgraded from F10 to rawhide with a 100MB /boot so the code is there.

That's a different check. There are two things we need to check:
1. Do we have enough room to download the installer images? (This is the one you hit.)
2. [new] Do we have enough extra room on /boot for anaconda to run?

The latter check is not in the current preupgrade. We could simply modify the first check to test for enough room for (installer images + kernel overhead). Either way, it requires code changes and thorough testing.
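The two checks differ only in what they add to the image size. A minimal Python sketch, with hypothetical function and key names and the figures quoted earlier in the ticket, shows why the existing check can pass on a typical default /boot while the proposed one fails:

```python
from __future__ import annotations


def preupgrade_space_checks(free_mb: float, images_mb: float, kernel_overhead_mb: float) -> dict[str, bool]:
    """Evaluate both /boot checks (illustrative sketch, not preupgrade's code)."""
    return {
        # Check 1 (already in preupgrade): is there room to download the installer images?
        "room_for_images": free_mb >= images_mb,
        # Check 2 (new): is there also room for anaconda to install the new kernel afterwards?
        "room_for_anaconda": free_mb >= images_mb + kernel_overhead_mb,
    }


if __name__ == "__main__":
    # Typical default /boot (158MB free) vs. F12 images (143MB) plus the F12 kernel alone (18MB).
    print(preupgrade_space_checks(158, 143, 18))
```

Even when only the 18MB F12 kernel is counted, leaving out anaconda's temporary files, the second check fails (161MB needed vs. 158MB free) while the first passes, which matches the failure described earlier in the ticket.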

> * Wired connection -- That doesn't seem like a showstopper to me.

No, but it significantly degrades the upgrade experience, which is why we need a long-term fix for this problem as well as an emergency F12 plan.

> * The grub bug -- I'm unclear of its scope. Is it just occurring when raid'd /boot is involved?

No, it happens in other cases as well.

> Why shouldn't we consider the grub bug a blocker to the release instead of dropping support for preupgrade?

Because it's very hard to reproduce reliably and the majority of people never hit it. It seems to affect only a very, very small percentage of users in general, and a slightly higher percentage of people running preupgrade. It's even more frequent with preupgrades where stage2 is fetched over the internet. But AFAICT we're still talking about something like 3-5% of the time, and that's 3-5% of a (normally) rare corner case of the non-standard upgrade path. So in my opinion it wouldn't really qualify as a release blocker, especially when we can fix the trigger with an update to F10/F11.

But there's some hope: I finally have a KVM guest that hits it reliably, so we might be able to debug the GRUB issue further.

> How does this differ from this happening before (bug #450143 goes back to F9 and my anecdote is that this happened to me with the F10-F11 but not with a F10-F12(rawhide))?

It's the same symptom and the same trigger, certainly, but we've never been able to determine the root cause.

A new F11 gnome-packagekit build that disables the preupgrade dialog is queued up here: https://admin.fedoraproject.org/updates/gnome-packagekit-2.27.3-2.fc11

When I ran into this exact problem, somebody on #fedora-devel mentioned that the default /boot size was going to be increased to 500MB. Has this actually been done?

The preupgrade proposal was approved.
