#135 1883609 Secure Boot fails to boot F33 Beta image
Closed 4 years ago by blockerbot. Opened 4 years ago by blockerbot.

Bug details: https://bugzilla.redhat.com/show_bug.cgi?id=1883609
Information from BlockerBugs App:
1883609

Current vote summary

Commented but haven't voted yet: lruzicka, kleinkravis, kparal, coremodule, chrismurphy, sdharane

The votes have been last counted at 2020-10-23 03:11 UTC and the last processed comment was #comment-697585

To learn how to vote, see:
https://pagure.io/fedora-qa/blocker-review


Although I could not reproduce it on my machine, I think there is enough evidence that this is happening.

FinalBlocker +1

Secure boot is mostly unused. Not important enough to block

FinalBlocker -1

@kleinkravis That's not true, but it doesn't really matter. Our criteria cover this explicitly:

All release-blocking images must boot in their supported configurations.
Supported firmware types
Release-blocking images must boot from all system firmware types that are commonly found on the primary architectures. For the x86_64 architecture, UEFI with Secure Boot configured in accordance with Microsoft's Windows certification requirements is considered a 'commonly found' firmware type.
https://fedoraproject.org/wiki/Basic_Release_Criteria#Release-blocking_images_must_boot

So far we have identified 2 Lenovo models which produce an Access Denied error and 1 Dell model which produces an "Operating System Loader signature found in SecureBoot exclusion database ('dbx')" error (which might or might not have the same root cause). We should try to gather more feedback to get a better idea of how many systems are affected.

For the moment, I believe we should follow the criteria and use
FinalBlocker +1

Discussed during the 2020-10-05 blocker review meeting: [0]

The decision to delay classifying this as a blocker bug was made because, while this is a worrying bug, it's also very mysterious at present. We don't know what's going on, how many systems it might affect, or whether it's even a Fedora bug at all (it may be a Lenovo firmware bug). We'll try to gather more information before making a decision.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-10-05/f33-blocker-review.2020-10-05-16.00.txt

As I understand it, we're too dependent on external factors on the timing here -- when will Microsoft et al put out revocation lists, and can we get a new signed shim in a reasonable time even if we delay? Turning off secureboot is an unfortunate but relatively easy workaround. I'm inclined to encourage us to waive this blocker and commonbugs it.

I would appreciate it if @pjones shared some more information about the situation. Communication is key here.

FESCo agreed (with no small amount of contention) to drop this as a FESCo blocker. Given that the scope and reproducibility now seem smaller than when this was originally proposed, I suggest we reset votes and start again.

REVOTE FinalBlocker

@bcotton Can you please elaborate on that reduced reproducibility? My issue with T450s was most probably unrelated, but do we have some new information related to the Ubuntu updates?

FinalBlocker -1

It does not appear that all (or even most) Ubuntu installs will apply the new revocation list. Microsoft will ship it no earlier than Q2 2021. It seems like it will hit a very narrow set of users, most of whom are likely technically savvy enough to disable secureboot until an updated shim is available.

@kparal AIUI from the bug and the FESCo meeting log, it seems you can't reproduce reliably by simply installing Ubuntu and updating it. It seems unclear whether you still sometimes get the DBX update when updating Ubuntu, on some sort of early-testing randomizer, or whether you never get it any more, but it seems clear that you don't always get it.

You can apparently reliably get the DBX update by forcibly applying it from LVFS, but that's not reproducing whatever real-world scenario the bug reporters ran into.

For the purpose of voting on this as a criteria blocker, I would say it counts as a conditional violation of the Basic criterion "All release-blocking images must boot in their supported configurations", the conditions being:

  • Secure Boot is enabled on the system
  • The DBX update that revokes the key our shim is currently signed with has been applied...somehow

and as well as considering how commonly we believe that will occur right now, we may want to consider how commonly we believe it will occur to people attempting to install Fedora 33 at any point up until Fedora 34 is expected to be released, along with the possibility of post-release respins of some or all F33 images to counteract this.
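
The two conditions above can be sketched roughly as follows. This is only an illustration, not a diagnostic tool: the helper names are hypothetical, and whether the installed DBX actually revokes our shim's signature is passed in as a plain boolean rather than detected. The only firmware detail assumed is that Linux efivarfs exposes the `SecureBoot` variable with a 4-byte attribute prefix followed by a one-byte value.

```python
# Path where Linux efivarfs normally exposes the SecureBoot variable
# (the suffix is the EFI global variable GUID).
SECUREBOOT_VAR = (
    "/sys/firmware/efi/efivars/"
    "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_enabled(raw: bytes) -> bool:
    """Interpret the raw efivarfs contents of the SecureBoot variable.

    efivarfs prefixes the value with a 4-byte attribute field, so the
    one-byte SecureBoot value sits at offset 4 (1 = enabled, 0 = disabled).
    """
    if len(raw) < 5:
        raise ValueError("unexpected SecureBoot variable length")
    return raw[4] == 1


def image_fails_to_boot(sb_enabled: bool, dbx_revokes_shim: bool) -> bool:
    """Hypothetical helper: the bug only bites when BOTH conditions hold."""
    return sb_enabled and dbx_revokes_shim


# Example with made-up variable contents (attributes 0x06, value 0x01):
sample = b"\x06\x00\x00\x00\x01"
print(image_fails_to_boot(secure_boot_enabled(sample), dbx_revokes_shim=True))
```

On a real system the bytes would come from reading `SECUREBOOT_VAR`; the file is absent on machines booted in legacy BIOS mode.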

FinalBlocker -1

It's the best of two not-great options, but sometimes that's where we are.

FESCo meeting log: https://meetbot.fedoraproject.org/fedora-meeting-2/2020-10-21/fesco.2020-10-21-14.00.log.html#l-32

FinalBlocker -1

I think the number of folks affected will hopefully be small now, and if that set grows too badly before f34 is available we can look at a f33.1 release or the like. I have been pondering how we might do that, and I think it should be possible (of course it's more work for everyone).

FinalBlocker -1

I think the list of affected systems will be smaller than originally thought and disabling secure boot is an option. Many (most?) users already do this, especially if they have nVidia hardware.

Here are my thoughts on the factors from https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process :

How prominently visible the bug will be

Well, if you hit it, "system doesn't boot" is pretty visible. But, that's if you hit it. See below.

How severe the consequences of the bug are

On x86, it's "turn off secure boot in the bios". Secure boot is important in the broad sense, but in the specific sense the individual risk isn't high. Many people are already running this way, e.g. to enable Nvidia's proprietary driver.

On ARM, some devices don't allow secure boot to be disabled. Not sure how likely it is for these to also be ones which have the DBX updated. In any case, IoT Edition is already producing respun images periodically and those will get the new version.

How many users are likely to encounter the bug

Some users will definitely encounter it, enough that it'll be noticed. Otherwise it wouldn't have been discovered during the beta. However, most users will not. I'm judging this to be the vast majority, in fact.

Whether the bug could or should have been proposed earlier in the cycle

Eh, water under the bridge.

Whether the current stable release is affected by the bug

This is a big one for me. It absolutely is. It is conceivable that current systems will stop booting if they get the DBX update somehow. And F32 install media also won't boot on the affected systems. It would be nice if F33 were the update that addressed this, but it can be addressed later too -- either with respun install media or by F34 in another six months.

Whether delaying the release may give us an opportunity to carry out other desirable work

Sidebar! Can we get this removed from the factors? There's always more work, and there's always the next release!

Possible effects of the expected delay on Fedora itself and also to downstream projects

Part one: it's bad for Fedora. We've done an excellent job of hitting our target dates for the last few releases and that's good for users, good for people's trust in the project, and good for our image. I'm not saying we need to become strictly time-based, but when it's a judgement call, I'd prefer to err on the side of getting the new release to users as planned. Especially because there's a lot of cool stuff in this release that people are very excited about. I saw someone on twitter switch to nano on purpose.

Part two, RH hat on -- slight delay here wouldn't affect RHEL in this case. RHEL efforts are already looking at F34.

Whether an additional delay to fix the bug, combined with any prior delays in the cycle, results in the total delay becoming unacceptable in regard to the Fedora Release Life Cycle

I'm tempted to engage in hyperbole, but realistically a two week slip here would not actually be a disaster. However, there's always the risk with the October release that we get things pushed back into the holidays, and then it's suddenly January before you know it. This really happened with F18!

Peter Jones is confident that he can have an update soon, but it's not 100%. There's a non-zero chance that if we slip two weeks, we're still in the same situation just two weeks later with no gain.

Under what criterion is this being considered?

The basic release criterion "All release-blocking images must boot in their supported configurations" includes a "Supported firmware types" note that includes UEFI with Secure Boot enabled. However, in this case Secure Boot is working exactly as intended. The problem arises with a current dbx applied, but stale signed shim. Is that a conditional blocker?

Secure Boot is working as intended from Secure Boot's perspective, but Fedora is not working as intended from Fedora's perspective, because we intend for Fedora to boot. :) That is the criterion that I would use.

This is a pretty significant issue and imo this definitely is a conditional blocker. A couple of things to consider, though:
1/ How would this issue impact Lenovo shipping Fedora default?
2/ If we are "close" to figuring out a fix, would it not make sense to wait for 2 weeks and ship with the fix? If we don't have the fix by then, we ship anyway after 2 weeks. But at least we tried.

@sdharane Lenovo is already shipping with Secure Boot turned off, as it happens. They didn't like people to have to know to go into the configuration at first boot to disable it on the systems which need the Nvidia proprietary driver. And they wanted it to be consistent across all of the systems, so it's not on on the Intel GPU model either.

@mattdm

On x86, it's "turn off secure boot in the bios".

Please don't do this. Whenever I see this recommendation, I immediately respond with some variation on "this is bad advice, please stop." We need more resources for the Secure Boot paradigm to be effective, not finding ways to unravel it.

The "lie in wait" form of malware, already present on the system and activated by the disabling of Secure Boot, exists. In the worst case, it's a persistent compromise. Clean installs and replacing the drive won't fix it. Re-enabling Secure Boot does not demonstrate that your system is secure; however unlikely it is to be pwned, you can't easily prove it. Also, systems come with Secure Boot enabled without a user means of disabling it - you can't assume it can be disabled.

Many people are already running this way, e.g. to enable Nvidia's proprietary driver.

Options: (1) Fedora should be signing it, (2) recommend that the user sign it, (3) make it easier for the user to sign.

@adamwill

but Fedora is not working as intended from Fedora's perspective, because we intend for Fedora to boot.

Have we ever blocked on a single machine demonstrating a problem where we could not reliably reproduce it anywhere else, but have a working theory how it could later snowball?

My gut instinct is this issue is only downhill facing, the spread of the post-BootHole dbx can only pick up pace. Microsoft says they will push it in 2021 which is interpreted to mean Q1 or Q2. Basically it's a gamble: delay now to maybe avoid the possibility of dbx rollout snowballing in the next 3 months which would probably put pressure on us to respin official media. I think respinning is such a huge hassle that we're better off avoiding it at almost any cost, even a 3 week slip, because asking folks to pitch in to reissue official media in the middle of another release cycle sounds terrible.

I'd say the better fallback is to just re-sign the current 2018 shim binary that we're planning on shipping, i.e. don't fix shim, just submit the old one for re-signing. I think if we ship F33 with old-world keys, we're gonna regret it later. There's a very low regret factor with just signing old-current shim.

We voted not to have this a FESCo blocker at today's meeting, but I still think it would be better to wait until we have this in working order - primarily because of all the external factors we have no control over, especially over the lifetime of the F33 release. Better safe than sorry.

FinalBlocker +1

Also, systems come with Secure Boot enabled without a user means of disabling it - you can't assume it can be disabled.

No x86_64 system with Windows pre-installed should come like this, as Microsoft's OEM requirements for x86_64 systems specifically include that it must be possible to disable Secure Boot.

Systems without Windows pre-loaded, or aarch64 systems with Windows pre-loaded (all, er, what, two of them? Or something?) may have it locked, I guess.

Have we ever blocked on a single machine demonstrating a problem where we could not reliably reproduce it anywhere else, but have a working theory how it could later snowball?

IIRC no, but I don't think that precludes us from doing something like it in this case.

I'd say the better fallback is to just re-sign the current 2018 shim binary that we're planning on shipping, i.e. don't fix shim, just submit the old one for re-signing.

I'm not sure how possible / difficult this is from a release engineering perspective.

@adamwill (and others on the QA team!), from a QA perspective, how do you feel about the possibility of reissuing updated media (either with all updates rolled in, or with just the updated shim)?

Also, systems come with Secure Boot enabled without a user means of disabling it - you can't assume it can be disabled.

No x86_64 system with Windows pre-installed should come like this, as Microsoft's OEM requirements for x86_64 systems specifically include that it must be possible to disable Secure Boot.

Reportedly, this is not true anymore, at least not for the past couple of years. At least since Windows 10, Microsoft has made it optional to offer a way to disable Secure Boot for desktops.

No x86_64 system with Windows pre-installed should come like this, as Microsoft's OEM requirements for x86_64 systems specifically include that it must be possible to disable Secure Boot.

That is true; however, in the WHCP-Systems-Specification-1809 (Sep 2018) and newer revisions, it's "(Optional for systems intended to be locked down)". That parenthetical isn't present in the 1703 revision.

I don't think there's a clear right or wrong decision, but there are consequences either way, and this makes it difficult to decide. The more I think about it, the have-my-cake-and-eat-it-too option is to re-sign current-old shim. It's not the fallback, it's my preferred choice now. It's been tested for 2 years, and we don't have to worry about worse than expected dbx proliferation. Whereas a new shim will get what? Days or up to two weeks of testing? It could end up hitting more end users than dbx revocation problems - its relative risks are certainly higher than those of a re-signed current-old shim.

I think we need to ask if it's possible, if it hasn't already been, maybe I overlooked it in the FESCo minutes.

Sad to see Red Hat trying to push for shipping a broken product that you can't even install. I just ran into this bug installing the KDE spin and found this ticket after a lot of googling.

FinalBlocker +1

Hi Chris,
There is no Red Hat angle here. Check mattdm's long response above for the reasoning.
<snip> RH hat on -- slight delay here wouldn't affect RHEL in this case. RHEL efforts are already looking at F34.</snip>

@ngompa @chrismurphy thanks for the clarification, I hadn't seen that update.

I may miss some of the go/no-go today as I have to take my cats to the vet. Given that, I'm voting here ahead of time, though I reserve the right to change my vote in the meeting :). I'm kinda on the fence about this, but overall I think I'm a weak:

FinalBlocker +1

I don't think slipping a week or two is really that terrible; yes, we've been proud of releasing more or less on time the last few cycles, but I don't want to tip over into "release on time at any cost" territory, we have this process and these rules for a reason. We already have people hitting it, however much we think it ought to be rare at this point, and that is only going to trend in one direction - more people are going to hit it over time. Yes we can look at doing a re-spin, but it's a lot of work to do something we've never done before and I'm not convinced it's worth doing that to save the schedule.

We had a period of several years where we slipped a week or two on every release and, you know, the project survived and people liked it and Phoronix made a joke about it and we all got by. I'm okay with that happening for another cycle.

Hell, we can even semi-officially tell people "look, this is the one blocker in RC 1.2, if you want Fedora 33, grab an image and see if it boots, if it does, away you go".

adamw's thoughts reflect exactly mine. I'm also a weak:

FinalBlocker +1

I gave my detailed thoughts above so I'll try to not repeat too much, but for me what tips it over is:

Fedora 32 media that's out there right now faces the same problem. If we delay, the status overall will be "Some systems can't boot our official supported release media and will need a work-around."

If we ship F33, the status will be word-for-word identical. So I don't see a gain in not shipping given that it's the only known blocker. We don't even have complete confidence that a delay will be only two weeks.

Maybe the situation will be worse in February, but the move by Canonical to retract the update for now makes that less likely. If it does become an issue, we can provide updated media as soon as that becomes a possibility.

Fedora 32 media that's out there right now faces the same problem. If we delay, the status overall will be "Some systems can't boot our official supported release media and will need a work-around."

Sure, that's true now, but if we consider what happens in 7 months from now, the situation is completely different. Let's say Ubuntu and Windows push out the revocation in 7 months, which is incidentally the same date that F32 goes EOL. At that point, if we have F33 fixed, then all supported Fedora releases are going to be okay. But if we today decide to not wait for updated/re-signed shim, we are back at square one in 7 months with having to somehow fix F33 media retroactively.

Well, in 7 months, we can tell people to install F34. But we can also have updated F33 ISOs.

Kevin Fenzi says he's confident that we can respin F33 media if we need to, and I'll weight his comment on that pretty heavily. We've never done it on quite this scale before, but we also shouldn't let that scare us: it's not like building updated media is really a new problem.

Metadata Update from @blockerbot:
- Issue status updated to: Closed (was: Open)

4 years ago

Release F33 is no longer tracked by BlockerBugs, closing this ticket.
