#1922 Support for i686 on critical path packages
Closed: Fixed a year ago Opened a year ago by labbott.

On Jun 14, I disabled i686 builds on the kernel due to a segfault. I blocked the tracker bug and e-mailed the x86 list. It turned out that other packages started failing due to the lack of new i686 builds.
This was discussed on devel@ and eventually someone gave a patch to fix the segfault.

I think this indicates there are large gaps in i686 support. The issue only got attention because someone had to make a big enough deal of it to get it fixed. Advocating for bugs to get fixed is work, and I'd like FESCo to help guide this question.

  • What is a reasonable time frame for someone from the i686 SIG to respond to bugs in critical path packages?
  • Who should be responsible for driving these bugs to completion?
  • Does the i686 SIG have enough resources to support this task?

My $0.02

What is a reasonable time frame for someone from the i686 SIG to respond to bugs in critical path packages?

Respond is different from fix, so I'm going to go with 1 week for a response. Honestly, I'd like to say less but that's unrealistic.

Who should be responsible for driving these bugs to completion?

The i686 SIG, which is why that SIG was created to begin with.

Does the i686 SIG have enough resources to support this task?

We should probably ask the i686 SIG directly. @jsbackus @arisunz @athoscr @smooge

What is a reasonable time frame for someone from the i686 SIG to respond to bugs in critical path packages?

Respond is different from fix, so I'm going to go with 1 week for a response. Honestly, I'd like to say less but that's unrealistic.

That's a bit unfortunate for critical path packages. As demonstrated with the kernel, that means an entire week could go by with multiple packages blocked, and that's just for a response. Does this meet the targets the x86 SIG has set? (I can't actually find the most up-to-date details here.)

This issue has been triggered by a very unusual situation: that the kernel wouldn't compile. Also, the (final?) analysis is that it was not i686 specific, it just showed up there. The normal situation is that the kernel builds, but doesn't work for some reason. In that case, the only ones affected are the i686 users and they can take as long as they want to fix it since it's only their problem. I'm not a member of the SIG and I no longer have any 32-bit systems running Fedora, but I do think that a kernel build-system problem should be investigated by the kernel team as it is unlikely to be arch-specific.

The subject here says "critical path packages" and I think the same consideration applies to those as well. If gcc fails to build on i686, I would expect that to be investigated by the maintainer and/or upstream developers. If gcc builds but fails to run on i686, then it could possibly be left up to the i686 users to solve or bring to the attention of upstream. Again, it's not affecting the other arches. As for critical path libraries, because of multi-arch I would expect them to get more attention from everyone.

Edit: When I said the kernel not compiling was unusual, I meant that it was unusual that it only didn't compile on one arch.

Here are the original targets the x86 SIG set. Regardless of whether or not the SIG is meeting those targets, I'm not sure the impact of turning off the i686 kernel was fully understood at the time these targets were set and approved by FESCo so perhaps those should be revisited.

This issue has been triggered by a very unusual situation: that the kernel wouldn't compile. Also, the (final?) analysis is that it was not i686 specific, it just showed up there. The normal situation is that the kernel builds, but doesn't work for some reason. In that case, the only ones affected are the i686 users and they can take as long as they want to fix it since it's only their problem. I'm not a member of the SIG and I no longer have any 32-bit systems running Fedora, but I do think that a kernel build-system problem should be investigated by the kernel team as it is unlikely to be arch-specific.

We as kernel maintainers have been saying that i686 needs to be community supported for a long time now. For the most part, this is what we do: any i686 bugs get tagged to the tracker for the i686 SIG to deal with. And yes, while the final analysis is that it wasn't i686 specific, the key word there is analysis. Analysis and debugging take time, and it was impossible to know whether the actual issue was i686 specific until it was actually debugged, especially given the failure was only happening on i686.

The subject here says "critical path packages" and I think the same consideration applies to those as well. If gcc fails to build on i686, I would expect that to be investigated by the maintainer and/or upstream developers. If gcc builds but fails to run on i686, then it could possibly be left up to the i686 users to solve or bring to the attention of upstream. Again, it's not affecting the other arches. As for critical path libraries, because of multi-arch I would expect them to get more attention from everyone.

According to https://fedoraproject.org/wiki/Architectures/x86, architecture specific issues are supposed to be driven by the architecture team. I'm not saying the kernel or other critical path maintainers have no responsibility. What I am looking for here is for an answer about taking ownership of problems and driving them to completion. Setting direction and debugging take effort and should not be under-estimated. I am happy to answer questions about how the kernel works since I know it can be a daunting package but I do not want to be the one actually pushing for i686 bugs to get fixed.

And yes, while the final analysis is that it wasn't i686 specific, the key word there is analysis. Analysis and debugging take time, and it was impossible to know whether the actual issue was i686 specific until it was actually debugged, especially given the failure was only happening on i686.

My point on that was that I think a build-system error should be investigated by the kernel team regardless of architecture, as such errors are unlikely to be architecture-specific and are also quite rare. Any other kernel problem is unlikely to cause the situation that led to this issue being raised.

From my POV as someone whose packages were impacted by the kernel being disabled on i686, I'd like to see guidance from FESCO on when it is acceptable to turn off an architecture in packages that are critical path, such as the kernel. Turning off a package on an architecture has a potentially very significant impact on downstream packages, rippling out to affect many maintainers, blocking their ongoing work in Fedora.

NB I'm probably not using the official definition of "critical path" here - I really mean any package which is likely to be a build or runtime prerequisite of a non-trivial number of other packages, i.e. anything likely to have a big impact on other maintainers when broken/missing.

My feeling is that the i686 kernel build should not have been disabled immediately, but rather there should have been a period of time allowed to identify & resolve the problems, before taking the big hammer approach of disabling the arch & impacting downstream packages.

My apologies - been out of the loop. I'll try to get caught up and respond this afternoon.

From my POV as someone whose packages were impacted by the kernel being disabled on i686, I'd like to see guidance from FESCO on when it is acceptable to turn off an architecture in packages that are critical path, such as the kernel. Turning off a package on an architecture has a potentially very significant impact on downstream packages, rippling out to affect many maintainers, blocking their ongoing work in Fedora.
NB I'm probably not using the official definition of "critical path" here - I really mean any package which is likely to be a build or runtime prerequisite of a non-trivial number of other packages, i.e. anything likely to have a big impact on other maintainers when broken/missing.
My feeling is that the i686 kernel build should not have been disabled immediately, but rather there should have been a period of time allowed to identify & resolve the problems, before taking the big hammer approach of disabling the arch & impacting downstream packages.

This was an agreement made with the i686 SIG when it was founded; the kernel team does not have resources to maintain i686, period. The i686 SIG exists for this purpose. They agreed that the kernel build would disable i686 builds any time they were breaking the general kernel compose until the i686 SIG could figure it out. It was perhaps not fully understood how much domino effect this would have, but it's all functioning as intended.

Part of the purpose of this is to ensure that the i686 SIG is actually functional; if situations like this continue to arise and are not dealt with in a timely manner, i686 will simply end up dropped completely from Fedora. It's on community-provided life-support right now.

What about adding a separate kernel-i686-compat package, owned by i686-sig, that would have ExclusiveArch:i686 and would carry a possibly older kernel version for i686 use, and would provide the same binary packages (kernel-core, kernel-headers, kernel, …) ?

This would allow the kernel team to move forward with any updates without waiting for i686, and would allow the i686 SIG to update the kernel on its own schedule, e.g. only to the stable versions. Of course this would significantly slow the response to any security or other issues on that architecture. But this might be OK, because the i686 kernel is not used that much: only by i686 users, whose number we don't know, and by a bunch of packages which use it mostly for testing. This would be a less drastic measure than dropping the i686 kernel completely, and would allow us to mostly keep the status quo, without any changes in packages that depend on the kernel.

BTW, the kernel (not kernel-headers) is BR by libguestfs, supermin, qemu, qemu-sanity-check only, and Required by libguestfs, libkcapi, mod_selinux, netlabel_tools, qemu-sanity-check, rpm-ostree-toolbox, vdsm. I assume at least some of the second list is bogus, since requiring the kernel is usually meaningless, because it is a multi-install package so there's no way to force a specific version.

Hi folks,

First of all, I apologize for the firestorm.

Laura, thank you for contacting the SIG and offering to help us resolve the issue. I apologize that it took a large stoppage to get to a resolution. It looks like @alexpl responded within about a day to your initial e-mail. I agree that we haven't been nearly responsive enough to bugs filed, but I hope that you will at least feel like you can contact the list and get a response. Additionally, it looks like @alexpl was able to work with upstream to get the patch in place. You raise a very valid concern re: driving response to critical issues. I propose for future critical issues please feel free to ping me directly in addition to the list. When I don't have time to debug I can at least help coordinate. I will work on staying up to date with the list. I will touch base with @adamwill and his team, as well.

With regard to what to do about the i686, my preference is to not make any changes, but I think @zbyszek's proposal is a good alternative to complete removal. Am I correct in believing that it would allow us to continue to build i686 spins? Regardless of the choice, I would allow plenty of time for (technical) fallout if we decide to make a change, as indicated by the expected problems this change caused.

Hi folks,
First of all, I apologize for the firestorm.
Laura, thank you for contacting the SIG and offering to help us resolve the issue. I apologize that it took a large stoppage to get to a resolution. It looks like @alexpl responded within about a day to your initial e-mail. I agree that we haven't been nearly responsive enough to bugs filed, but I hope that you will at least feel like you can contact the list and get a response. Additionally, it looks like @alexpl was able to work with upstream to get the patch in place. You raise a very valid concern re: driving response to critical issues. I propose for future critical issues please feel free to ping me directly in addition to the list. When I don't have time to debug I can at least help coordinate. I will work on staying up to date with the list. I will touch base with @adamwill and his team, as well.

Thanks for the response. I still think we need an answer to the initial question about how long we should both wait for a response and a bug fix though. Given maintainers of packages who depend on the kernel have expressed an objection, I don't think we can just let the build break.

With regard to what to do about the i686, my preference is to not make any changes, but I think @zbyszek's proposal is a good alternative to complete removal. Am I correct in believing that it would allow us to continue to build i686 spins? Regardless of the choice, I would allow plenty of time for (technical) fallout if we decide to make a change, as indicated by the expected problems this change caused.

I'm wary of this proposal. It seems likely to end up with an out of date kernel and I don't think we should be encouraging use of software that's known to be out of date or potentially buggy.

What is a reasonable time frame for someone from the i686 SIG to respond to bugs in critical path packages?
Respond is different from fix, so I'm going to go with 1 week for a response. Honestly, I'd like to say less but that's unrealistic.

That's a bit unfortunate for critical path packages. As demonstrated with the kernel, that means an entire week could go by with multiple packages blocked, and that's just for a response. Does this meet the targets the x86 SIG has set? (I can't actually find the most up-to-date details here.)

Perhaps you meant to ask a different question. I have a feeling what you were trying to ask is a response time on fixing build bugs for critpath packages. That's different from general bugs. Extremely respectfully, all of the critpath packages have a collection of bugs that have no response at all in bugzilla.

So if we go with response time on fixing build bugs for critpath packages, I'd refine my timeframe to 2 days to respond.

Perhaps you meant to ask a different question. I have a feeling what you were trying to ask is a response time on fixing build bugs for critpath packages. That's different from general bugs. Extremely respectfully, all of the critpath packages have a collection of bugs that have no response at all in bugzilla.
So if we go with response time on fixing build bugs for critpath packages, I'd refine my timeframe to 2 days to respond.

Yes, thanks for making that clarification. My intention was to ask about build bugs since those prevent anything from going out.

Maybe we should just change our tooling (Koji, mostly) so that we automatically ship, on each architecture, the latest version of the package that built on that particular architecture? I.e., if a package builds on some architectures and not others, it would be automatically updated to the new version on the architectures where it built successfully and kept on the latest successful build on those where it failed. Debian has been successfully using this model for years. And this is also how Copr works. (In fact, in Copr, the package ends up in the repository for an architecture as soon as it built on that architecture, even if the others are still building, and even if they failed.)
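
To make the per-architecture model above concrete, here is a minimal sketch in Python. It is not how Koji or Copr is implemented; the `latest_per_arch` function, the build-record layout, and the sample version strings are all hypothetical and exist only to illustrate the selection rule "ship the newest version that built successfully on each architecture".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical build record: (version, {arch: build succeeded?}).
# This layout is an assumption for illustration, not a Koji data structure.
Build = Tuple[str, Dict[str, bool]]


def latest_per_arch(builds: List[Build], arches: List[str]) -> Dict[str, Optional[str]]:
    """Return, for each arch, the newest version that built successfully there.

    `builds` must be ordered from oldest to newest. An arch that never had a
    successful build maps to None.
    """
    shipped: Dict[str, Optional[str]] = {arch: None for arch in arches}
    for version, results in builds:
        for arch in arches:
            # A later successful build replaces the earlier one; a failed or
            # missing build leaves the previously shipped version in place.
            if results.get(arch, False):
                shipped[arch] = version
    return shipped


# Made-up sample data: the newer build fails only on i686.
sample_builds = [
    ("1.0-1", {"x86_64": True, "i686": True, "aarch64": True}),
    ("1.1-1", {"x86_64": True, "i686": False, "aarch64": True}),
]
print(latest_per_arch(sample_builds, ["x86_64", "i686", "aarch64"]))
```

With this sample input, x86_64 and aarch64 move to the newer version while i686 stays on the last version that built there, so the same repository snapshot carries different versions of one package on different architectures.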

I'm wary of this proposal. It seems likely to end up with an out of date kernel and I don't think we should be encouraging use of software that's known to be out of date or potentially buggy.

What about one of the longterm kernels? That would not go out of date as quickly.

Thanks for the response. I still think we need an answer to the initial question about how long we should both wait for a response and a bug fix though. Given maintainers of packages who depend on the kernel have expressed an objection, I don't think we can just let the build break.

Yes, I think that is fair.

I'm wary of this proposal. It seems likely to end up with an out of date kernel and I don't think we should be encouraging use of software that's known to be out of date or potentially buggy.

Yes, there are definite downsides. It would be an easier argument if we maintained an LTS-type kernel package in parallel with the frequently updated one. Given that the kernel SIG is bandwidth-constrained maintaining one kernel, I don't think we want to add another. :smiley:

Maybe we should just change our tooling (Koji, mostly) so that we automatically ship, on each architecture, the latest version of the package that built on that particular architecture? I.e., if a package builds on some architectures and not others, it would be automatically updated to the new version on the architectures where it built successfully and kept on the latest successful build on those where it failed. Debian has been successfully using this model for years. And this is also how Copr works. (In fact, in Copr, the package ends up in the repository for an architecture as soon as it built on that architecture, even if the others are still building, and even if they failed.)

+1

Maybe we should just change our tooling (Koji, mostly) so that we automatically ship, on each architecture, the latest version of the package that built on that particular architecture? I.e., if a package builds on some architectures and not others, it would be automatically updated to the new version on the architectures where it built successfully and kept on the latest successful build on those where it failed. Debian has been successfully using this model for years. And this is also how Copr works. (In fact, in Copr, the package ends up in the repository for an architecture as soon as it built on that architecture, even if the others are still building, and even if they failed.)

Then we would also need different updates in Bodhi for the different packages, and in fact we would fork Fedora for all releases, because an NVR on one arch could be built against completely different dependencies on another arch. Not sure that Debian does things better in this regard; AFAIK the binaries there are just built on the developers' systems against an unrecorded package set, which leads to other issues. IMHO the central build system in Fedora is superior.

I guess a compromise would be to allow shipping only a partial package set (e.g. the newer kernel on x86_64 but the older one on i686), but allowing these as build dependencies does not feel right.

I'm wary of this proposal. It seems likely to end up with an out of date kernel and I don't think we should be encouraging use of software that's known to be out of date or potentially buggy.

When we disable new kernels for i686, the users will also not get any update. So the only advantage of disabling the kernel is that it increases the pressure for the i686 SIG to come up with a solution faster.

I'm wary of this proposal. It seems likely to end up with an out of date kernel and I don't think we should be encouraging use of software that's known to be out of date or potentially buggy.

When we disable new kernels for i686, the users will also not get any update. So the only advantage of disabling the kernel is that it increases the pressure for the i686 SIG to come up with a solution faster.

I can't tell if you are agreeing or disagreeing here, but my issue with maintaining a separate kernel (whether it's the same version or an LTS as proposed above) is that it has to be done for the life of i686. Probably 99% of the time we can build an i686 kernel with the rest of the arches and get all the updates/fixes just fine. It's that 1% which is the problem. The intention is for disabling on i686 to be a temporary workaround, so that any delay in getting fixes is temporary and i686 will catch up with the next update once the issue is fixed. There are already concerns about over-burdening the i686 SIG and I don't think trying to have them take on a separate kernel will make that problem any better.

This was an agreement made with the i686 SIG when it was founded; the kernel team does not have resources to maintain i686, period. The i686 SIG exists for this purpose. They agreed that the kernel build would disable i686 builds any time they were breaking the general kernel compose until the i686 SIG could figure it out. It was perhaps not fully understood how much domino effect this would have, but it's all functioning as intended.

It would be nice if there were a way that it didn't break other packages, but going a week or more without a kernel build, particularly in the merge window, makes it significantly harder to track down issues when they arise. Had we refused to update the kernel package for x86_64/arm/etc, it would have been more difficult for users to "koji bisect": instead of having dozens of commits to analyze, we would have hundreds or thousands.

Part of the purpose of this is to ensure that the i686 SIG is actually functional; if situations like this continue to arise and are not dealt with in a timely manner, i686 will simply end up dropped completely from Fedora. It's on community-provided life-support right now.

While this is a great goal, build issues are pretty rare. This is the first time the i686 build has been disabled since the SIG was established. In the meantime, things like a complete failure to boot on targeted hardware (PIII) for F28 were shipped and never resolved, last I saw. There were complaints on the list, but no attempts to resolve them that I have seen. Again, this is a self-fixing problem because the PIII will no longer be supported with F29 due to build flags changing, but it serves as an interesting gauge of how functional the SIG is at a given time.

Various approaches have been proposed, but they would all require work to implement, and it seems that @jsbackus and others want to pick up the slack.

PROPOSAL: If a kernel build issue on i686 recurs, kernel maintainers must notify the i686 SIG and give them two days to handle the issue.

If this turns out to be a recurring problem, we can revisit the policy.

Various approaches have been proposed, but they would all require work to implement, and it seems that @jsbackus and others want to pick up the slack.
PROPOSAL: If a kernel build issue on i686 recurs, kernel maintainers must notify the i686 SIG and give them two days to handle the issue.
If this turns out to be a recurring problem, we can revisit the policy.

Are those two "weekdays" or are we counting weekends? I think I'd prefer to give a little more time to the i686 SIG in general (3-4 days?) but I also would want to have an addendum that says that if the kernel is being rebuilt for a moderate or higher security vulnerability, discretion is left up to the Fedora Kernel Team whether it is better in that instance to temporarily ExcludeArch i686 in order to get the fix out to the other arches.
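
To show why the weekday question matters, here is a small Python sketch comparing the two readings of a "two day" window. The function names and the example notification date are hypothetical, chosen only so that the notice lands on a Friday; nothing here is part of any agreed policy.

```python
from datetime import date, timedelta


def deadline_calendar(notified: date, days: int) -> date:
    """Deadline when weekends count toward the window."""
    return notified + timedelta(days=days)


def deadline_weekdays(notified: date, days: int) -> date:
    """Deadline when only Monday to Friday count toward the window."""
    current = notified
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


# Hypothetical example: the SIG is notified on a Friday.
notified = date(2024, 3, 1)
print("calendar days:", deadline_calendar(notified, 2))
print("weekdays only:", deadline_weekdays(notified, 2))
```

For a Friday notification, the calendar-day reading expires on Sunday, before most volunteers would have seen the mail, while the weekday reading runs until the following Tuesday.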

Would it be possible to flip a switch and build only the kernel-headers package for i686 in case problems occur? Then this could be done more aggressively IMHO.

We were actually just discussing the possibility of breaking kernel-headers out into a separate package (similar to kernel-tools) for unrelated reasons on Tuesday. If we were to do that, kernel-headers would always be built.

This conversation has somewhat petered out. I haven't seen a new proposal, but I'm -1 to the one in https://pagure.io/fesco/issue/1922#comment-519920 as written (see my response above).

I guess we should try to hammer out a policy in a meeting.

Metadata Update from @zbyszek:
- Issue tagged with: meeting

a year ago

@jforbes what is the status about the extra kernel-headers package?

The kernel-headers spec should be posted for review soon, I expect the split will actually happen in rawhide in the next week or 2, and will trickle down to F27/28 as they are rebased to 4.18.

We will discuss this in today's meeting, which starts in about an hour in #fedora-meeting-1.

AGREED: let's see how the split kernel-headers package does and let's notify the x86 SIG that they are at risk of losing i686 if more events like this occur (+5, 0, -0) (bowlofeggs, 15:24:09)

Metadata Update from @bowlofeggs:
- Issue untagged with: meeting
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

a year ago
