#1565 Fedora schedule must continue to track core library ABI or risk serious ABI implications.
Closed None Opened 6 years ago by codonell.

Please do not move the Fedora 25 schedule any earlier.

Please keep the Fedora schedules roughly 6 months apart, tracking the core runtime ABI.

The new Fedora 25 schedule is ambitiously early, and in doing so moves dangerously close to the edge of an ABI precipice. Any earlier and Fedora Rawhide for glibc will have to stop tracking master, placing the Fedora release 6 months behind every other distribution that is tracking the upstream glibc ABI. Worse, if such decisions are made late, it risks mass rebuilds, ABI reversals, and corrective action to back out a core runtime ABI that is no longer guaranteed.

Why?

Let me explain.

It was not a historical accident that Fedora releases on 6-month schedules, and glibc releases on 6-month schedules. The schedules were coordinated together.

The idea was that the core C library would release on or just after the Fedora Alpha branch. This way Fedora (arguably the most important Linux distribution) would need only a minor update and lock in the ABI guarantees given by the glibc update.

The glibc releases look like this:

February 1st -> glibc 2.23 released.
August 1st -> glibc 2.24 released.
February 1st -> glibc 2.25 released.

... and so on, cycles of 5 months of development, 1 month of hardening freeze, and a release.
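
The cadence above can be sketched in a few lines of Python. This is only an illustration: the function name `glibc_cycle` is hypothetical, the year 2016 is assumed from the dates in this ticket, and the freeze is assumed to start exactly one calendar month before the release (following the "5 months of development, 1 month of hardening freeze" description).

```python
from datetime import date

def glibc_cycle(release: date) -> dict:
    """Return the key dates of one glibc cycle ending at `release`.

    Assumes releases land on February 1st or August 1st and that the
    hardening freeze starts one calendar month earlier (illustrative only).
    """
    if (release.month, release.day) not in {(2, 1), (8, 1)}:
        raise ValueError("expected a February 1st or August 1st release date")
    if release.month == 2:
        # Development opened at the previous August 1st release.
        dev_start = date(release.year - 1, 8, 1)
        freeze_start = date(release.year, 1, 1)
    else:
        # Development opened at the previous February 1st release.
        dev_start = date(release.year, 2, 1)
        freeze_start = date(release.year, 7, 1)
    return {"dev_start": dev_start, "freeze_start": freeze_start, "release": release}

# glibc 2.24, nominally August 1st (year assumed to be 2016).
cycle_2_24 = glibc_cycle(date(2016, 8, 1))
print(cycle_2_24)
```

The ABI guarantee described next applies only from the `release` date onward; anything Fedora picks up between `dev_start` and `release` is still subject to change.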

At the release point the ABI is locked in and guarantees are provided by the upstream community that if the distribution picks up the release they will be guaranteed a stable backwards-compatible ABI.

As of today the glibc rawhide branch tracks upstream glibc master (continuous integration) and is only ABI stable at the points around Fedora releases (responsibility of the Fedora glibc team). Having rawhide track upstream master is immensely useful for glibc (even if it might someday mean a rawhide mass rebuild if we have a serious ABI issue): it is an early indicator of problems, catches toolchain integration issues, and lets us resolve issues early and promptly, resulting in fewer problems downstream for Fedora. It's CI through and through.

If Fedora releases branch from rawhide much earlier than a glibc upstream release they will have branched in the middle of an unstable glibc ABI with no guarantees from the upstream community for a backwards-compatible ABI. This is a serious issue and one that could result in incompatibilities between Fedora and every other distribution that could break ISV binaries. In essence Fedora would have a hybrid ABI somewhere in the middle of a glibc release.

If Fedora releases branch from rawhide much later than a glibc upstream release, rawhide must be frozen at the previous stable ABI (which requires a priori knowledge of the delay). That ruins the CI efforts of Rawhide and piles up complex ABI-related testing for an already late release.

This dire picture would never happen: the Fedora glibc team actively tracks these kinds of ABI issues and would take corrective action.

The Fedora glibc team has two kinds of corrective action to take:

(a) Stop synchronizing glibc rawhide with glibc master
Pro: Rawhide ABI is stable all the time.
Cons: Breakage is only ever detected when we update for releases, causing a mad scramble at release time (continuous integration failure).

(b) Step in to change the upstream glibc master schedule or Fedora release schedule.
Pro: Rawhide tracks upstream ABI and we get continuous integration.
Cons: Requires everyone to pay attention to the schedule details.

We have opted for (b) for Fedora 25, but the schedule is just lined up with the glibc 2.24 schedule. Any slippage in the glibc schedule spells problems for Fedora 25 ABI guarantees. Thankfully upstream glibc is committed to time boxed releases that the distributions can count upon, so I expect no pushback from this public request to keep the schedule firm:
https://www.sourceware.org/ml/libc-alpha/2016-03/msg00766.html

In closing: Fedora schedule must continue to track core library ABI or risk serious ABI implications.

Please consider this a serious warning from the Fedora glibc team.


CCing jkurik who maintains the Fedora Project schedules.

Hi Carlos,

A few questions for you.

How does the glibc schedule compare to that of gcc? We'd need to take both into account to really address the concerns you're expressing here.

Gnome similarly has asked for a 6mo schedule. I'm sure there are many other projects that have a 6mo upstream cadence, but they might not line up with Fedora's 6mo releases. The problems are not limited to glibc or gcc or any other package, and syncing with one may well cause issues with N others.

Further, I fully understand the value of using Fedora releases as the basis of upstream CI but that seems short-sighted from an upstream point of view. Essentially you are saying that upstream glibc is only testable by Fedora rawhide users (and rebuilds). That is a fairly small number of testers. Tying the upstream project's success to rawhide testing seems fragile. I would think a broader investment in CI/CT would be desired.

In a theoretical world where we have varying release schedules, either through Modularity where a Base module that contains glibc is the foundation other Fedora modules are built on, or through Editions wanting different lifecycles, how would the glibc team cope? The simple answer is to fix on the older ABI as you have said above, but that doesn't address the testing aspects of the new upstream releases.

Thanks Carlos for the extensive explanation. It is very helpful for me.

To help me understand the needs of the glibc team: based on your experience with previous glibc releases, do you have any suggestion or expectation for how much contingency is safe to have, from your POV, between the glibc release and Fedora branching from rawhide?

As Josh mentioned above, glibc is not the only project we need to synchronize with. However, we might try to be as cooperative here as possible, so I would like to understand what is the best fit for glibc team.

Carlos, my goal is for Fedora to target early May, late October every time. That's a (roughly) six-month target. But, to keep to that, given that we sometimes have delays, it means that we need to have correspondingly-shorter cycles following as a corrective measure.

If we don't do that, the natural tendency is for releases to average 7-8 months and cycle around the calendar in a way that is chaotically unpredictable more than two releases out.

The only other alternative would be to switch from our hybrid time / feature-quality model to one that's strictly time-based. Given all of the unpredictable upstream software we integrate, let alone our own changes, it's really, really hard to do "whoops, that feature isn't ready; roll it back", which means that this would definitely result in a gigantic drop in release quality and stability.

Maybe in the future with Modularity, we can do a hard release deadline more easily. But I don't even know what that would look like at this point.

Replying to [comment:3 jwboyer]:

How does the glibc schedule compare to that of gcc? We'd need to take both into account to really address the concerns you're expressing here.

Yes, we need to consider the gcc schedule. This particular ticket was limited in scope to glibc, which is the first of the core libraries that constrains the Fedora schedule in some ways.

I spoke with Jakub Jelinek (core gcc developer and Fedora gcc maintainer) and the basic point is that gcc follows path (a) as noted in my original post. The rawhide gcc branch only starts to track gcc trunk (stage3 and stage4) when gcc is approaching an upstream release that is scheduled to coincide with a Fedora release. So the gcc team loses some of the CI aspects of Fedora rawhide (stage1 and stage2 gcc development) to ensure that the rawhide compiler is stable for a stable Fedora release. The gcc release model is annual, around March/April, and two Fedora releases will have the same compiler. The annual release model means gcc rawhide is not updated from gcc trunk in a CI fashion (for part of the year) because it would cause the second annual Fedora release to be based on a pre-release compiler, which is not a supportable plan. Instead the Fedora gcc team does package builds outside of the Fedora rawhide process to test the compiler.

See Marek Polacek's blog for an example:
http://developers.redhat.com/blog/2016/03/10/testing-gcc-in-the-wild/

The models we use for gcc and glibc are different.

Gnome similarly has asked for a 6mo schedule. I'm sure there are many other projects that have a 6mo upstream cadence, but they might not line up with Fedora's 6mo releases. The problems are not limited to glibc or gcc or any other package, and syncing with one may well cause issues with N others.

You are expanding the scope of this ticket, and if you are OK with that, then I'm happy to answer some of these questions, but I am only authoritative when it comes to glibc.

The problems are not limited to glibc or gcc, but glibc and gcc form the core runtime of Fedora, and if you get that wrong you will have ABI issues across all of your applications (since libc and libgcc are part of all applications modulo some runtimes like self-contained small Go apps).

For Gnome, depending on their rawhide model (does rawhide track unstable development or not) and upstream ABI assurance model, you have the same problems, but it only impacts applications using those libraries (not everything). The Gnome developers also have to make choices about how they update rawhide, again based on their own development models.

All packages have the same problem.

glibc is not unique, but we are the lowest level runtime and the Fedora glibc team aims for maximum CI investment to ensure the highest quality releases.

Further, I fully understand the value of using Fedora releases as the basis of upstream CI but that seems short-sighted from an upstream point of view. Essentially you are saying that upstream glibc is only testable by Fedora rawhide users (and rebuilds). That is a fairly small number of testers. Tying the upstream project's success to rawhide testing seems fragile. I would think a broader investment in CI/CT would be desired.

Please let me clarify.

There are CI efforts in Fedora, and upstream glibc.

Upstream has its own CI build bot which ensures glibc can be built and regression tested on as many architectures as possible. The Fedora glibc team is involved with those efforts.

Fedora Rawhide is the CI for Fedora's glibc with Fedora patches applied. Fedora Rawhide glibc is being used to measure the impact of upstream changes on Fedora's own packages to mitigate risk in Fedora. This is Fedora's responsibility. This is how the glibc team maximizes our CI investment through Fedora Rawhide.

The upstream project's success is not tied to rawhide testing.

Fedora's success is tied to rawhide testing of new upstream versions in a timely fashion to address Fedora-related issues with upstream.

In a theoretical world where we have varying release schedules, either through Modularity where a Base module that contains glibc is the foundation other Fedora modules are built on, or through Editions wanting different lifecycles, how would the glibc team cope? The simple answer is to fix on the older ABI as you have said above, but that doesn't address the testing aspects of the new upstream releases.

The core runtimes and toolchain are at the bottom of the entire software stack, along with the kernel.

The lowest level module must have a release schedule that takes into consideration the schedule of the software components that go into that module.

If such a module has CI in the way of rawhide, the branching from rawhide should be coordinated with software components that go into that module.

If we do not want to coordinate schedules, then Rawhide ceases to be useful for integration and CI, and instead becomes "the most recent stable version" with large jumps between stable versions causing large integration work at times of update. That is something I would like to avoid at all costs for something as complex as ABI assurances. Fedora would lose out on Fedora CI testing in Rawhide.

Hopefully that answers your questions.

Replying to [comment:4 jkurik]:

To help me understand the needs of the glibc team: based on your experience with previous glibc releases, do you have any suggestion or expectation for how much contingency is safe to have, from your POV, between the glibc release and Fedora branching from rawhide?

When the glibc release is made the glibc team needs to make a decision:

  • Should glibc rawhide stay at the stable release or track master?

So every February and August we have to ask ourselves that question.

The answer to that question is (assume it is February):

  • If Fedora Rawhide branches in August/September then rawhide can track from master. We gain valuable Fedora CI.

  • If Fedora Rawhide branches earlier than August then we must stay on a stable release to avoid ABI risk in the coming Fedora release. We lose valuable Fedora CI and risk complex integration of core ABI before the release.

The F25 branching from Rawhide is July 26th, but for today's discussion we consider it "August."

The Fedora glibc team does not want to change any core ABI after Alpha Freeze.

Therefore the Alpha Freeze must not move earlier than August 1st (timeboxed glibc release date).
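
As a minimal sketch of the rule above (not an actual Fedora or glibc tool), the decision can be written as a single date comparison. The function name `rawhide_policy` and both Alpha Freeze dates below are hypothetical and chosen only to exercise the rule; the year of the glibc release is assumed from the dates in this ticket.

```python
from datetime import date

def rawhide_policy(alpha_freeze: date, glibc_release: date) -> str:
    """Pick the glibc rawhide policy for the cycle that ends at `glibc_release`.

    Rule from the comment above: core ABI must not change after Alpha Freeze,
    so Alpha Freeze must not fall before the timeboxed glibc release date.
    """
    if alpha_freeze >= glibc_release:
        return "track master"
    return "stay on stable release"

GLIBC_2_24 = date(2016, 8, 1)  # timeboxed release date; the year is assumed

# Hypothetical Alpha Freeze dates, used only to exercise the rule.
on_time = rawhide_policy(date(2016, 8, 9), GLIBC_2_24)
too_early = rawhide_policy(date(2016, 7, 12), GLIBC_2_24)
print(on_time, "|", too_early)
```

The comparison is deliberately strict: it has no notion of "close enough", which is why any slippage in either schedule needs a human decision rather than a tolerance parameter.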

Keep in mind that while glibc rawhide is tracking master we gain confidence in a linear fashion for every day we test other package builds, so that as we approach the F25 branch we are ready to make the release with just a minor amount of synchronization. Usually the process is like this: Fxx is updated to the released version before Alpha Freeze and rebuilt. This might mean that glibc rawhide has to be held frozen on the stable version for a short period of time. The shorter that period, the better for core runtime testing.

As Josh mentioned above, glibc is not the only project we need to synchronize with. However, we might try to be as cooperative here as possible, so I would like to understand what is the best fit for glibc team.

Yes, glibc is not the only project, but it is the only project against which every process in the GNU/Linux runtime is linked. Similarly I'd say the compiler and kernel both need special treatment when it comes to scheduling the release.

This FESCo ticket was just a reminder and a refresher that we all need to keep an eye on the schedules :-)

Replying to [comment:5 mattdm]:

Carlos, my goal is for Fedora to target early May, late October every time. That's a (roughly) six-month target. But, to keep to that, given that we sometimes have delays, it means that we need to have correspondingly-shorter cycles following as a corrective measure.

That is a perfect schedule for synchronizing with core ABI changes in glibc.

I say perfect because an early May GA means a February branching from rawhide, which coincides with the glibc release in February being ahead of the Alpha Freeze.

This precisely optimizes Fedora rawhide CI for core ABI changes.

Replying to [comment:7 codonell]:

Replying to [comment:3 jwboyer]:

Gnome similarly has asked for a 6mo schedule. I'm sure there are many other projects that have a 6mo upstream cadence, but they might not line up with Fedora's 6mo releases. The problems are not limited to glibc or gcc or any other package, and syncing with one may well cause issues with N others.

You are expanding the scope of this ticket, and if you are OK with that, then I'm happy to answer some of these questions, but I am only authoritative when it comes to glibc.

I'm not. I'm using other packages as an example of how fixating the Fedora schedule for any specific package can be troublesome.

The problems are not limited to glibc or gcc, but glibc and gcc form the core runtime of Fedora, and if you get that wrong you will have ABI issues across all of your applications (since libc and libgcc are part of all applications modulo some runtimes like self-contained small Go apps).

For Gnome, depending on their rawhide model (does rawhide track unstable development or not) and upstream ABI assurance model, you have the same problems, but it only impacts applications using those libraries (not everything). The Gnome developers also have to make choices about how they update rawhide, again based on their own development models.

All packages have the same problem.

OK, so we agree on my point.

glibc is not unique, but we are the lowest level runtime and the Fedora glibc team aims for maximum CI investment to ensure the highest quality releases.

And here is where we need to make a decision as a distro. Is glibc (and gcc though less of a schedule concern) important enough to dictate the schedule for the entire distro, possibly causing much less CI/testing for other packages? In a traditional Fedora development model, I'd possibly argue "yes". But see below.

Further, I fully understand the value of using Fedora releases as the basis of upstream CI but that seems short-sighted from an upstream point of view. Essentially you are saying that upstream glibc is only testable by Fedora rawhide users (and rebuilds). That is a fairly small number of testers. Tying the upstream project's success to rawhide testing seems fragile. I would think a broader investment in CI/CT would be desired.

Please let me clarify.

There are CI efforts in Fedora, and upstream glibc.

Upstream has its own CI build bot which ensures glibc can be built and regression tested on as many architectures as possible. The Fedora glibc team is involved with those efforts.

Fedora Rawhide is the CI for Fedora's glibc with Fedora patches applied. Fedora Rawhide glibc is being used to measure the impact of upstream changes on Fedora's own packages to mitigate risk in Fedora. This is Fedora's responsibility. This is how the glibc team maximizes our CI investment through Fedora Rawhide.

The upstream project's success is not tied to rawhide testing.

Fedora's success is tied to rawhide testing of new upstream versions in a timely fashion to address Fedora-related issues with upstream.

Thank you for the clarification.

In a theoretical world where we have varying release schedules, either through Modularity where a Base module that contains glibc is the foundation other Fedora modules are built on, or through Editions wanting different lifecycles, how would the glibc team cope? The simple answer is to fix on the older ABI as you have said above, but that doesn't address the testing aspects of the new upstream releases.

The core runtimes and toolchain are at the bottom of the entire software stack, along with the kernel.

The kernel rebases constantly in Fedora, as it is one of the few packages that can leverage parallel installation. It is not the bottom of the stack, nor the top of the stack. It's a special snowflake and best left out of any of these conversations because it breaks all the rules.

The lowest level module must have a release schedule that takes into consideration the schedule of the software components that go into that module.

Correct and agreed.

If such a module has CI in the way of rawhide, the branching from rawhide should be coordinated with software components that go into that module.

If we do not want to coordinate schedules, then Rawhide ceases to be useful for integration and CI, and instead becomes "the most recent stable version" with large jumps between stable versions causing large integration work at times of update. That is something I would like to avoid at all costs for something as complex as ABI assurances. Fedora would lose out on Fedora CI testing in Rawhide.

That somewhat answers my question, but I want to push on this a bit further to clarify. The Base module (assuming it includes glibc) may have versioning that means other modules can depend on specific Bases or any Base. E.g. an httpd module may just need the equivalent of "Requires: Base" but an advanced application may need "Requires: Base > 2.0" to get some specific ABI/API only found in newer versions.
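
To make the two kinds of requirement concrete, here is a rough sketch of how such a check could behave. It is purely illustrative: the `satisfies` and `parse_version` helpers are hypothetical, the dotted-integer version scheme is an assumption, and none of this reflects how RPM or Modularity actually resolves dependencies.

```python
import operator

# Comparison operators a requirement string may use (assumed set).
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}

def parse_version(text: str) -> tuple:
    """Turn '2.1' into (2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def satisfies(requirement: str, base_version: str) -> bool:
    """Check 'Base' or 'Base <op> <version>' against a Base module version."""
    parts = requirement.split()
    if not parts or parts[0] != "Base":
        raise ValueError("only requirements on Base are handled in this sketch")
    if len(parts) == 1:
        return True  # plain "Base": any Base version will do
    if len(parts) != 3 or parts[1] not in _OPS:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    compare = _OPS[parts[1]]
    return compare(parse_version(base_version), parse_version(parts[2]))

print(satisfies("Base", "1.0"))        # an httpd-style module accepts any Base
print(satisfies("Base > 2.0", "2.1"))  # an advanced application needs a newer Base
print(satisfies("Base > 2.0", "2.0"))
```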

In such a world, you naturally have N Base modules in play and a rawhide-like model may exist for packages within each module, but the combinations of said modules cannot be done in lockstep. The end result is that CI will need to grow to accommodate testing of all provided modules against updated requirements of that module. That won't look anything like what we have today with rawhide, but it may actually provide better coverage in the long run. With such a model, you have CI testing for new+new (somewhat rawhide-like) and testing of ABI guarantees against new+old. That does, however, come at the expense of jumps between versions.
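
One way to picture that growth in CI is as a test matrix over module versions. The sketch below is only an illustration under stated assumptions: the `ci_combinations` helper, the version strings, and the reading of each label as Base first and dependent module second are all hypothetical.

```python
from itertools import product

def ci_combinations(base_versions, module_versions):
    """Pair every Base version with every dependent-module version.

    Both lists are assumed to be ordered oldest to newest; the last entry
    of each is treated as "new" and everything else as "old".
    """
    newest_base = base_versions[-1]
    newest_module = module_versions[-1]
    combos = []
    for base, module in product(base_versions, module_versions):
        base_age = "new" if base == newest_base else "old"
        module_age = "new" if module == newest_module else "old"
        combos.append((base, module, f"{base_age}+{module_age}"))
    return combos

# Hypothetical versions of a Base module and an httpd module.
matrix = ci_combinations(["1.0", "2.0"], ["2.4.18", "2.4.20"])
for base, module, label in matrix:
    print(f"Base {base} with httpd {module}: {label}")
```

Even this toy case has four combinations for two versions of two modules, which is the scaling concern behind "cannot be done in lockstep".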

I bring this up because it seems a strong possibility with Modularity that all package owners and teams will need to take into account in the future. Depending on a rawhide model or fixed release schedule to assure package interactions are valid seems impossible in such a development model.

Replying to [comment:8 codonell]:

As Josh mentioned above, glibc is not the only project we need to synchronize with. However, we might try to be as cooperative here as possible, so I would like to understand what is the best fit for glibc team.

Yes, glibc is not the only project, but it is the only project against which every process in the GNU/Linux runtime is linked. Similarly I'd say the compiler and kernel both need special treatment when it comes to scheduling the release.

The kernel doesn't need special treatment, but I appreciate the concern.

This FESCo ticket was just a reminder and a refresher that we all need to keep an eye on the schedules :-)

Yes, absolutely. In today's release model, this is certainly valid. However, I'm also trying to point out that the traditional idea of everything being released as a whole might not be viable in the long run when taking Modularity into account. So consider it food for thought in a world where a "Fedora release" doesn't necessarily mean "all artifacts were built together in lockstep."

Replying to [comment:10 jwboyer]:

In such a world, you naturally have N Base modules in play and a rawhide-like model may exist for packages within each module, but the combinations of said modules cannot be done in lockstep. The end result is that CI will need to grow to accommodate testing of all provided modules against updated requirements of that module. That won't look anything like what we have today with rawhide, but it may actually provide better coverage in the long run. With such a model, you have CI testing for new+new (somewhat rawhide-like) and testing of ABI guarantees against new+old. That does, however, come at the expense of jumps between versions.

It need not come at the cost of jumps between versions.

I would argue that you need to keep building "unstable" modules and testing them continuously with other "unstable" modules to see how things break.

At some point you promote the unstable module to a stable module and begin using it in production. That transition is going to have the very same problems we have today when going from Rawhide to F25.

In many ways Debian pioneered this with unstable/testing/stable, but never took the next step of modularizing it all.

I bring this up because it seems a strong possibility with Modularity that all package owners and teams will need to take into account in the future. Depending on a rawhide model or fixed release schedule to assure package interactions are valid seems impossible in such a development model.

Part of the goal of the schedule is to assure that we all agree on and work towards a stable "something" within a given timeframe. If you change this discussion from "distribution" to "module" then the schedule scope changes and it becomes a question of knowing when the group of packages that make up the module are stable enough to transition the module to being a stable thing you can use. So it becomes easier to plan IMO. We still have the same problem but now we're churning out "Base" modules through an unstable->stable transition, and sending them off to be used in a composable system. As I mentioned earlier, some of our CI efforts should be in testing unstable modules to bring them to a stable state.

Putting this on agenda for Friday's meeting at 17:00 UTC

Replying to [comment:5 mattdm]:

Carlos, my goal is for Fedora to target early May, late October every time. That's a (roughly) six-month target. But, to keep to that, given that we sometimes have delays, it means that we need to have correspondingly-shorter cycles following as a corrective measure.

Setting aside talk of modular future, this early May, late October goal works very well for Workstation. We want to release roughly one month behind GNOME. Low-level components like glibc and GCC are very important to Workstation, but having the latest versions of them is frankly not; delaying a Workstation release for these, as we did with F24, would be unfortunate.

My main concern here is that F25 is scheduled two weeks behind our target final Tuesday of the last week of October, so glibc team's request to not move any earlier conflicts with our desire to be two weeks earlier with F27 next year. I notice that glibc's February-August/September cadence is a relatively recent thing; any chance glibc would be willing to move back a couple weeks, and do January-July instead? Alternatively, we could branch rawhide two weeks later in the cycle. I don't think that would materially affect the stability of our alpha releases.

I'm frankly more concerned about the GCC schedule (which seems to seriously conflict with our goal for release in early May) than I am the glibc schedule (which seems to almost work; it feels like we're only off by a couple weeks here).

We didn't get to discussing this in today's FESCo meeting, punting to next week.

Replying to [comment:15 catanzaro]:

Replying to [comment:5 mattdm]:

Carlos, my goal is for Fedora to target early May, late October every time. That's a (roughly) six-month target. But, to keep to that, given that we sometimes have delays, it means that we need to have correspondingly-shorter cycles following as a corrective measure.

Setting aside talk of modular future, this early May, late October goal works very well for Workstation. We want to release roughly one month behind GNOME. Low-level components like glibc and GCC are very important to Workstation, but having the latest versions of them is frankly not; delaying a Workstation release for these, as we did with F24, would be unfortunate.

I expect your users do care...

(1) Hardware optimized routines e.g. Intel AVX512, Intel ERMS support, Intel MPX, Improved branch prediction on KNL/Silvermont, Intel Vectorized Math Library etc.

(2) glibc dynamic loader changes required to allow applications to make use of new hardware features e.g. AVX512 and audit modules using AVX512 (save/restore around audit modules), Intel MPX (save/restore around plt trampoline and resolver functions). So even if you could compile your application with a newer gcc, you'd still be unable to use certain features until the dynamic loader was prepared for them.

(3) Synchronize with newer kernel headers to provide features to applications e.g. expose new syscalls, support newer syscalls automatically, expose new structures in ABI compatible ways, etc.

(4) CVE and security fixes for core network and other APIs.

(5) Support ISV porting of applications built on other distributions that have moved ahead with newer versions of core runtimes. For example Ubuntu 16 and Fedora 24 use the same glibc 2.23. This makes it easier to ensure Ubuntu 16 applications work at the lowest ABI level in the same way on Fedora 24.

(6) Reduce build up of ABI changes that require complex triage at the toolchain level.

From a release schedule perspective I have argued that (6) is a key issue for engineering.

I expect users care about (1), (2), (3), (4) and (5).

My main concern here is that F25 is scheduled two weeks behind our target final Tuesday of the last week of October, so glibc team's request to not move any earlier conflicts with our desire to be two weeks earlier with F27 next year. I notice that glibc's February-August/September cadence is a relatively recent thing; any chance glibc would be willing to move back a couple weeks, and do January-July instead? Alternatively, we could branch rawhide two weeks later in the cycle. I don't think that would materially affect the stability of our alpha releases.

The glibc release was previously January/July.

To be precise, it was a December release, and I was the release manager who changed it from January/July to February/August because a December/January release is a logistical nightmare with everyone away for holidays.

The glibc February/August cadence started on 2013-08-12 with glibc 2.18.

I don't think glibc can move the schedule any earlier than February/August.

Moving F27 two weeks earlier means that during Alpha Freeze glibc would have to rebase against the stable release that comes out in August.

In comment 8 I wrote "The Fedora glibc team does not want to change any core ABI after Alpha Freeze." The reason is that if an ABI change is reverted in the final reviews for glibc it means Fedora must go through a mass rebuild to purge the ABI change or carry an incompatible hybrid ABI that means ISVs will have a harder time targeting Fedora, and it would make that Fedora release unusable as a RHEL branchpoint (we would fix it in the next release).

Branching rawhide two weeks later seems the lowest risk choice. I can't comment on that really since I don't have experience with making and keeping the schedule.

I'm frankly more concerned about the GCC schedule (which seems to seriously conflict with our goal for release in early May) than I am the glibc schedule (which seems to almost work; it feels like we're only off by a couple weeks here).

I can't comment on the gcc schedule.

Please ask Jakub Jelinek and Jeff Law to comment.

Replying to [comment:15 catanzaro]:

Replying to [comment:5 mattdm]:

<snip>

My main concern here is that F25 is scheduled two weeks behind our target final Tuesday of the last week of October, so glibc team's request to not move any earlier conflicts with our desire to be two weeks earlier with F27 next year. I notice that glibc's February-August/September cadence is a relatively recent thing; any chance glibc would be willing to move back a couple weeks, and do January-July instead? Alternatively, we could branch rawhide two weeks later in the cycle. I don't think that would materially affect the stability of our alpha releases.

I'm frankly more concerned about the GCC schedule (which seems to seriously conflict with our goal for release in early May) than I am the glibc schedule (which seems to almost work; it feels like we're only off by a couple weeks here).

I actually am trying to ensure we drop Alpha altogether and just do Beta and Final, with rawhide always being Alpha or better quality.

Dennis

We discussed this in today's meeting and the final proposal agreed for this ticket is

agreed Fedora Schedule should always schedule mass rebuilds (if needed) after glibc final freeze. (5, 1, 0)

If any issues still remain to be addressed here, please ask about that specific thing and reopen this ticket.
