#231 clarify meaning of "rolling" for future fedora atomic releases
Opened 7 years ago by jasonbrooks. Modified 6 years ago

The Atomic WG has unofficially paid attention to only a single fedora atomic release at a time, specifically, the release based on the current latest stable Fedora release. There's a proposal to formalize this in issue 228.

In the discussion around this issue, @jberkus asked why, if we are to support only a single fedora atomic release at a time, we do not adopt a "rolling" release structure for fedora atomic, where the tree served from the fedora atomic ostree repo is always composed from the current latest stable Fedora release.

In response, @dustymabe suggested that "rolling" could mean many different things, and that we should discuss these many meanings in a future VFAD.

In this issue, we can collect some thoughts on what "rolling" means in advance of this VFAD.

To me, a rolling fedora would match up with what rhel and centos atomic do. There's a single repo, and an upgrade from rhel atomic 7.1 to 7.2 to 7.3, etc. simply involves running atomic host upgrade. The same would apply to fedora for 25 to 26 to 27, etc.

Currently, fedora users are expected to rebase every six months, in the same way that they might rebase between completely separate streams, such as from fedora atomic to centos atomic.

So, upgrades, today:

$ sudo ostree remote delete fedora-atomic
$ sudo ostree remote add fedora-atomic --set=gpg-verify=false https://dl.fedoraproject.org/pub/fedora/linux/atomic/25
$ sudo rpm-ostree rebase fedora-atomic:fedora-atomic/25/x86_64/docker-host
$ sudo systemctl reboot
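From one six-month rebase to the next, only the release number in the URL and ref changes. Here is a minimal dry-run sketch that prints these commands for a given release. It assumes the f25 URL and `docker-host` ref layout carry over unchanged to other releases, which is an assumption, since later releases may name the ref differently:

```shell
#!/usr/bin/env bash
# Dry run: print the per-release rebase commands shown above for release $1.
# Assumption: the URL and ref layout of the f25 example apply to other
# releases unchanged; later releases may use a different ref name.
rebase_commands() {
  local release="$1"
  local remote="fedora-atomic"
  local url="https://dl.fedoraproject.org/pub/fedora/linux/atomic/${release}"
  local ref="fedora-atomic/${release}/x86_64/docker-host"
  echo "sudo ostree remote delete ${remote}"
  echo "sudo ostree remote add ${remote} --set=gpg-verify=false ${url}"
  echo "sudo rpm-ostree rebase ${remote}:${ref}"
  echo "sudo systemctl reboot"
}

rebase_commands 25
```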

Upgrades, in a "rolling" world:

$ sudo atomic host upgrade
$ sudo systemctl reboot

This would have zero impact on the current two-week release scheme. Just as we tried to release fedora atomic 24 every two weeks up until fedora 25 release day, then shifted to trying to release fedora atomic 25 every two weeks and stopped paying attention to the 24 stream, we would, at some future point, be releasing fedora atomic every two weeks based on f2N rpms, switch over to the f2N+1 rpms on or near release day, and proceed from there.

The only difference is convenience and clarity for our users.

Before Dusty cited the many meanings of "rolling" I hadn't even considered additional meanings, but maybe we can list some of those here in preparation for the VFAD.


@jasonbrooks My understanding of "rolling" is the same as yours.

@jasonbrooks, at least one other possible interpretation of "rolling" is that we consume from rawhide and don't take Number release content. This would prevent "large change" upgrades like when going from f24 to f25, but would be problematic for other reasons.

I can accept your definition of rolling (although, I'm not sure if we should call it rolling) with some tweaks:

  • rather than making the "atomic host upgrade" automatically go from 25->26, make it a bit of a bigger deal. options:
    • we make "automatically upgrading across major version boundaries" configurable and "opt in"
    • we enhance the interface to let the user know about the pending EOL and the move to the new major version

this could look like:

$ sudo atomic host upgrade

This major version of Fedora Atomic Host is now EOL. Would you like to rebase and switch to Fedora 26 atomic host? (y/n): y

Adding new remote fedora-26
Rebasing to fedora-26:fedora/26/x86_64/atomic-host
....
....

WDYT?
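The opt-in policy above boils down to a small decision on each upgrade run. Here is a minimal sketch. `AUTO_MAJOR_UPGRADE` and `next_action` are hypothetical names, and no such setting exists in `atomic` or `rpm-ostree`. The caller is also assumed to know the booted major release and the newest released one:

```shell
#!/usr/bin/env bash
# Hypothetical opt-in switch for crossing major-version boundaries;
# defaults to "no" so nobody is moved to a new major without choosing it.
AUTO_MAJOR_UPGRADE="${AUTO_MAJOR_UPGRADE:-no}"

# Decide what an upgrade run should do.
#   $1: major release currently booted (e.g. 25)
#   $2: newest released major (e.g. 26)
next_action() {
  local current="$1" latest="$2"
  if [ "$current" -ge "$latest" ]; then
    echo "upgrade"              # stay on the current stream
  elif [ "$AUTO_MAJOR_UPGRADE" = "yes" ]; then
    echo "rebase:${latest}"     # user opted in: move to the new major
  else
    echo "notify:${latest}"     # warn about the new major / pending EOL
  fi
}

next_action 25 26
```

Because the decision is printed rather than prompted, an unattended job could act on it without anyone answering a y/n question.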

It feels weird to do semi-automatic rebases on the client side. Not saying it's wrong. But we need to think a bit about how people manage automated systems.

I guess my question here is - who wouldn't want a single stream? I can think of an answer for today - with kube in tree, a Fedora major may mean a new kube major. (Though we haven't really defined this at all....)

But once we move kube (and possibly docker) into containers, I think we have a much stronger argument for doing a single stream by default (though for development we should have 25/26/etc branches). The "container infrastructure containers" live on their own upgrade cycle.

Dusty:

You're thinking in terms of systems which are updated, individually, by hand. This is not how people admin most systems anymore, and definitely not what Atomic is for. Atomic is aimed at clouds of automatically managed systems, so any discussion of updates and upgrades and rebases needs to be in this context. You really need to get the outdated paradigm out of your thinking.

As I see it, we have three choices:

  1. we have one stream, and "upgrade" automatically takes you forward, even if you're consuming the "next" Fedora major release.

  2. we require rebase for "major version" changes, and continue to support the ostree for the old version for some defined period of time.

  3. We release a system where users' clouds break every 6 months requiring a manual intervention (because rebase is required and ostree isn't updated), and eventual migration to CoreOS, which actually supports cloud automation.

These are the realistic choices. You don't get others. And "enhancing the UI" is irrelevant, because nobody will be looking at it.

Walters:

Yes, we need to plan out how we're going to deal with backwards compatibility issues (OverlayFS also comes to mind as a problem).

@jasonbrooks, at least one other possible interpretation of "rolling" is that we consume from rawhide and don't take Number release content. This would prevent "large change" upgrades like when going from f24 to f25, but would be problematic for other reasons.

How large are the changes, really? IMO the biggest and most disruptive component in Fedora Atomic is the kernel, and that rolls within major releases already, w/ no user choice or opt-in, short of not installing the update.

I can accept your definition of rolling (although, I'm not sure if we should call it rolling) with some tweaks:

I didn't introduce the term "rolling" in this discussion -- my question is not rolling or not rolling, my question is why, if we support a single stream of content, do we not deliver that single stream of content in a single stream. If we want to call that rolling, great, but we can call it anything.

rather than making the "atomic host upgrade" automatically go from 25->26, make it a bit of a bigger deal. options:
we make "automatically upgrading across major version boundaries" configurable and "opt in"
we enhance the interface to let the user know about the pending EOL and the move to the new major version

I don't like it. You choose whether or not to upgrade your system. There's the opt in/out. If you choose upgrades, why not deliver the bits we're supporting?

If there's some life-support-only option, why not make that the option that requires user action? So, by default, you're on fedora atomic, and if you choose, you can optionally rebase to the life-support ref.

If f2n to f2n+1 upgrades, whether delivered via regular atomic host upgrade or via rebase, don't go smoothly for people, then that'll be the fault of this WG. If we're only actively supporting one major release at a time, this is how it must be. So why not make it easier/clearer?

We have the perfect software distribution mechanism for this, complete with rollback if the user is not happy -- let's use it, and show off our strengths.

It feels weird to do semi-automatic rebases on the client side. Not saying it's wrong. But we need to think a bit about how people manage automated systems.
I guess my question here is - who wouldn't want a single stream? I can think of an answer for today - with kube in tree, a Fedora major may mean a new kube major. (Though we haven't really defined this at all....)

I've been thinking about adding a package like kubernetes-master-container, that would include just the systemd service files needed to pull and run the master components. We could do the same for kubernetes-node. This way, an upgrade that removed the binaries could continue working pretty seamlessly.
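Here is a sketch of what such a package might ship: one unit file per component that pulls and runs the component from a container image. The `make_unit` helper, the image reference, and the docker flags are illustrative assumptions. Real kube components would also need their command-line arguments and config mounts:

```shell
#!/usr/bin/env bash
# Generate a systemd unit that runs one Kubernetes component from a
# container image. Image names and docker flags are illustrative only;
# real components also need their arguments and config volumes.
make_unit() {
  local component="$1" image="$2"
  cat <<EOF
[Unit]
Description=Kubernetes ${component} (containerized)
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f ${component}
ExecStartPre=/usr/bin/docker pull ${image}
ExecStart=/usr/bin/docker run --rm --name ${component} --net=host ${image}
ExecStop=/usr/bin/docker stop ${component}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

make_unit kube-apiserver registry.example.com/kube-apiserver:latest
```

Since the unit files reference an image rather than host binaries, a host upgrade that drops the binaries would leave these services untouched.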

@jberkus - I'm operating in many contexts. I'm thinking about many different use cases, where systems are upgraded manually and where systems are upgraded automatically. What I would like to do is operate on a principle of least surprise. Up until this point I would say that upgrading from one major release to the next around "major release day" would have been a really rocky process that isn't necessarily something that people would want to happen without knowing what they were upgrading to beforehand. This is a big reason why I would want to make "automated major upgrades" a configurable option, so the user can choose the behavior.

Since we are now working more effectively as a working group, maybe we'll be good enough to make sure that transitions from one release to the next go more smoothly than they have in the past. I agree that kubernetes and/or docker have been the largest agents for "pause" on performing major upgrades. Agreed that if we remove those variables then a single stream becomes less of a concern.

It might be the wrong time to do this, but I'll bring it up now. I personally like the fact that we have different ostree repos for different major versions of Fedora. This allows us to timebox adding updates to the repo (meaning it doesn't grow in size forever; yes, pruning exists, I know), as well as create new repos periodically in the future (think of a new storage mode for ostree repos that we want to start using). If we support "automatically going to the next major version" we either need to combine the ostree repos into one repo or we need to support some sort of automated "add remote + rebase" support in ostree/rpm-ostree. Can we resolve what we would need to do about this particular issue? I prefer automated "add remote + rebase", but that isn't something that exists today.

"Up until this point I would say that upgrading from one major release to the next around "major release day" would have been a really rocky process that isn't necessarily something that people would want to happen without knowing what they were upgrading to beforehand."

Agreed, but if we're not supporting the older OSTree in any way, we've effectively given users a very short time limit on when they need to rebase/upgrade, with the possibility of being forced into it when the older OSTree unexpectedly breaks because we're not testing it anymore, or if we're not updating it, because a new security issue is announced. Either way, we are forcing users to upgrade on our timeline, not theirs.

"If we support "automatically going to the next major version" we either need to combine the ostree repos into one repo or we need to support some sort of automated "add remote + rebase" support in ostree/rpm-ostree. Can we resolve what we would need to do about this particular issue? I prefer automated "add remote + rebase", but that isn't something that exists today."

I'm happy to let that issue be a technical decision; I think our users would be OK either way.

EXCEPT, if you go for "automated add remote + rebase", then we need to solve the issue that you can't roll back from that reliably. Mind you, that's something that we need to solve anyway.

If we support "automatically going to the next major version" we either need to combine the ostree repos into one repo or we need to support some sort of automated "add remote + rebase" support in ostree/rpm-ostree. Can we resolve what we would need to do about this particular issue? I prefer automated "add remote + rebase", but that isn't something that exists today.

I'm +1 to combining the ostree repos into one. There's nothing strange about this, it's how the other atomic hosts operate. Fedora is the outlier in requiring rebases between version upgrades.

It'll also be much nicer for vagrant -- try vagrant init fedora/24-atomic-host -- it doesn't work any more. And places like the kube ansible scripts, which call on atlas for the vagrant boxes, need updates each time fedora atomic revs, where centos atomic is always accessible at centos/atomic-host.

@jberkus
Agreed, but if we're not supporting the older OSTree in any way, we've effectively given users a very short time limit on when they need to rebase/upgrade, with the possibility of being forced into it when the older OSTree unexpectedly breaks because we're not testing it anymore, or if we're not updating it, because a new security issue is announced. Either way, we are forcing users to upgrade on our timeline, not theirs.

I'd actually like to have a grace period of like 30-60 days where we provide life support for the previous release so that we're not "forcing" the issue as much.
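A grace period is simple date arithmetic against the next release's ship date. Here is a minimal sketch using GNU `date -d` (BSD/macOS `date` does not accept `-d` this way). `life_support_end` is a hypothetical helper, and the dates passed in are illustrative, not an actual Fedora schedule:

```shell
#!/usr/bin/env bash
# Last day of life support for release N, given the date release N+1
# ships and a grace period in days. Requires GNU date.
life_support_end() {
  local next_release_date="$1" grace_days="$2"
  date -u -d "${next_release_date} +${grace_days} days" +%F
}

# Illustrative dates only.
life_support_end 2017-07-11 30
life_support_end 2017-07-11 60
```

With a 2017-07-11 ship date, these print 2017-08-10 and 2017-09-09, bracketing the 30-60 day window.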

@jberkus
I'm happy to let that issue be a technical decision; I think our users would be OK either way.
EXCEPT, if you go for "automated add remote + rebase", then we need to solve the issue that you can't roll back from that reliably. Mind you, that's something that we need to solve anyway.

Rolling back would work, but your "remote" would be wrong. I think this is what you're referring to.

@jbrooks
I'm +1 to combining the ostree repos into one. There's nothing strange about this, it's how the other atomic hosts operate. Fedora is the outlier in requiring rebases between version upgrades.

We could do it, but I'd prefer not to. If there are clean points to "prevent migrations" I'd prefer to just do that. One thing we could do is seed a new ostree repo with the latest N-1 content and then start building from there, but it's still more work than just creating a new repo from scratch.

@jbrooks
It'll also be much nicer for vagrant -- try vagrant init fedora/24-atomic-host -- it doesn't work any more. And places like the kube ansible scripts, which call on atlas for the vagrant boxes, need updates each time fedora atomic revs, where centos atomic is always accessible at centos/atomic-host.

This discussion doesn't affect vagrant at all. The reason vagrant init fedora/24-atomic-host doesn't work any more is because we "clean up" old two week release disk images. I personally would like to keep them around, but I've been met with resistance on that and don't have good enough reasons for doing so.

We could do it, but I'd prefer not to. If there are clean points to "prevent migrations" I'd prefer to just do that. One thing we could do is seed a new ostree repo with the latest N-1 content and then start building from there, but it's still more work than just creating a new repo from scratch.

I'm not sure I understand. If you want a new unified repo to be brand new, that's fine w/ me. I just want to get rid of the separate refs for each major version / required rebase nonsense.

This discussion doesn't affect vagrant at all. The reason vagrant init fedora/24-atomic-host doesn't work any more is because we "clean up" old two week release disk images. I personally would like to keep them around, but I've been met with resistance on that and don't have good enough reasons for doing so.

You're ignoring the second part of my sentence: "And places like the kube ansible scripts, which call on atlas for the vagrant boxes, need updates each time fedora atomic revs, where centos atomic is always accessible at centos/atomic-host." We have fedora/23-atomic-host, fedora/24-atomic-host, fedora/25-atomic-host, where we should have fedora/atomic-host. It's another place where arbitrarily splitting up fedora atomic causes inconvenience.

I'm not sure I understand. If you want a new unified repo to be brand new, that's fine w/ me. I just want to get rid of the separate refs for each major version / required rebase nonsense.

I'm just saying that I don't really want a "unified repo", but that I do agree with you that I'd like to make the user experience a good one to the point they don't even know they are getting rebased unless they look at the logs or explicitly configure the client to not do that. This would give us a good operational experience (as we can clearly make big changes to the backend repo periodically) as well as a good user experience (they don't need to do anything other than what they are already doing).

You're ignoring the second part of my sentence: "And places like the kube ansible scripts, which call on atlas for the vagrant boxes, need updates each time fedora atomic revs, where centos atomic is always accessible at centos/atomic-host." We have fedora/23-atomic-host, fedora/24-atomic-host, fedora/25-atomic-host, where we should have fedora/atomic-host. It's another place where arbitrarily splitting up fedora atomic causes inconvenience.

I see what you mean. Yes, we can chat about that as well. Like I said, I still see value in being able to grab the older versions, but we don't allow for that today since we delete them after some time, so having f24/f25, etc. doesn't really give us value. Maybe we should discuss this change in a separate ticket?

@dustymabe How about this: why should fedora do it differently than rhel? RHEL AH has had the same tree through all docker upgrades, through all kube upgrades, through half of kube being removed, across 7.1, 7.2, 7.3. They support one atomic, so they have one atomic. You want to support one atomic, so why do it differently?

Metadata Update from @dustymabe:
- Issue tagged with: host

7 years ago

We will be having a meeting on BlueJeans for this at:

Friday, March 17th, 1pm EDT / 17:00 UTC.

http://bluejeans.com/8169971214

Dial-in:
+1 800 451 8679
+1 212 729 5016
meeting #8169971214

We met today to discuss what a rolling release would look like for fedora atomic host (recording here). The notes from the etherpad we used during the meeting are attached.

Here is a summary:

  • We would like to make it seamless for users to follow the latest released FAH (including across major number releases of Fedora)
  • We can achieve this in a couple of ways. 1st is by having a stable or latest ref in our repo that is a symbolic link to the ref that follows a particular release. i.e. for now latest -> f25. Once f26 is out we would update the symbolic link. This would require a single unified ostree repo.
  • 2nd would be to point users at a "remote" that is essentially a URL redirect that represents stable or latest and keep the ref names the same in the individual (f25,f26) ostree repos. This way the "remote" url could be switched easily. If this "works" it would allow us to keep ostree repos separate for each major release of Fedora.
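The first option can be pictured with a toy model on a plain directory. ostree stores each local ref as a file under `refs/heads/` containing a commit checksum, so "latest" becomes a symlink that is repointed on release day. This is an illustration only. The checksums are placeholders, no ostree command is run, and whether real ostree clients and mirrors handle a symlinked ref correctly is an assumption that would still need to be verified:

```shell
#!/usr/bin/env bash
# Toy model of a "latest" ref implemented as a symlink, on a plain
# directory (not a real ostree repo). Checksums are placeholders.
set -eu
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
heads="$repo/refs/heads/fedora"

publish() {          # publish <release> <checksum>
  mkdir -p "$heads/$1/x86_64"
  echo "$2" > "$heads/$1/x86_64/atomic-host"
}

point_latest_at() {  # repoint "latest" at <release>, replacing the old link
  ln -sfn "$1" "$heads/latest"
}

publish 25 aaaa1111
point_latest_at 25
cat "$heads/latest/x86_64/atomic-host"   # prints aaaa1111 (f25)

publish 26 bbbb2222                      # release day for f26
point_latest_at 26
cat "$heads/latest/x86_64/atomic-host"   # prints bbbb2222 (f26)
```

Clients tracking `fedora/latest/x86_64/atomic-host` would never change their ref name. Only the server-side link moves, which is why this option requires the unified repo.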

If we decide the first one is what we would want to do (I believe this was the consensus in the meeting), then there are some possible work items:

  • work to make a unified repo
  • verify the backup strategy for the repos we serve is sufficient
  • need to set up tooling for pruning the ostree repo
  • sym link needs to be updated when new NUMBER release is first released
  • ostree creation process needs to point to new unified repo
  • 2wk script needs to be updated to update unified repo
  • Ensure that all relevant groups throughout Fedora (Releng, QA, Websites) know what the plan is and how it could impact them (also be willing to do the work to make it happen if extra work is needed)

20170317-FAH-rollingrelease.html

This is also related to https://pagure.io/atomic-wg/issue/228

I'd say they're close enough that we should actually close the former in favor of this ticket.

I would expect "rolling release" to mean that the operating system itself has no release version, and instead applications and libraries and things are updated as soon as there's a new, reasonably stable release. As a user, and even as a sysadmin, this seems like a nice way of doing things to me -- there's less need to start over with major OS changes, because updates become smaller, more incremental.

I of course understand that sometimes a piece of software will make backwards-compatibility-breaking changes, and we need a way to let users adapt to those changes. But perhaps instead of discrete distro releases, we could work toward having warnings and a period of time for delaying updates for those particular backward-breaking packages.
