#352 Cloud Working Group Correction Of Errors Report
Closed: fixed 6 months ago by davdunc. Opened 6 months ago by davdunc.

What Happened?

Link to Bugzilla Entry

In what was, at least to the active members of the Fedora Cloud Working Group, a flash
decision on October 29th, 2021, the Fedora cloud images were eliminated from
consideration as an edition for Fedora 35. There was some historical foundation for
making this decision, but it did not, in the opinion of the working group members,
invalidate the group's actions and delivery requirements, as they had been actively
working towards a successful edition release with modifications and updates. Given the
popularity of the cloud offering today, it still deserves to be available on the Fedora
website and as an edition.

This was an unexpected setback for the cloud working group. As a working group, we had
appointed a member to work on including the edition in the new website revamp, and it
was almost two years ago that we were told to wait for the website revamp before the
edition would change locations. Making the cloud images possible for public and private
hyperscale environments is a distinctly complex effort, but like
every other edition, it is wrapped up in the common concerns of delivering the other
server, workstation, and immutable OS editions.

What was the Impact?

The Fedora Cloud base image was removed as an edition from the F35 release of
Fedora. The cloud working group's very existence has been challenged, and there was a
move from community members to force the Working Group to reassemble by updating the PRD
and activities. As the Fedora Cloud base image is no longer considered an edition, it
will not be included in the website revamp, which is a currently ongoing project.

What was the Root Cause?

Link to Reference Discussion

Matthew Miller was kind enough to collect a series of discussions that showed a
thread of shifting the focus of the cloud working group from the standard Fedora Cloud
base images to work related to the Atomic image. The shift of the Cloud WG to the Atomic WG
was formalized 4 years ago, and there was an effort to stabilize the Atomic experience and
fulfill the needs of the Cloud image user base with Project Atomic. Before that was
completed, Red Hat acquired CoreOS, whose lightweight operating system was purpose-built to
support lightweight clustering systems. This acquisition eclipsed the work being done on
Project Atomic and brought with it a large code base and distribution
model that was not yet integrated with any of the Cloud WG's programs. Meanwhile, the
Cloud SIG continued to provide a distinctly minimal Fedora base without the integration
of OSTree. Fedora CoreOS superseded the Project Atomic mechanics, but did not align with
the Cloud WG goals. It is clear that there was extensive discussion when the Atomic WG was
being replaced with CoreOS, but at the same time, the goals of the Cloud WG and the Atomic WG
were not yet unified, so the Fedora Cloud WG remained active independently. The Fedora
CoreOS mission and the Fedora Cloud mission have not aligned either and continue to
represent distinct user bases.

For both the cloud working group and the FCOS working group, a significant amount of
work continued, so there has been an operational split in what the Cloud
and FCOS working groups deliver, and the goals for the cloud images and for
FCOS have been updated in separate programs. The cloud images maintain a focus on
supporting distinctly different workloads than the lightweight container workloads
targeted by the FCOS working group. The Cloud Working Group continued improving the cloud
image base to keep delighting users, under the impression that the images were still
included in the editions.

What lessons did we learn?

From the discussion it was clear that there was a difference of opinion over the value
of the cloud base edition delivered independently of other editions. Some argued
that the cloud base image could be a
secondary form of one of the other editions, such as Server or Workstation, with no
requirement to consider it specialized. There will need to be a new charter for the cloud
images, since a number of council members have called for clarification and
an updated review.

  • The Cloud Working Group learned that the Fedora Cloud image is not an edition and
    that they cannot establish it as one without a new PRD and council review
  • The cloud image base will not be included in the website revamp and will remain in
    alternate downloads

Is there a clear value to the community?

According to the statistics presented by Matthew Miller, Fedora Cloud base use makes up
more than 15% of the total persistent<sup>1</sup> installations and more than 30% of the
total ephemeral<sup>2</sup> installations today, making it the second most popular effort in
the portfolio after Desktop.

The first issue raised was one of value to users and developers. There was an opinion
expressed that the goals of the cloud edition are not sufficiently divergent from
those of the Fedora Server edition to warrant a second series of goals and
responsibilities. Another opinion was that, at times, the goals of
Cloud overlap with those of Workstation Edition. This was stated as a reason to
consider delivering cloud-based distributions of both of these editions rather than
keeping a single edition that covers this overlap with additional specialization.

The argument that there is insufficient value in the Cloud edition because its value
overlaps with other projects is not, in our evaluation, sufficient to support
giving up on highly specialized development that is sensitive to the modifications
required by the functional requirements of the individual environments supported (more
every day).

Why isn't the Fedora cloud image just a derivative of Fedora {Server,Workstation} edition?

Matthew also identified himself as one of those people who no longer sees the
differences between the variants. He was once convinced, but is no longer sure what
separates the Server and Cloud editions. A couple of current
considerations that determine those differences came to mind immediately. Thanks
to Matthew for helping us to tailor our discussion.

David Duncan mentioned in a recent Fedora Podcast with Grayson that users who work
with Cloud instances in public cloud have access to the latest hardware and platform
specialties. Just like the flexibility users have in instance types, they require
flexibility in what they do on their instances. That means advanced specialized
functionality for the platform infrastructure with little to no additional
overhead. The mission of Server includes preserving support for legacy hardware, and
that's just not a critical focus for Cloud images.

Fedora Server has a strong requirement to remain stable. That is why, when the cloud
working group found the move to btrfs as a file system compelling,
Server edition working group members voiced concerns that users would not have access
to advanced RAID types, signaling that this was not the decision they
would make. For users adept in cloud configurations, it is well known that the advanced
RAID types are not generally useful. The virtual volumes presented as hardware and used
by cloud consumers are typically already striped and redundant behind the hyperscaler
storage. Virtual volumes are elastic as a result and extend to match larger sizes
without requiring advanced RAID to achieve that outcome. What is useful to a
typical cloud user is advanced partitioning with subvolumes and additional methods for
handling dynamic snapshots. Where there are adjustments or updates to
deliver for new images, Cloud is focused on techniques for fast API transfer, like
the ones found in coldsnap.

Cloud, not Server, deals with hyperscaler platform variations, like the use of swap files
and power management in opportunistic compute power. That isn't an issue for Fedora
Server since the partitioning is expected to follow disk management practices. It might
be something that you want to address on workstation, but even then, using a swapfile
would be a strong deviation from the best practices for standard installations and
deployment. Workstation WG and Server WG are keenly focused on different support
goals. The FCOS WG is focused on building a foundation for lightweight clusters and
basic batch efforts. The difference in focus differentiates them. The Cloud WG is
valuable to both persistent and ephemeral users who are taking advantage of platform
features in their modernization.

This is by no means an exhaustive list of the distinctions in mission or functionality.

The Five Whys

Why is the Cloud Group being asked to reiterate their value when others are not as influential?

The cloud working group, while still active, was intended to fold into the Atomic
Working Group when it was stable. That never really happened. The Atomic Working Group
was dissolved in favor of the Fedora CoreOS alignment with Red Hat CoreOS when Red Hat
acquired CoreOS. The Cloud Working Group remained active throughout all of these changes
and so while there was a plan of action in the council to make the FCOS working group
replace the Atomic WG, there was no direct dissolution of the cloud working group
because it had remained active throughout. Now, there is a BZ that states that Cloud is
no longer an edition, and that means the team will have to reestablish its status as an
edition.

Why are there people of the opinion that the Cloud Working Group could fold into another edition?

The Server Edition has a direct affiliation with the RHEL product line. It is a foregone
conclusion that it exists as an asset of continued significance with planned use in the
future. Therefore there is no way to remove it without creating a gap in the current
sponsorship. FCOS also has that same direct relationship with Red Hat CoreOS and cannot
deviate from the downstream alignment without creating uncertainty. The cloud image is
not currently connected to a release by a direct line. It merely informs the builds for
the cloud images for Red Hat Enterprise Linux. With the move to btrfs, there was a
strong divergence from Server. Moving the project under Server would be a forcing
function for the group to revert changes that are beneficial to public and private cloud
initiatives in favor of legacy expectations.

With Workstation, there are similar concerns regarding the alignment. For developers and
site reliability engineers, there is no association with Workstation as a directive for
a minimal installation. It would be a considerable change for this group, which has
historically been focused on laptops and PC hardware. The target audience would use
easy-to-configure virtual machines for development that isn't container specific. It is more
likely that those developers would benefit from the Fedora Cloud base images to achieve a
lightweight development environment that meets their requirements for a minimal install
environment and dynamic configuration through userdata and automation. Vagrant images
are also the product of the Cloud WG.

Why doesn't the cloud image base align with the rest of Fedora on a file system?

Fedora Workstation was the first to make the move to btrfs, and aligning with
that existing direction is what led the Cloud WG to move the cloud images away
from what the Server working group was doing. It was not expected that Fedora Server
would be able to be consistent with these directives, as there is not yet full support
for RAID5 and RAID6. Fedora CoreOS moved to XFS before Workstation moved to btrfs for
reasons of alignment, but originally, they were openly using btrfs, and that was
simply discouraged based on concerns related to issues of consistency and negative
perceptions of btrfs. With the Fedora Cloud Base image, the advantages outweighed the
alternatives in moving away from ext4.

Why isn't the cloud working group just producing raw images of Server and Workstation instead?

The Cloud base image goals are not the same as those of Workstation or Server, but the
mechanics are similar. Hyperscalers and public clouds don't need another server and do not
typically require all the components for Workstation. The emphasis in cloud is on boot
times and complex snapshot management or platform-specific requirements. In many cases,
there are platform services that cannot be fully utilized using other editions without
significant modifications, and the Cloud Edition will continue to focus on these
requirements to ensure that it remains popular with users.

Why is the Fedora cloud image popular with users?

This is mostly answered by the environments where customers choose to run their
workloads. A considerable number of workloads are moving to cloud and hyperscaler
environments without changing their fundamental runbooks. Fedora Cloud images are
popular because they are available where customers require them. This will further
increase as the cloud working group continues to extend the availability of the images
and tailor them to various environments' specific needs as a principal goal for their
target users.

Conclusion

The focus of the working group is the most important aspect of the published editions,
but a lack of clarity on the status of the working group can slow the progress of even a
strong working plan. The current cloud working group members believed that the dissolution of
the Atomic WG returned the focus to the original charter when FCOS formed independently,
with different goals and focus. Getting back to the expected status requires the cloud
group to retrace their steps and lose ground gained over years of action.

As of this midnight-hour change for the release of F35, effort will be required to
reestablish the working cloud release as an edition. The Cloud Base image will not be
included in the F35 releases or the website revamp. The cloud working group learned that
de facto working programs, even when all processes align with the target goals
previously agreed upon, cannot undo procedural confusion that stems from a
well-intentioned but inconsistent chain of decisions. The cloud working group intends
to remedy this as soon as possible, with an expectation that we can be back on track by
the release of F36.

Footnotes

<sup>1</sup> persistent - installed for more than one week.

<sup>2</sup> ephemeral - installed for less than one week.


Metadata Update from @davdunc:
- Issue tagged with: meeting

6 months ago

The Bugzilla bug isn't the key thing here. The key change here happened years ago, when we made Atomic Host an edition and Cloud not an edition. Cloud would not have been an edition for Fedora 35 whether or not that bug was filed. It wasn't one for F34, or F33, or F32, or etc. etc. There was no "consideration" for edition status going on; we haven't "considered" what the editions are since whenever we made IoT one (last cycle, I think).

At least as early as 2018, this was clearly explained on the wiki page - note that it says "The initial editions, at that time referred to as 'products', were Workstation, Server, and Cloud, and this split was initially implemented in the Fedora 21 release. An objective covered further development and refinement of the concept over the next few years, including the replacement of Cloud by Atomic, and settling on the name 'editions'."

Per https://web.archive.org/web/20170901000000*/getfedora.org, getfedora.org switched from listing Cloud as an edition to listing Atomic as an edition between June 2016 and December 2016. Cloud base images were already buried (actually, more than they are now) at that point.

Aside from airy-fairy theoretical debates, those are the actual practical consequences of being an edition. If you want a practical answer to the question of "what is an Edition?" rather than a theological one, the answer is "they're the things up front in big letters on getfedora.org". So ipso facto, Cloud hasn't been one for about five years at this point.

I think it's worth considering that all the stuff you mention above about the install base of the Cloud images is true even though it hasn't been an Edition for five years. So it's reasonable to ask whether it really matters. I think you could note that, since the main practical consequence of "edition status" so far as Fedora users are concerned is the placement on getfedora.org, it doesn't necessarily matter much to Cloud because most people deploying in clouds do not do so by downloading image files from distribution download sites. They just pick an image to deploy from their cloud's deployment interface.

Thus the key question for Cloud, really, is "what Fedora image(s) do the major cloud vendors offer for easy deployment, and how prominent are they?" What's on getfedora.org is obviously less important (or else we'd have about as many Cloud users as we do LXQt users).

The main problem with not being on getfedora.org is the lack of visibility makes it difficult to attract contributors. And people who are new to the cloud don't even know we have a Cloud variant that they can use as they move from traditional server to cloud infrastructure.

We have a similar problem with the Fedora base container, too, but people seem to be less concerned about that at the moment (though it's on my radar of Things That Need To Be Dealt With).

Discussed in Cloud meeting today. The team has unanimously agreed that they would like to pursue Edition status, targeting F36 as the earliest possible release for which we can make this change. Updates are planned to the PRD and we have additional work to be done. The issue is resolved, however, as we believe we have what we need to move forward on reclaiming the edition label with all that entails.

Metadata Update from @davdunc:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 months ago

Metadata Update from @davdunc:
- Issue untagged with: meeting

6 months ago
