#253 Objective proposal: Fedora Minimization
Closed: approved 4 years ago by bcotton. Opened 4 years ago by asamalik.

I'd like to propose a new objective to the Fedora Council. All the info should be in the proposal below.


Fedora Minimization Objective

Objective lead: Adam Samalik (asamalik)

Vision

Thousands of individual and corporate contributors collaborate in the Fedora community to explore new problems and to build a fast-moving modern OS with a rich ecosystem allowing them to experiment on modernising their infrastructure.

The Fedora community produces the number one Linux OS for modern deployments such as containers and IoT. Fedora enables users to build small container and other images with the desired functionality by providing optimised dependency trees of its rich ecosystem, and an ability to fine-tune various installations to achieve the right balance between features and size.

The problem

While Fedora is well suited for traditional physical/virtual workstations and servers, it is often overlooked for use cases beyond traditional installs.

Patching footprint

There is a direct relationship between the installation footprint and both the attack surface and the number of relevant CVEs. Many users and use cases benefit from reducing the number of patches necessary to keep the system or image updated. Reboots are also expensive in certain environments, which benefit from a reduced number of patches as well.

Container Image Size

Image size is certainly not the only factor that matters in the container space, but it definitely matters, especially as the number of images being maintained and regularly respun grows. The Fedora base image has grown to over 300 MB and is roughly three times larger than those of other mainstream distributions like Ubuntu, Debian, and openSUSE, which range from 91 to 113 MB.
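As a rough check of the "three times larger" figure, the ratio can be computed from the sizes quoted above. This is only a sketch: it takes exactly 300 MB for Fedora (the text says "over 300 MB", so the real ratios are somewhat higher), and the names used are made up for illustration.

```python
# Sizes in MB, taken from the paragraph above. The Fedora figure is a lower
# bound ("over 300 MB"), so the computed ratios are lower bounds as well.
FEDORA_BASE_MB = 300
OTHER_DISTROS_MB = (91, 113)  # smallest and largest of the quoted range


def size_ratios(fedora_mb, others_mb):
    """Return (lowest, highest) ratio of the Fedora size to the other sizes."""
    ratios = [fedora_mb / other for other in others_mb]
    return min(ratios), max(ratios)


low, high = size_ratios(FEDORA_BASE_MB, OTHER_DISTROS_MB)
print(f"Fedora base image is at least {low:.1f}x to {high:.1f}x the size")
```

With these inputs the ratio lands between roughly 2.7 and 3.3, which is consistent with "roughly three times".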

Chart demonstrating the growth of Fedora base container images over time

Package Dependencies

RPMs in Fedora often declare hard dependencies in cases where they are not necessary, which makes the resulting installation bigger.

Goals

Problem exploration: Explore the problem of dependency trees becoming too big over time and minimize them. The ultimate goal is to make the size of things built on top of our images (containers, ISOs, etc.) smaller.

Packaging optimization: Optimize dependencies of selected use cases to minimize their installation footprints, making the maintenance of their production deployments simpler (fewer things to worry about), and also their development faster (fewer things for CI to test).

Data and feedback for smaller images: Provide feedback backed by data to other teams responsible for creating images (containers, ISOs, etc.) to help them maintain the right balance between the image size and including useful content.

Mindshare: Write blog posts, present at conferences, and use other communication channels to show that Fedora cares about minimal but usable installations.

Outcomes

Fedora could become even more popular: Thanks to targeting the emerging use cases such as containers and IoT, and communicating that through different channels, Fedora could become even more popular for these use cases.

Patching footprint minimized: Running Fedora in production will mean having fewer dependencies present on a running system.

Small yet useful container base and application images: Optimized dependency trees and data generated by this objective will help define and build smaller images that still contain useful components that are generally expected to be there.

Better packaging experience: Tooling, services, and potential infrastructure improvements will help packagers make the right decisions regarding dependencies with less effort.

Faster Fedora CI: By focusing on the minimized installations of the most popular use cases optimized by this objective, Fedora CI will be able to target those specific use cases and test them faster, because there will be fewer packages in total.

Strategy

Technical Strategy

Step zero: focus on the most popular use cases: We will focus on the most popular use cases instead of the distribution in general. Defining these use cases is part of this objective. We will then optimize those use cases. As a bonus, the area where they overlap will very roughly define a good starting point for the content of our images, which we could potentially look into as a part of one of our stretch goals.

Step one: packages and their dependencies: Inspect various dependency trees of the most popular use cases. Because multiple packages can potentially provide a certain dependency, an exploration of multiple installations will be necessary. There are multiple ways to achieve different results: pre-installing certain packages on the target system, or providing additional parameters to libsolv regarding the weights of individual packages or the criteria of the overall transaction in terms of size and other aspects. This will be explored more deeply.
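To illustrate why several installations need to be explored when a dependency has more than one provider, here is a minimal sketch. It is a brute-force toy, not libsolv and not the tooling this objective will produce; every package name, size, and function name in it is hypothetical.

```python
from itertools import product

# Hypothetical metadata: installed size in MB and the capabilities required.
PACKAGES = {
    "app": {"size": 5, "requires": ["runtime", "tls"]},
    "runtime-full": {"size": 80, "requires": ["tls"]},
    "runtime-minimal": {"size": 20, "requires": []},
    "tls-a": {"size": 4, "requires": []},
    "tls-b": {"size": 3, "requires": []},
}

# Hypothetical mapping from a capability to every package that provides it.
PROVIDERS = {
    "runtime": ["runtime-full", "runtime-minimal"],
    "tls": ["tls-a", "tls-b"],
}


def resolve(root, choice):
    """Return the set of packages pulled in by root for one provider choice."""
    installed, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg in installed:
            continue
        installed.add(pkg)
        stack.extend(choice[cap] for cap in PACKAGES[pkg]["requires"])
    return installed


def candidate_installations(root):
    """List (package set, total size) for every provider choice, smallest first."""
    caps = sorted(PROVIDERS)
    results = {}
    for combo in product(*(PROVIDERS[cap] for cap in caps)):
        installed = frozenset(resolve(root, dict(zip(caps, combo))))
        results[installed] = sum(PACKAGES[p]["size"] for p in installed)
    return sorted(results.items(), key=lambda item: (item[1], sorted(item[0])))


if __name__ == "__main__":
    for packages, size in candidate_installations("app"):
        print(size, "MB:", ", ".join(sorted(packages)))
```

Enumerating every combination only works for tiny examples like this one. A real solver narrows the search with weights and transaction criteria, which is the part the step above proposes to explore.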

Step two: files and packaging optimization: Optimize the most viable dependency trees by looking at relations on the filesystem level and searching for ways to use weak dependencies, %doc macros, and other mechanisms, such as splitting packages into smaller units, to make the final installation smaller, with features being installable as optional.
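The effect of turning a hard dependency into a weak one can be sketched with a small model. The package names and sizes below are invented, and "recommends" stands in for RPM's Recommends tag; this is an illustration under those assumptions, not a measurement of any real package.

```python
# Hypothetical package metadata: size in MB, hard and weak dependencies.
PACKAGES = {
    "webapp": {
        "size": 10,
        "requires": ["libcore"],
        "recommends": ["webapp-docs", "webapp-plugins"],
    },
    "libcore": {"size": 15, "requires": [], "recommends": []},
    "webapp-docs": {"size": 30, "requires": [], "recommends": []},
    "webapp-plugins": {"size": 25, "requires": ["libcore"], "recommends": []},
}


def install_set(root, include_weak=True):
    """Return all packages installed for root, optionally following weak deps."""
    installed, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg in installed:
            continue
        installed.add(pkg)
        deps = list(PACKAGES[pkg]["requires"])
        if include_weak:
            deps += PACKAGES[pkg]["recommends"]
        stack.extend(deps)
    return installed


def footprint(root, include_weak=True):
    """Total installed size in MB for root and everything it pulls in."""
    return sum(PACKAGES[p]["size"] for p in install_set(root, include_weak))


if __name__ == "__main__":
    print("with weak deps:   ", footprint("webapp"), "MB")
    print("without weak deps:", footprint("webapp", include_weak=False), "MB")
```

In this model the optional pieces remain available to users who want them, while a minimal installation skips them without any change to the core package.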

Usefulness over size: Regarding images, we believe they should be useful first, and minimal second. Producing tiny images without the expected functionality would not be helpful. However, delivering the expected functionality in a minimal image is definitely beneficial.

Images are not our direct goal: We will not create smaller base images directly as a part of this very objective. Instead, we use the data we collect when optimizing the apps and runtimes we focus on and make suggestions backed by data to the people maintaining the images. But we might build some preview images on the side for demonstration purposes.

Using RPM: We're doing this with RPM, using features such as --nodocs, potentially creating alternative module streams with slimmed-down versions if that makes sense, etc. We're not achieving minimization by deleting files after installation. This might be obvious, but still worth mentioning.
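The difference between skipping files at install time and deleting them afterwards can be shown with a toy model. The manifest, its sizes, and the function names are hypothetical; the "is_doc" flag stands in for files that a package marks as documentation, which is what an option like --nodocs acts on.

```python
# Hypothetical file manifest for one package: (path, size in KB, is_doc).
MANIFEST = [
    ("/usr/bin/tool", 120, False),
    ("/usr/lib64/libtool.so.1", 300, False),
    ("/usr/share/doc/tool/README", 40, True),
    ("/usr/share/man/man1/tool.1.gz", 60, True),
]


def files_to_install(manifest, nodocs=False):
    """Select files at install time, using only metadata the package declares."""
    return [entry for entry in manifest if not (nodocs and entry[2])]


def installed_kb(manifest, nodocs=False):
    """Total size in KB of the files that would be written to disk."""
    return sum(size for _, size, _ in files_to_install(manifest, nodocs))


if __name__ == "__main__":
    print("default install:", installed_kb(MANIFEST), "KB")
    print("nodocs install: ", installed_kb(MANIFEST, nodocs=True), "KB")
```

Because the selection relies on what the package itself declares, the package database stays consistent with what is on disk, unlike an approach that removes files after installation.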

Execution Strategy

Scripts over manual labour: Write scripts that will perform dependency analysis rather than doing the analysis manually. Even though this means more work short-term, the ability to inspect new versions of content as it appears (and potentially use this script in CI) is more beneficial long-term.
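A scripted check of this kind might look like the sketch below, which compares two snapshots of a use case's installed packages and flags growth beyond a threshold. The snapshot format, the threshold, and all names are assumptions made for illustration; no such script is described in this proposal.

```python
def compare_snapshots(old, new):
    """Compare two {package: size_mb} snapshots of the same use case."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "delta_mb": sum(new.values()) - sum(old.values()),
    }


def check_growth(old, new, max_growth_mb):
    """Return the comparison report plus an 'ok' flag usable as a CI gate."""
    report = compare_snapshots(old, new)
    report["ok"] = report["delta_mb"] <= max_growth_mb
    return report


# Hypothetical snapshots taken before and after a content update.
OLD = {"app": 5, "runtime-minimal": 20, "tls-b": 3}
NEW = {"app": 6, "runtime-minimal": 20, "tls-b": 3, "locale-data": 40}

if __name__ == "__main__":
    report = check_growth(OLD, NEW, max_growth_mb=10)
    print(report)
    raise SystemExit(0 if report["ok"] else 1)
```

Running the same comparison on every new version of the content is what makes the script more valuable than a one-off manual analysis.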

Large changes on the side first: In case a substantial change needs to be made to the infrastructure or elsewhere, we will deploy our own infrastructure using other resources rather than testing in the production environment. We will also do large experiments on the side. We believe that staging is not for experiments; it is for validating production-ready deployments before actually going into production. We need a space to fail fast. After we test and validate, we will present a change proposal to the community.

Building an environment first: Our primary output is vision, tooling, services, and people being on board. We explore and demonstrate what is possible, socialise our vision and strategy, and develop tooling and services that help packagers make the right decisions. The team will use those tools to achieve results, of course, but enabling members of the community to help with this objective at their own pace is the primary goal.

Mentoring and consulting: With whatever we come up with, we offer active help to Fedora packagers to make sure they're able to get the full benefit of our tooling and services, etc. We will then write testimonials together explaining what packagers have achieved and how.

Do not reinvent the wheel: This feels very obvious, but from experience keeping this in mind helps us to actively search for existing resources before creating something new.

Deliverables & Timeline

Phase 1: Discovery (summer)

Preliminary discovery and analysis of the most popular use cases (a very basic one has already started on GitHub at asamalik/container-randomness, producing a report).

Published blog posts (and maybe videos) about the process, about what is possible, and about our intentions. This will likely result in feedback and engagement from the community.

Talk at Flock 2019 about the objective, featuring some of the discoveries and plans for what's next.

Specific plans for the next phase.

Phase 2: Experiments (fall)

Blog posts and conference talks about what works and what doesn't.

A set of use cases with optimized dependencies (potentially built on the side) available.

Initial versions of tooling and services, and a set of guiding principles for packagers to help them make the right decisions.

Specific plans for the next phase.

Phase 3: Stabilization (winter)

Out of the experiments from the previous phase, we will choose the ones to focus on and formulate the plan.

Improved tooling and services that are usable by packagers.

Even more blog posts and conference talks (DevConf.cz and FOSDEM?).

Specific plans for the next phase.

Phase 4: Integration (spring/f32)

A set of requirements for the Fedora Infra and RelEng teams to implement changes that have been proven working and needed.

Potential changes in the Packaging Guidelines applied.

Tooling and services available in Fedora.

The dependency chains of the selected apps and runtimes in Fedora should become smaller at this point.

Active help to packagers in the form of mentorship, workshops, and other means.

Testimonials that we write together with packagers that will demonstrate what they have achieved and how.


Changelog

2019-06-05.1: Change "tooling" to "tooling and services" at a few places to make it more obvious that potential new services providing automation might be the output as well as local command line utilities.

2019-06-05.2: Added a chart demonstrating the growth of Fedora base container images over time

2019-06-06: Clarified that we work with RPM in the technical strategy.

2019-06-07: Completely rewrote the Vision section, which might have altered its message. A diff is provided in a comment below to make the change more transparent.


I have applied some minor changes (wording) based on feedback I've received. To track this and any potential future changes, there is now a new Changelog section at the bottom.

Proposal:

This Objective is approved for Phase 1. Because we are concerned about measurability and ... basically, knowing whether the Objective is actually working or not, while recognizing that it's hard to explore and have concrete goals at the same time, this Objective will automatically end in September (at Northern Hemisphere fall), with explicit renewal required for Phase 2.

At that time, Phase 1 work (blog posts, Flock talk, community discussion) can be evaluated, and at least one use case for Phase 2 should be in draft form.

Please vote on this in the next 7 days. (Before the next council meeting.) Thanks!

(Feel free to offer edits or counter-proposals too.)

Paul proposed that this objective replaces the Lifecycle objective. With that explicitly stated, I am

+1

Metadata Update from @bcotton:
- Issue priority set to: Next Meeting (was: Needs Review)
- Issue tagged with: objectives

4 years ago

+1 to approving Phase 1 (IMHO we should clarify whether it ends on 1 September or 30 September to have a clear understanding about the deadline).

I have rewritten the Vision part completely, which might have altered its message. So here's a diff for transparency:

The new text:

Thousands of individual and corporate contributors collaborate in the Fedora community to explore new problems and to build a fast-moving modern OS with a rich ecosystem allowing them to experiment on modernising their infrastructure.

The Fedora community produces the number one Linux OS for modern deployments such as containers and IoT. Fedora enables users to build small container and other images with the desired functionality by providing optimised dependency trees of its rich ecosystem, and an ability to fine-tune various installations to achieve the right balance between features and size.

The previous text:

Fedora (the "product" of the community) is a Linux distribution with a rich ecosystem of popular applications, runtimes, and dependencies for emerging use cases — such as containers, IoT, etc. — that require a lightweight footprint.

Functionality should not be removed from the distribution, but users should be able to better pinpoint the needed OS content that enables their workload.

Approved for Phase 1 by a combination of ticket and IRC votes.

Metadata Update from @bcotton:
- Issue close_status updated to: approved
- Issue status updated to: Closed (was: Open)

4 years ago

A huge +1 from me (even though it's already been approved).

This is awesome for IoT and basically encapsulates a lot of what I've done for my entire involvement in Fedora. Some of my first bugs were to reduce the gnome footprint so I could install it on my eeePC :)
