#274 Next phase of Minimization Objective
Closed: approved 5 days ago by bcotton. Opened 2 months ago by bcotton.

The Minimization Objective was approved for the first phase, ending at the end of September. Time to draft an updated proposal that demonstrates progress and plans for the next phase.


Fedora Minimization Objective

Objective lead: Adam Samalik (asamalik)

The problem

While Fedora is well suited for traditional physical/virtual workstations and servers, it is often overlooked for use cases beyond traditional installs.

Some modern types of deployments, such as IoT and containers, are quite sensitive to size. For IoT, the constraint is usually slow data connections (for updates and management); for cloud and containers, it is the massive scale.

A specific example is Systemd. While it is very useful (everybody loves Systemd) and is always present on physical systems, it is rarely needed in containers. On traditional installs it wasn't a problem for packages to require Systemd just so systemd-sysusers could create users. In containers, however, that means a significant size increase.

Besides that, basically all types of deployments benefit from a reduced size, as there is a direct relationship between the installation footprint and the attack surface and number of relevant CVEs.

Vision

Thousands of individual and corporate contributors collaborate in the Fedora community to explore new problems and to build a fast-moving modern OS with a rich ecosystem allowing them to experiment on modernising their infrastructure.

Mission

Helping open source developers, sysadmins, and Linux distribution maintainers to focus on what's relevant for them.

Outcomes

Fedora is a popular platform because its ecosystem is both cutting-edge and well optimized for modern deployments such as IoT and containers. That leads many people to use Fedora rather than build and assemble their own artifacts directly from upstream projects. In turn, that relieves the pressure on open source developers caused by users who would otherwise ask for their specific security and other issues to be fixed quickly.

So:

  • Open source developers can focus on feature development
  • Sysadmins can easily consume pre-built bits that also get regular updates
  • Fedora contributors (vendors and individuals) can collaborate within the Fedora community on exploring and developing open source solutions to problems of the future

Outputs

Specific use cases are defined in Fedora. The community then focuses on those use cases with development and maintenance, optimisation (like minimisation), and testing (like CI and gating). These use cases can be transparently prioritized for infrastructure resources based on community interests.

Feedback Pipeline actively monitors each use case and records the size and the dependencies required for it to run. Data history is kept and shown to see changes over time. And to keep things small over time, Feedback Pipeline also automatically detects size increases and potentially automatically opens Bugzilla bugs to track/fix/justify such increases transparently.
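
The size-increase detection described above could look roughly like the following. This is a minimal sketch, not Feedback Pipeline's actual code: the `SizeSample` type, the `find_size_increases` function, the 5% threshold, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SizeSample:
    date: str          # ISO date of the measurement
    size_bytes: int    # total installed size of the use case


def find_size_increases(history, threshold=0.05):
    """Return (previous, current, growth) for each consecutive pair of
    samples whose relative growth exceeds the threshold."""
    flagged = []
    for prev, curr in zip(history, history[1:]):
        if prev.size_bytes <= 0:
            continue  # cannot compute relative growth from an empty baseline
        growth = (curr.size_bytes - prev.size_bytes) / prev.size_bytes
        if growth > threshold:
            flagged.append((prev, curr, growth))
    return flagged


# Invented measurements for illustration only.
history = [
    SizeSample("2019-09-01", 200_000_000),
    SizeSample("2019-09-08", 202_000_000),   # +1%, below the threshold
    SizeSample("2019-09-15", 240_000_000),   # well above the threshold
]
for prev, curr, growth in find_size_increases(history):
    print(f"{curr.date}: grew {growth:.1%} since {prev.date}")
```

Each flagged pair is the kind of event that could then be turned into a Bugzilla bug so the increase gets tracked, fixed, or justified.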

An active focus on minimization means that our maintainers produce size-optimised content with the same or lower amount of effort. Tooling, services, and data help them to make the right decision about dependencies easily, and to keep things smaller over time.

Actions

Identify relevant use cases and allow the community (meaning not just the Minimization Team) to define their own. We think of a use case as a set of packages installed in a specific context and having a specific purpose, such as Apache HTTP Server Container. Define use cases at least for:

  • httpd
  • nginx
  • MariaDB
  • PostgreSQL
  • Fedora IoT

Collect specific use cases by talking to people at tech events, internet forums, and any other viable venues.

Extend monitoring services (Feedback Pipeline) that:

  • Visualize dependencies and a total size for each use case
  • Monitor size changes over time
  • Auto-detect large size changes
  • Notify maintainers about unexpected size increases
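
To make "dependencies and a total size for each use case" concrete, here is a small sketch that walks a dependency map and sums the sizes of everything reachable. It is an assumption-laden simplification: real RPM resolution also involves Provides, weak dependencies, and alternatives, and the function names, package names, and sizes below are invented.

```python
def dependency_closure(roots, requires):
    """Return every package reachable from roots through the requires map."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(requires.get(pkg, ()))
    return seen


def total_size(packages, sizes):
    """Sum the installed size of the given packages."""
    return sum(sizes[p] for p in packages)


# Invented data for illustration only; not real Fedora packages or sizes.
requires = {
    "app": ["libfoo", "init-helper"],
    "libfoo": ["libc"],
    "init-helper": ["libc", "big-runtime"],
}
sizes = {"app": 2, "libfoo": 1, "libc": 10, "init-helper": 1, "big-runtime": 30}

closure = dependency_closure(["app"], requires)
print(sorted(closure), total_size(closure, sizes))
```

In this toy data, most of the total comes from a single transitive dependency, which is the kind of opportunity such a view is meant to surface.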

Other than features, we also need to:

  • write tests to significantly simplify contribution
  • do performance optimizations for the service to scale well
  • explore the use of CI and Rawhide Gating

Being able to see what's going on is a prerequisite for implementing any changes. Seeing all the relevant opportunities helps us focus on the ones with the most impact, and transparent tracking helps us prove the usefulness of our work and further focus on the most impactful activities.

Minimize the installation size of the use cases by optimizing RPM dependencies, features, software architecture, and other factors. Specifically, look for:

  • Unnecessary RPM dependencies (although there probably won't be many)
  • Multiple implementations of the same functionality required by various packages — try to make them use the same one
  • Context-specific requirements: for example, requiring Systemd is fine on traditional deployments, but in containers it means a significant size increase. Leverage weak dependencies in those cases (that might require code changes).
  • Dependencies on large things while only using a fraction of the functionality, such as requiring the whole Perl stack to run a single script. Such a script can be rewritten in Python, which is present almost everywhere, mostly because of DNF.

Engage with upstream developers regarding bigger changes in packaging and architecture. An example is Systemd and splitting systemd-sysusers out into its own package.

Implement process and policy changes reflecting bigger, more general changes. Again, a good example is using Systemd in containers, or the general issue of creating users in containers.

Provide guidance for the Fedora community in the form of blog posts, videos, and conference talks. Even though we might have guidelines and policies in place, spreading the word is always important.

Resources and Inputs

Cloud resources to prototype services. We are not going to change the existing Fedora infrastructure in any way before whatever we develop proves useful and worth the hassle of stabilization and changing production.

No existing Fedora Infra or Release Engineering resources are needed at the moment. However, we might need help with setting up (or getting access to) the cloud resources.

Active support from our maintainers, the FPC, and other community members is definitely needed. This is obviously not something we can "request", but it's still a necessary input.

Guiding Principles

Usefulness over size: There is a balance between usefulness and size. We keep that in mind and will not implement drastic changes that would prevent our users from using Fedora. However, nothing prevents us from producing additional, very specific and minimal artifacts.

Using RPM: We're doing this with RPM. We're not achieving minimization by deleting files after installation. This might be obvious, but still worth mentioning.

Accomplishments

See the status page for detailed info and historic weekly updates. Summary below.

Better understanding: Yes, we now have a much better understanding of the problem and a better, more specific idea about the next steps.

Feedback Pipeline — A service that monitors use cases for size and dependencies. Includes various views in tables and interactive dependency graphs.

Systemd and containers: We dug into the issue of Systemd vs. containers, especially for packages requiring it just to create users in containers using systemd-sysusers. We are working with upstream on splitting the package out. We have thought about, but not yet proposed, a wider policy around this.

Policy thinking:

  • A - If systemd is only needed to start services, a package should only
    "Recommend" systemd.  This will allow containers to install the
    package without systemd.
  • B - If a program is just using a library of systemd, only require
    systemd-libs.  Example: libusb
  • C - If a package wants to use systemd-sysusers to create users/groups,
    only require systemd-sysusers.  (NOTE:  This subpackage isn't
    implemented yet)
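
The three cases above amount to a small decision table. The sketch below only restates that draft thinking as code; the `SystemdUse` enum and `suggested_dependency` function are hypothetical names, not part of any Fedora tooling, and the policy itself has not been proposed yet.

```python
from enum import Enum


class SystemdUse(Enum):
    START_SERVICES = "only needed to start services"       # case A
    LINK_LIBRARY = "only uses a systemd library"           # case B
    CREATE_USERS = "only uses systemd-sysusers"            # case C


def suggested_dependency(use):
    """Map the reason a package needs systemd to the (tag, package)
    pair that the draft policy thinking suggests."""
    if use is SystemdUse.START_SERVICES:
        # Weak dependency, so containers can install without systemd.
        return ("Recommends", "systemd")
    if use is SystemdUse.LINK_LIBRARY:
        return ("Requires", "systemd-libs")
    if use is SystemdUse.CREATE_USERS:
        # The text notes this subpackage is not implemented yet.
        return ("Requires", "systemd-sysusers")
    raise ValueError(f"unknown use: {use!r}")


for use in SystemdUse:
    tag, package = suggested_dependency(use)
    print(f"{use.value}: {tag}: {package}")
```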

initial-setup: If an image is built without users, there needs to be some way to add a user at startup. initial-setup does a good job of that, but at the expense of size. It pulls in anaconda-tui and anaconda-core. Those two packages then pull in a lot of other, rather large, packages. This affects the IoT images, as well as others.
We currently do not have a recommendation, but it is being worked on.

Use pcre2 instead of pcre: The minimization effort is trying to trim things down to just one pcre, and that is pcre2.

Polkit and mozjs60: Let's explain this one with a terrible analogy! Polkit is this lovely person (.5M) that rings your doorbell and says they will wash the windows of your house. After you agree, they bring out their elephant (mozjs60, 30M) and use it to spray your windows with water. Polkit pulls in mozjs60, which is a rather large package. So, we're trying to sort this one out, too.

@asamalik would you mind submitting your proposal in the description as a pull request against council-docs? (see Fedora-Council/council-docs#61 as an example). Once you do that I'll publish it on the Community Blog and start The Process™

The voting period will begin on 31 October.

For reference, the Community Blog post

I'm back from vacation, I'll get back to all the feedback and incorporate whatever's possible before tomorrow lunch (US morning).

Metadata Update from @bcotton:
- Issue tagged with: ticket-vote

22 days ago

+1.

Plus an extra +1 for logic model.

That's enough votes. This is approved.

Metadata Update from @bcotton:
- Issue close_status updated to: approved
- Issue status updated to: Closed (was: Open)

5 days ago
