#64 Next phase of Minimization Objective
Merged 3 days ago by bcotton. Opened a month ago by asamalik.
Fedora-Council/ asamalik/council-docs phase-2-proposal  into  master

@@ -33,7 +33,10 @@ 

  -

  

  Objective Details::

- See the xref:minimization::index.adoc[Fedora Minimization objective] space including the xref:minimization::action-plan.adoc[Action Plan].

+ 

+ * **First Phase**: See the xref:minimization::index.adoc[Fedora Minimization objective] space including the xref:minimization::action-plan.adoc[Action Plan].

+ * **Second Phase Proposal**: xref:objectives/minimization-phase-2.adoc[Minimization Phase 2 Proposal]

+ 

  

  Documentation::

  Will be appearing in the xref:minimization::index.adoc[Fedora Minimization objective] space.

@@ -0,0 +1,135 @@ 

+ = Fedora Minimization Objective

+ 

+ == Second Phase Proposal

+ 

+ Objective lead: Adam Samalik (asamalik)

+ 

+ === The problem

+ 

+ While Fedora is well suited for traditional physical/virtual workstations and servers, it is often overlooked for use cases beyond traditional installs.

+ 

+ Some modern types of deployments — such as IoT and containers — are quite sensitive to size. For IoT that's usually slow data connections (for updates/management) and for cloud and containers it's the massive scale.

+ 

+ A specific example is Systemd. While it is very useful (everybody loves Systemd) and always present on physical systems, it is rarely needed in containers. On physical systems, it wasn't a problem for packages to require Systemd just so that __systemd-sysusers__ could create users. In containers, however, that means a significant size increase.

+ 

+ Besides that, basically all types of deployments benefit from a reduced size, as there is a direct relationship between the installation footprint and the attack surface and number of relevant CVEs.

+ 

+ === Vision

+ 

+ Thousands of individual and corporate contributors collaborate in the Fedora community to explore new problems and to build a fast-moving modern OS with a rich ecosystem allowing them to experiment on modernising their infrastructure.

+ 

+ === Mission

+ 

+ Helping open source developers, sysadmins, and Linux distribution maintainers to focus on what's relevant for them.

+ 

+ === Outcomes

+ 

+ Fedora is a popular platform because its ecosystem is both cutting-edge and well optimized for modern deployments such as IoT and containers. That makes many people use Fedora rather than build and assemble their own artifacts directly from upstream projects. And that relieves the pressure on open source developers caused by users who would otherwise ask for their specific security and other issues to be fixed quickly.

+ 

+ So:

+ 

+ * Open source developers can focus on feature development

+ * Sysadmins can easily consume pre-built bits that also get regular updates

+ * Fedora contributors (vendors and individuals) can collaborate within the Fedora community on exploring and developing open source solutions to problems of the future

+ 

+ === Outputs

+ 

+ Specific use cases are defined in Fedora. The community then focuses on those use cases with development and maintenance, optimisation (like minimisation), and testing (like CI and gating). These use cases can be transparently prioritized for infrastructure resources based on community interests.

+ 

+ Feedback Pipeline actively monitors each use case and records the size and the dependencies required for it to run. Data history is kept and shown to see changes over time. And to keep things small over time, Feedback Pipeline also automatically detects size increases and potentially automatically opens Bugzilla bugs to track/fix/justify such increases transparently.

+ 

+ An active focus on minimization means that our maintainers produce size-optimised content with the same or lower amount of effort. Tooling, services, and data help them to make the right decision about dependencies easily, and to keep things smaller over time.

+ 

+ === Actions

+ 

+ **Identify relevant use cases** and allow the community (meaning not just the Minimization Team) to define their own. We think of a use case as a set of packages installed in a specific context, having a specific purpose, such as __Apache HTTP Server Container__. Define use cases at least for:

+ 

+ * __httpd__

+ * __nginx__

+ * __MariaDB__

+ * __PostgreSQL__

+ * __Fedora IoT__

+ * __Python 3__

+ 

+ Also, consider looking at container-native use cases, such as:

+ 

+ * __Go for container apps__

+ * __Rust for container apps__

+ * __Quarkus__

+ 

+ Collect specific use cases by talking to people at tech events, internet forums, and any other viable venues. 

+ 

+ **Extend monitoring services** (Feedback Pipeline) that:

+ 

+ * Visualize dependencies and a total size for each use case

+ * Monitor size changes over time

+ * Auto-detect large size changes

+ * Notify maintainers about unexpected size increases

+ 

+ Other than features, we also need to:

+ 

+ * write tests to significantly simplify contribution

+ * do performance optimizations for the service to scale well

+ * explore the use of CI and Rawhide Gating

+ 

+ Being able to see what's going on is a prerequisite for implementing any changes. Seeing all the relevant opportunities helps us to focus on the ones having the most impact, and transparent tracking helps us prove the usefulness of our work and further focus on the most impactful activities.

+ 

+ **Minimize** the installation size of the use cases by optimizing RPM dependencies, features, software architecture, and other factors. Specifically, look for:

+ 

+ * Unnecessary RPM dependencies (although there probably won't be many)

+ * Multiple implementations of the same functionality required by various packages — try to make them use the same one

+ * Context-specific requirements: for example, requiring Systemd is fine on traditional deployments, but in containers it means a significant size increase. Leverage weak dependencies in those cases (that might require code changes).

+ * Dependencies on large things while only using a fraction of their functionality, such as requiring the whole Perl stack to run a single script. Such a script can be rewritten in Python, which is present almost everywhere, mostly because of DNF.

+ 

+ **Engage with upstream developers** regarding bigger changes in packaging and architecture. An example is Systemd and splitting out the __systemd-sysusers__ package.

+ 

+ **Implement process and policy changes** reflecting bigger, more general changes. Again, a good example is using Systemd in containers, or the general issue of creating users in containers.

+ 

+ **Provide guidance** for the Fedora community in form of blog posts, videos, and conference talks. Even though we might have guidelines and policies in place, spreading the word is always important.

+ 

+ === Resources and Inputs

+ 

+ Cloud resources to prototype services. We are not going to change the existing Fedora infrastructure in any way before whatever we develop proves useful and worth the hassle of stabilization and changing production.

+ 

+ No existing Fedora Infra or Release Engineering resources are needed at the moment. However, we might need help with setting up (or getting access to) the cloud resources.

+ 

+ Active support from our maintainers, the FPC, and other community members is definitely needed. This is obviously not something we can "request", but it's still a necessary input.

+ 

+ === Guiding Principles

+ 

+ **Usefulness over size**: There is a balance between usefulness and size. We keep that in mind and will not implement drastic changes that would prevent our users from using Fedora. However, nothing prevents us from producing additional very specific and minimal artifacts.

+ 

+ **Using RPM**: We're doing this with RPM. We're not achieving minimization by deleting files after installation. This might be obvious, but still worth mentioning.

+ 

+ 

+ == First Phase Accomplishments

+ 

+ See the https://docs.fedoraproject.org/en-US/minimization/status/[status page] for detailed info and historic weekly updates. Summary below.

+ 

+ **Better understanding** — Yes, we now have a much better understanding of the problem and a better, more specific idea about the next steps.

+ 

+ https://minimization.github.io/reports/[**Feedback Pipeline**] — A service that monitors use cases for size and dependencies. Includes various views in tables and interactive dependency graphs.

+ 

+ **Systemd and containers** — We dug into the issue of Systemd vs. containers, especially for packages requiring it just to create users in containers using __systemd-sysusers__. We are working with upstream on splitting the package out. We have thought about, but not yet proposed, a wider policy around this.

+ 

+ * Mailing list discussion: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/6YX6CEFBPU3XVZZEHTN6CBH2F7JDF35N/#EJD4BNBE52JTEOPKAT6HFOO4HVUPBTCH

+ * Ticket: https://pagure.io/minimization/issue/13

+ 

+ Policy thinking:

+ 

+ * A - If systemd is only needed to start services, a package should only

+ "Recommend" systemd.  This will allow containers to install the

+ package without systemd.

+ * B - If a program is just using a library of systemd, only require

+ systemd-libs.  Example: libusb

+ * C - If a package wants to use systemd-sysusers to create users/groups,

+ only require systemd-sysusers.  (NOTE:  This subpackage isn't

+ implemented yet)

+ 

+ **initial-setup** — If an image is built without users, there needs to be some way to add a user at startup. initial-setup does a good job of that, but at the expense of size. It pulls in anaconda-tui and anaconda-core. Those two packages then pull in a lot of other, rather large, packages. This affects the IoT images, as well as others.

+ We currently do not have a recommendation, but it is being worked on.

+ 

+ **Use pcre2 instead of pcre** — The minimization effort is trying to trim 

+ things down to just one pcre, and that is pcre2.

+ 

+ **Polkit and mozjs60** — Let's explain this one with a terrible analogy! Polkit is this lovely person (0.5M) that rings your doorbell and says they will wash the windows of your house. After you agree, they bring out their elephant (mozjs60, 30M) and use it to spray your windows with water. Polkit pulls in mozjs60, which is a rather large package. So, we're trying to sort this one out, too.
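
The proposal's Outputs and Actions sections describe Feedback Pipeline as keeping a size history per use case and auto-detecting large size increases. A minimal Python sketch of that idea is below. It is not taken from the Feedback Pipeline code base: the names `SizeChange` and `detect_size_increases`, the 10% threshold, and the sample sizes are all hypothetical and only illustrate the check.

```python
from dataclasses import dataclass


@dataclass
class SizeChange:
    """One flagged use case (hypothetical structure, not Feedback Pipeline's)."""
    use_case: str
    previous_mb: float
    current_mb: float

    @property
    def growth_ratio(self) -> float:
        # Relative growth between the last two recorded sizes.
        return (self.current_mb - self.previous_mb) / self.previous_mb


def detect_size_increases(history, threshold=0.10):
    """Return a SizeChange for every use case whose latest size grew by more
    than `threshold` (a fraction, assumed 10% here) over the previous record.

    `history` maps a use case name to its recorded sizes in MB, oldest first.
    """
    flagged = []
    for use_case, sizes in history.items():
        if len(sizes) < 2:
            continue  # not enough data to compare
        previous, current = sizes[-2], sizes[-1]
        if previous <= 0:
            continue  # avoid dividing by zero on bogus records
        change = SizeChange(use_case, previous, current)
        if change.growth_ratio > threshold:
            flagged.append(change)
    return flagged


# Made-up sample data, for illustration only.
sample_history = {
    "httpd": [100.0, 100.0, 130.0],
    "nginx": [80.0, 82.0],
    "python3": [50.0],
}

for change in detect_size_increases(sample_history):
    print(f"{change.use_case}: {change.previous_mb} MB -> {change.current_mb} MB")
```

A real service would presumably compare against more than the previous record and would need a way to mark an increase as justified; the proposal leaves those details open, and so does this sketch.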

Proposing the second phase of the Minimization objective. Council ticket: https://pagure.io/Fedora-Council/tickets/issue/274

There is a patch set that makes polkit able to use INI/TOML style config files for rules instead of JavaScript. It was written by Ikey Doherty for Solus and Clear Linux, and would be worth reviving, updating, and upstreaming. That would allow eliminating the mandatory JS dependency.

I think this is a great goal to have but minimization is the wrong objective for that. People are still building their own artifacts using small container base images like Alpine. If we want to change that I think this is more of a marketing/public speaking initiative that needs to happen to explain why someone should use the fedora postgresql container image instead of the official postgresql image in DockerHub for example. In my opinion making Fedora smaller will not make people want to use the Fedora image over the one available in DockerHub.

I would love to see more cloud native use cases, like how we do with building Go/Rust applications, how we use quarkus.io, etc. I think this is where Fedora can make a difference in this space.
So maybe some use cases would be Go and Rust images that allow you to compile your application, a JDK application. We also have a strong Python community so having a Python base image use case would be cool.

Minimization could definitely be part of making it more appealing for upstreams to consider Fedora over Alpine, Debian, or Ubuntu for official app/service images. But I'm unsure if minimization is what they want or need for that. We'd need to conduct some outreach and survey developers of those images to get a better picture of what would make them consider Fedora over alternatives.

> Minimization could definitely be part of making it more appealing for upstreams to consider Fedora over Alpine, Debian, or Ubuntu for official app/service images. But I'm unsure if minimization is what they want or need for that. We'd need to conduct some outreach and survey developers of those images to get a better picture of what would make them consider Fedora over alternatives.

+1. For me this would be a bigger objective, and minimization would only be a part of it. For example, the official Python:3.8 image on DockerHub is 955MB and it uses Debian. It would be awesome to have it run on Fedora and have a smaller image. I think that should be one of the focus use cases for minimization.

> There is a patch set that makes polkit able to use INI/TOML style config files for rules instead of JavaScript. It was written by Ikey Doherty for Solus and Clear Linux, and would be worth reviving, updating, and upstreaming. That would allow eliminating the mandatory JS dependency.

Having looked at the proposed patch set for INI/TOML, I found that it either breaks all JavaScript rules or adds to the dependency list. There is another proposal to use duktape instead of mozjs. That would be able to preserve the current JavaScript uses, and trim things down. duktape is under 200 KB, and doesn't have all the baggage that mozjs has.

Thanks all for the feedback. I'm back from over-a-week-long vacation, will get back to every comment very shortly!

> There is a patch set that makes polkit able to use INI/TOML style config files for rules instead of JavaScript. It was written by Ikey Doherty for Solus and Clear Linux, and would be worth reviving, updating, and upstreaming. That would allow eliminating the mandatory JS dependency.

@ngompa Thanks for the tip. Short-term, I'm inclined to try to do what @tdawson proposes to maintain compatibility with existing configs.

> I think this is a great goal to have but minimization is the wrong objective for that. People are still building their own artifacts using small container base images like Alpine. If we want to change that I think this is more of a marketing/public speaking initiative that needs to happen to explain why someone should use the fedora postgresql container image instead of the official postgresql image in DockerHub for example. In my opinion making Fedora smaller will not make people want to use the Fedora image over the one available in DockerHub.

> Minimization could definitely be part of making it more appealing for upstreams to consider Fedora over Alpine, Debian, or Ubuntu for official app/service images. But I'm unsure if minimization is what they want or need for that. We'd need to conduct some outreach and survey developers of those images to get a better picture of what would make them consider Fedora over alternatives.

@cverna @ngompa I agree with both of you that the Minimization work is not everything we need to do to fulfil that outcome. But maybe we could use it as a talking point, or as a conversation starter? And then we could dig into more details of how Linux distros, and especially Fedora, can benefit containers.

I agree that this outcome will be more dependent on public speaking, online content, and also just talking to people at tech events. Would you be interested to talk about / help with that?

I wonder if we should request budget for community (or just team) members to go to events specifically for this purpose as a part of this objective. I'm sure many of us will be at DevConf, FOSDEM, Flock, ... but perhaps we should focus on container-oriented events as that's the most relevant audience I feel.

> I would love to see more cloud native use cases, like how we do with building Go/Rust applications, how we use quarkus.io, etc. I think this is where Fedora can make a difference in this space.
> So maybe some use cases would be Go and Rust images that allow you to compile your application, a JDK application. We also have a strong Python community so having a Python base image use case would be cool.

That's a good point, let me add Go, Rust, and Quarkus to the list. At least to look at.

With Python, my thinking was that there is Python in every Fedora image because of DNF, but maybe "having Python" vs. "being useful to deploy Python apps" is not always the same. So let me add Python there as well.

> Minimization could definitely be part of making it more appealing for upstreams to consider Fedora over Alpine, Debian, or Ubuntu for official app/service images. But I'm unsure if minimization is what they want or need for that. We'd need to conduct some outreach and survey developers of those images to get a better picture of what would make them consider Fedora over alternatives.

> +1. For me this would be a bigger objective, and minimization would only be a part of it. For example, the official Python:3.8 image on DockerHub is 955MB and it uses Debian. It would be awesome to have it run on Fedora and have a smaller image. I think that should be one of the focus use cases for minimization.

Huh, 955MB for a Python runtime seems quite big? So yes, let's focus on that one specifically, too.

1 new commit added

  • add container-native use cases
23 days ago

@cverna @ngompa About outreach, @bex has suggested to me that this would be a great marketing/talking point for Fedora. Maybe we should reach out to the marketing team to figure something out.

From the few conversations I've had at developer meetups in my area, it is really hard for people to understand why they should consume a layered image (i.e. postgresql) from the Fedora registry rather than taking the official image from Docker Hub. In my experience, you can talk to people about security, the fact that we update our images, etc., but most of the time developers are using containers in their development environment and they don't really care about that.

For them, creating a container with FROM registry.fedoraproject.org/f31/python:latest rather than FROM python:3.7 means some extra typing for not much benefit.

For containers that run in production, the general trend I see at these meetups and at devopsdays conferences is that people build their own containers and a custom pipeline to keep them updated. So in general they just use an Alpine base image, or more and more just do a FROM scratch and drop in the binary from their golang or rust project.

I think selling Fedora as a platform to build containers from is not an easy task, at least to me :-)

Approved by Council. Merging.

Pull-Request has been merged by bcotton

3 days ago