#1491 clarifications/improvements for new bundling policy
Closed None Opened 3 years ago by kevin.

In ticket https://fedorahosted.org/fesco/ticket/1483 fesco changed the bundling rules for Fedora packages.

Approved Proposal:

All packages whose upstreams allow them to be built against system libraries must be built against system libraries.

All packages whose upstreams have no mechanism to build against system libraries must be contacted publicly about a path to supporting system libraries. If upstream refuses, this must be recorded in the spec file using a persistent mechanism to be clarified in the packaging guidelines.

All packages whose upstreams have no mechanism to build against system libraries may opt to carry bundled libraries, but if they do, they must include Provides: bundled(<libname>) = <version> in their RPM spec file.
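For illustration only, the `Provides: bundled(<libname>) = <version>` requirement above could be checked mechanically along these lines. This is a minimal sketch, not an official tool: the regular expression is an assumed reading of the line format, and the package name `example-app`, the library names, and the versions are hypothetical.

```python
import re

# Matches lines such as "Provides: bundled(libfoo) = 1.2.3".
# The pattern is an illustrative assumption, not an official spec grammar.
BUNDLED_RE = re.compile(
    r"^\s*Provides:\s*bundled\((?P<name>[^)]+)\)\s*=\s*(?P<version>\S+)\s*$",
    re.IGNORECASE,
)


def bundled_provides(spec_text):
    """Return the (libname, version) pairs declared as bundled in spec_text."""
    found = []
    for line in spec_text.splitlines():
        match = BUNDLED_RE.match(line)
        if match:
            found.append((match.group("name"), match.group("version")))
    return found


# Hypothetical spec fragment; none of these names refer to a real package.
example_spec = """\
Name: example-app
Version: 1.0
Provides: bundled(libfoo) = 1.2.3
Provides: bundled(md5-foo-implementation) = 1.0
"""

print(bundled_provides(example_spec))
```

A check like this only finds what the maintainer has declared; it cannot tell whether a bundled library is missing its Provides.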

I'd like to ask for clarifications and improvements for various cases for this new policy:

  1. In the case that an upstream no longer exists or never replies to the public contact, should the package be disallowed? Or should it be allowed after some time period?

  2. When the Provides: naming isn't clear or already used somewhere else, who decides what to use? I'd suggest that the FPC should manage a list and settle naming disputes around these.

  3. We should define "system libraries" here. Is md5.c a 'system library'?

  4. Do these rules apply to EPEL? I would assume so since they follow the normal Fedora policies, but would be nice to note.

  5. Should we note anything about fonts? Under the old rules fonts had to be unbundled (and it wasn't usually very hard, just make sure the font is packaged and replace the bundled file with a link to the system version), but under these rules I guess it's up to the maintainer?

Feel free to punt this for the next meeting if there should be more discussion on these points. Thanks.


More generally, is bundling policy in general now going to be handled primarily by FESCo, or is it something for FPC with the new policy as a baseline? I think there are a lot of clarifications which are reasonable to at least consider; where should those go?

3 and 5 are the same question. Is a font a system library? :)

And your specific example in #3 makes me feel like we need something stronger around crypto specifically. But I don't know what or how it would work.

  1. in this case, we should enforce unbundling as we can't expect upstream maintainers to fix any CVE
    One of the difficulties is that we have no mechanism to detect or flag a dead upstream.
    It would be a nice feature to add in pkgdb.
  2. FPC did maintain such list in the past, and they have the experience
  3. my PoV is md5.c is not a system library but it should be declared with a Provides
  4. yes
  5. my PoV is fonts are not concerned by the new policy, strict unbundling rules still apply

  1. Package maintainer discretion.

  2. Fine with that suggestion, though I'm not entirely sure what you mean by "already used somewhere else"

  3. md5.c is not a system library. openssl-libs (which provides libcrypto) is. If the question is "what do we do about bundled crypto code?" that should probably be split out separately into its own section.

  4. I see no reason why EPEL has to use these rules if they choose not to. EPEL already deviates from Fedora elsewhere. Leave this up to that community.

  5. I don't care either way on fonts. The impact of having them bundled is mostly limited to disk space.

  1. We can also leave the decision to the package maintainer, but I like hguemar's suggestion to have this noted in pkgdb as well. Maybe prefix "(dead)" to the existing URL tag in the spec. Not sure if it's allowed in pkgdb.

  2. FPC was maintaining a [https://fedoraproject.org/w/index.php?title=Packaging:No_Bundled_Libraries&oldid=406058#Packages_granted_exceptions list], which has now been removed. I think that list should be restored, or a new page created. When the Provides name is not clear, file a ticket with the FPC asking it to decide the bundled library's Provides name.

  3. md5.c is not a system library.

  4. EPEL should use these rules but as jwboyer said we can leave this to EPEL community.

  5. Please have fonts strictly unbundled; I don't want to see the same font installed on my system in various directories.

  1. I think this is entirely at the discretion of the package maintainer (or done voluntarily by an Unbundling SIG if that materializes).
  2. I'd like to ask the FPC to formalize the language here.
  3. See tangent below.
  4. I agree that we should leave this to the EPEL community. Someone should probably open a ticket to get them into the loop, though.
  5. See tangent below.

=== Tangent ===
One thing I'd very much like to see is some automation around dealing with duplicated files. In the case of things like the fonts, these are items that are identical content; I think it makes sense to see if we can extend RPM so that it keeps a fast searchable database of sha1 hashes for all files it installs. When a file with the same hash is about to be installed, it should hardlink instead of duplicating the content. Then it becomes irrelevant if we have the same font (or license file, etc.) installed in many places.
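To make the hardlinking idea concrete, here is a rough sketch that works on a plain directory tree rather than inside RPM. It does not use any RPM API, and the function names are made up for illustration. It also ignores things a real implementation would have to handle, such as ownership, permissions, SELinux labels, and files on different filesystems.

```python
import hashlib
import os


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at path, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(root):
    """Replace byte-identical regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    first_seen = {}  # digest -> path of the first file with that content
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            original = first_seen.setdefault(sha1_of(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears part-way through.
            temporary = path + ".dedup-tmp"
            os.link(original, temporary)
            os.replace(temporary, path)
            replaced += 1
    return replaced
```

Running it twice on the same tree should report zero replacements the second time, since the duplicates already share an inode.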

Similarly, I'm working on a script that we can run periodically (maybe once a month or once a Fedora release, etc.) that will run through the {{{fedpkg prep}}} output of every package in dist-git and maintain a similar searchable database of duplicate files at the source level. This can be used to help us track down known cases of bundling even if the package maintainer is unaware of it.
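The source-level database could look roughly like the sketch below. It is not the script mentioned above; it assumes the prepared source trees are already unpacked somewhere on disk, and the package names and directory layout passed in are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def index_sources(package_dirs):
    """Map each content hash to a list of (package, relative path) entries.

    package_dirs maps a package name to the directory holding its
    prepared sources.
    """
    index = defaultdict(list)
    for package, directory in sorted(package_dirs.items()):
        base = Path(directory)
        for path in sorted(base.rglob("*")):
            if path.is_file() and not path.is_symlink():
                digest = hashlib.sha1(path.read_bytes()).hexdigest()
                index[digest].append((package, str(path.relative_to(base))))
    return index


def shared_between_packages(index):
    """Keep only the hashes whose content occurs in more than one package."""
    return {
        digest: entries
        for digest, entries in index.items()
        if len({package for package, _path in entries}) > 1
    }
```

Exact hashes only catch verbatim copies; a bundled file that upstream has modified even slightly would not show up this way.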

So, I guess I wasn't very clear with question 3.

md5.c was perhaps a poorly chosen example.

Basically I want to know where along the line we consider something a 'system library'. I think there's a continuum here, but perhaps folks would prefer to just leave that up to the maintainer as well.

I guess this gets back to "what is a system library" vs a copylib. The FPC had guidelines for calling something a copylib, but our policy doesn't have any of those. ;(

Re: tangent, let me introduce you to http://pypi.python.org/pypi/summershum and the large db of checksums we already possess.

AGREED: defer ticket 1491 to next week's meeting.

I suggest relaxing the requirement to contact upstream if the maintainers of the affected library intend it to be a copylib. E.g. the requirement could be satisfied by linking to a public request for the app developers to unbundle the library in the spec file, OR linking to a public statement by the library developers that the library is intended to be bundled and not built as a system library (e.g. libgd).

Replying to [comment:11 catanzaro]:

I suggest relaxing the requirement to contact upstream if the maintainers of the affected library intend it to be a copylib. E.g. the requirement could be satisfied by linking to a public request for the app developers to unbundle the library in the spec file, OR linking to a public statement by the library developers that the library is intended to be bundled and not built as a system library (e.g. libgd).

While not strictly written that way, I would consider that completely in line with the intent of the original proposal. The contact doesn't strictly have to come ''from'' a Fedora contributor as long as there is evidence to support that upstream is unable or unwilling to unbundle.

With my FPC hat on, I have the following comments:

Replying to [ticket:1491 kevin]:

All packages whose upstreams allow them to be built against system libraries must be built against system libraries.

All packages whose upstreams have no mechanism to build against system libraries must be contacted publicly about a path to supporting system libraries. If upstream refuses, this must be recorded in the spec file using a persistent mechanism to be clarified in the packaging guidelines.

I'd like to see a provision here encouraging package maintainers to attempt unbundling on their own. It's trivial in many cases and upstream may welcome a working patch submitted upon initial contact.

All packages whose upstreams have no mechanism to build against system libraries may opt to carry bundled libraries, but if they do, they must include Provides: bundled(<libname>) = <version> in their RPM spec file.

What about code fragments (e.g. single files copied verbatim or modified from another project) or content (e.g. fonts, icons)?

I'd like to ask for clarifications and improvements for various cases for this new policy:

  1. In the case that an upstream no longer exists or never replies to the public contact, should the package be disallowed? Or should it be allowed after some time period?

If upstream is dead, unresponsive or otherwise uncooperative, the package maintainer should still be encouraged to attempt unbundling on their own or contact the bundling project and maintainers from other distributions about creating a fork that does support unbundling.

  2. When the Provides: naming isn't clear or already used somewhere else, who decides what to use? I'd suggest that the FPC should manage a list and settle naming disputes around these.

+1. We do have the experience in this area.

  3. We should define "system libraries" here. Is md5.c a 'system library'?

We haven't been able to come up with a strict definition, unfortunately. That particular example is a copylib (though there really should be a canonical implementation in glibc). But some libraries declared by upstream as copylibs are in our opinion too big and should be remade into proper shared libraries. So I wouldn't say "upstream is always right" here.

  4. Do these rules apply to EPEL? I would assume so since they follow the normal Fedora policies, but would be nice to note.

Definitely.

  5. Should we note anything about fonts? Under the old rules fonts had to be unbundled (and it wasn't usually very hard, just make sure the font is packaged and replace the bundled file with a link to the system version), but under these rules I guess it's up to the maintainer?

There usually is no technical reason for bundling fonts, so the new rules should be adjusted accordingly.

Postponed because neither nirik nor sgallagh was present at today's meeting. Let's continue here.

Why are you only considering such minor wording adjustments to the broken new (non-)policy instead of just restoring the "old" policy that worked and that the people who actually understand the matter (FPC) want back?

Replying to [comment:13 rathann]:

With my FPC hat on, I have the following comments:

Replying to [ticket:1491 kevin]:

All packages whose upstreams allow them to be built against system libraries must be built against system libraries.

All packages whose upstreams have no mechanism to build against system libraries must be contacted publicly about a path to supporting system libraries. If upstream refuses, this must be recorded in the spec file using a persistent mechanism to be clarified in the packaging guidelines.

I'd like to see a provision here encouraging package maintainers to attempt unbundling on their own. It's trivial in many cases and upstream may welcome a working patch submitted upon initial contact.

This statement isn't proscriptive. It describes the minimum that must be done, not the maximum. As a general rule, it's best to avoid "should"-type rules in the guidelines. They make it harder to parse what is actually necessary.

That said, I'm open to adding something like this to the "SHOULD" checklist on https://fedoraproject.org/wiki/Packaging:ReviewGuidelines (which also needs to be updated to drop the no-bundled-libs MUST line).

All packages whose upstreams have no mechanism to build against system libraries may opt to carry bundled libraries, but if they do, they must include Provides: bundled(<libname>) = <version> in their RPM spec file.

What about code fragments (e.g. single files copied verbatim or modified from another project) or content (e.g. fonts, icons)?

In terms of fonts and icons (stuff that actually gets delivered to the installed system), I'm looking into ways that we can detect and hardlink that at the RPM layer to avoid duplication. As for code fragments, that's an unbounded problem. There are hundreds of thousands of code fragments in the existing packages that are untrackable. We have some tools in place to start trying to detect duplicate files in the source packages and we'll do some reporting, but I'm not sure if it makes sense to flood the Provides with that information; maybe we can look at maintaining a separate database.

I'd like to ask for clarifications and improvements for various cases for this new policy:

  1. In the case that an upstream no longer exists or never replies to the public contact, should the package be disallowed? Or should it be allowed after some time period?

If upstream is dead, unresponsive or otherwise uncooperative, the package maintainer should still be encouraged to attempt unbundling on their own or contact the bundling project and maintainers from other distributions about creating a fork that does support unbundling.

As above, unbundling remains recommended. The only change here is that it is no longer mandated. If you have ideas about how to communicate this better, please let me know. I'm unsure that a lot of "But you ''should'' do this" in the guidelines would do anything but make it harder for people to identify what the strict requirements are.

  2. When the Provides: naming isn't clear or already used somewhere else, who decides what to use? I'd suggest that the FPC should manage a list and settle naming disputes around these.

+1. We do have the experience in this area.

Thanks, we would appreciate that.

  3. We should define "system libraries" here. Is md5.c a 'system library'?

We haven't been able to come up with a strict definition, unfortunately. That particular example is a copylib (though there really should be a canonical implementation in glibc). But some libraries declared by upstream as copylibs are in our opinion too big and should be remade into proper shared libraries. So I wouldn't say "upstream is always right" here.

Yeah, this is one of those cases where I think it will be best to start processing summershum more completely. We can probably use that to feed a Provides: bundled(md5-foo-implementation). I do think for cases like crypto/hashes that this is really important.

  4. Do these rules apply to EPEL? I would assume so since they follow the normal Fedora policies, but would be nice to note.

Definitely.

I'm slightly in favor of asking EPSCo to make a decision here. It's perfectly within their rights (in my opinion) to require a stricter policy than Fedora if they wish.

  5. Should we note anything about fonts? Under the old rules fonts had to be unbundled (and it wasn't usually very hard, just make sure the font is packaged and replace the bundled file with a link to the system version), but under these rules I guess it's up to the maintainer?

There usually is no technical reason for bundling fonts, so the new rules should be adjusted accordingly.

I think the auto-detection and hardlinking of fonts in RPM would be a better long-term solution than trying to unbundle them.

I also have an additional item to bring up in this thread. I was talking with Jan Zeleny (CCed) recently. He was asking about how to handle automatic Requires: and Provides: when dealing with bundled libraries:

Jan Zeleny wrote:

Basically we were talking about bundling at one of our Software Management
meetings and we discovered that rpm automatic dependency generators combined
with bundled libraries may send Fedora into new dependency hell. Let me give
you a specific example. When you bundle a library, rpm will generate the
corresponding provide, for example libpng16.so.16()(64bit) and append it to
the list of provides of the rpm package.

Once that package is installed on your system, I can imagine four different
situations related to installation of any package depending on
libpng16.so.16()(64bit), based on whether the bundled library is in one of the
standard linker paths and whether the system libpng is installed. Suffice it
to say that three of those four situations might render any
application with such a dependency unstable.

My response:

Certainly, the coding guidelines
need to be updated. First of all, we need to state unequivocally that
only a version of a library that is intended to be system-wide may be
allowed in the standard linker paths (pretty much just {{{/usr/lib[64]}}}
now). Any library carried as a bundle should be required to live in
{{{/usr/lib/<pkgname>/libfoo.so.0.1.2}}} and we need to modify the RPM
autoprovides such that only the direct contents of {{{/usr/lib[64]}}} are
eligible.

Those two changes should, I think, cover the whole problem. Let me
know if I'm missing something obvious. If that sounds right to you,
I'll work up a modification to the guidelines and submit it to FPC.
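As a rough illustration of the second change, the eligibility rule ("only the direct contents of {{{/usr/lib[64]}}}") can be written as a small predicate. This is only a sketch of the rule as stated here, not how RPM's dependency generators are actually configured; the function name, the crude ".so" filename check, and the `example-app` path are assumptions made for the example.

```python
from pathlib import PurePosixPath

# The standard linker paths named in the proposal above.
SYSTEM_LIBDIRS = {PurePosixPath("/usr/lib"), PurePosixPath("/usr/lib64")}


def eligible_for_autoprovides(installed_path):
    """Return True if a library at installed_path would get an automatic Provides.

    Only files sitting directly in a system library directory qualify;
    anything in a subdirectory such as /usr/lib/<pkgname>/ is treated as
    bundled and skipped. The ".so" test is a deliberately crude stand-in
    for real shared-object detection.
    """
    path = PurePosixPath(installed_path)
    return path.parent in SYSTEM_LIBDIRS and ".so" in path.name


print(eligible_for_autoprovides("/usr/lib64/libpng16.so.16"))
print(eligible_for_autoprovides("/usr/lib/example-app/libfoo.so.0.1.2"))
```

Under this rule the system libpng would keep its automatic Provides, while a copy carried under a package-private directory would not generate one.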

Jan seemed to think that this was probably the right answer and is going to look into it on his side. Thoughts?

I've added a page for the current policy:

https://fedoraproject.org/wiki/Bundled_Software_policy

We can discuss changes/clarifications at the meeting.

Amendments approved at today's fesco meeting:

Unresolved issues include whether to make an explicit ruling about font data, and the precise definition of "system library". As no concrete proposals for those were forthcoming, please file new tickets for those issues if you have a suggestion.
