#714 let's kill file deps!
Closed: accepted 10 months ago Opened 2 years ago by mattdm.

Proposal:

RPM and our higher-level packaging tools support arbitrary file-level dependencies. For example: Requires: /usr/share/inkscape/tutorials/tux.png — since that file is provided by Inkscape, the inkscape package will be brought in to fill the requirement.

This is convenient, but has significant metadata overhead, often for very little concrete benefit. Packages should simply use package-level dependencies (Requires: inkscape) wherever possible; in the rare cases where it is not possible, please make a note as to the problem, which may be solved by packaging changes in the required package (contact the maintainer of that package to see what can be worked out) or in the future with rich Boolean dependencies.

Note that if file-based dependencies are unavoidable, /usr/bin and /usr/sbin should be used instead of /bin and /sbin, due to UsrMove.

Also note that due to special-casing in Yum, dependencies outside of the bin dirs are significantly more overhead; these really should be avoided.

Notes:

  1. The most useful end-user use for file-based deps is dnf install /usr/share/inkscape/tutorials/tux.png and similar. But that's separate from this problem, and could be solved in an entirely different way.
  2. If we want to go as far as banning deps outside of /usr/bin and /usr/sbin, that'd be awesome.
  3. Or for that matter, banning them outright (with some sort of transition period, perhaps).

I'll quote myself from the issue where this was originally filed:

SUSE has been doing this for a long time as well. AFAIK createrepo_c automagically adds files which are executable to primary.xml.

I'm pretty sure @ngompa has insight on this ;)

I absolutely don't want to see file dependencies banned. The opposite, in fact, I want to see them used more.

Fedora is not granular enough in terms of how things are packaged, and in addition, file deps transcend distribution differences (or even differences in packaging across releases in the same distribution).

Also, yes, createrepo_c already makes common file dependency paths available in the primary.xml data, originally because Yum and other older solvers didn't efficiently handle processing the full set of metadata.

Today, this is no longer the case with libsolv-based resolvers. Both DNF and Zypper can efficiently process the entire set of metadata. DNF even recently gained support for parallel download of metadata content to speed up data download and processing. If the problem is that the incremental downloads are too large, push the DNF team to complete support for delta repodata downloads, as that's a hugely beneficial feature for everyone.

Frankly, it's rather dumb to not leverage an amazing aspect of RPM dependencies. If it were possible, I'd even like to see file deps support versioning (which is the only real weakness it has today).

For example, I use it in the Fedora libzypp package to ensure that the requisite tools from libsolv are pulled in, and prevent people from accidentally removing functionality without getting "broken dependencies" errors from the buildsystem repoclosure. :wine_glass:

So please, don't do this.

Well, let me share something from conversations with mls (developer of libsolv)..

(03:12:58 PM) mls: you should get rid of that filelist, SUSE manages perfectly without it ;)
(03:14:27 PM) mls: you can only add requires to stuff in /usr/bin and /etc, i.e. the files from the primary xml
(03:14:30 PM) ignatenkobrain: or `dnf install /usr/share/info/foo.info.gz`
(03:14:31 PM) mls: (by policy)
(03:14:55 PM) mls: legacy is a bitch ;)

So honestly, I would like to get rid of filelists completely and require them only when the user wants to look something up in them (e.g. repoquery).

You will be surprised by the current state of spec files:

389-dsgw.spec:Requires:         /etc/dirsrv/admin-serv/httpd.conf
bacula2.spec:BuildRequires: /usr/include/tcpd.h
certmonger.spec:BuildRequires:  /usr/include/popt.h
ceph.spec:BuildRequires:    /usr/share/selinux/devel/policyhelp
clamav.spec:Requires:   /etc/cron.d
clamav.spec:Requires:   /etc/init
clamav.spec:Requires:   /etc/init
darkhttpd.spec:Requires:       /etc/mime.types
dist-git.spec:BuildRequires:  /usr/share/selinux/devel/policyhelp
exim.spec:Requires: /etc/pki/tls/certs /etc/pki/tls/private
exim.spec:Requires: /etc/aliases
gogoc.spec:BuildRequires:  /usr/share/selinux/devel/policyhelp
httpd.spec:Requires: /etc/mime.types, system-logos-httpd
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSerif-Regular.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSerif-Bold.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSerif-Italic.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSerif-BoldItalic.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSans-Regular.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSans-Bold.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSans-Italic.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationSans-BoldItalic.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationMono-Regular.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationMono-Bold.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationMono-Italic.ttf
Inventor.spec:BuildRequires: /usr/share/fonts/liberation/LiberationMono-BoldItalic.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSerif-Regular.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSerif-Bold.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSerif-Italic.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSerif-BoldItalic.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSans-Regular.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSans-Bold.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSans-Italic.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationSans-BoldItalic.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationMono-Regular.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationMono-Bold.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationMono-Italic.ttf
Inventor.spec:Requires: /usr/share/fonts/liberation/LiberationMono-BoldItalic.ttf
initscripts.spec:Requires: /etc/system-release
jasmine-node.spec:Requires:       /usr/share/jasmine/jasmine.js
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/Vera.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraBI.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraBd.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraIt.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraMoBI.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraMoBd.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraMoIt.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraMono.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraSe.ttf
k3d.spec:BuildRequires: /usr/share/fonts/bitstream-vera/VeraSeBd.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/Vera.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraBI.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraBd.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraIt.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraMoBI.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraMoBd.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraMoIt.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraMono.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraSe.ttf
k3d.spec:Requires: /usr/share/fonts/bitstream-vera/VeraSeBd.ttf
krb5.spec:Requires: /etc/crypto-policies/back-ends/krb5.config
krb5.spec:Requires: /usr/share/dict/words
nodejs-require-cs.spec:Requires:       /usr/share/javascript/coffee-script/coffee-script.js
perl-Convert-Color.spec:BuildRequires:  /usr/share/X11/rgb.txt
perl-Crypt-SSLeay.spec:BuildRequires:  /etc/pki/tls/certs/ca-bundle.crt
perl-Crypt-SSLeay.spec:Requires:       /etc/pki/tls/certs/ca-bundle.crt
rt.spec:Requires:  /usr/share/fonts/google-droid/DroidSansFallback.ttf
rt.spec:Requires:  /usr/share/fonts/google-droid/DroidSans.ttf
rt.spec:BuildRequires:  /usr/share/fonts/google-droid/DroidSansFallback.ttf
rt.spec:BuildRequires:  /usr/share/fonts/google-droid/DroidSans.ttf
sblim-cmpi-fsvol.spec:Requires:       /etc/ld.so.conf.d
sblim-cmpi-nfsv3.spec:Requires:       /etc/ld.so.conf.d
sblim-cmpi-nfsv4.spec:Requires:       /etc/ld.so.conf.d
sblim-cmpi-syslog.spec:Requires:       /etc/ld.so.conf.d
sdljava.spec:Requires:       /usr/share/fonts/dejavu/DejaVuSans.ttf
sdljava.spec:Requires:       /usr/share/fonts/dejavu/DejaVuSans-Bold.ttf
sdljava.spec:Requires:       /usr/share/fonts/dejavu/DejaVuSans-Oblique.ttf
sdljava.spec:Requires:       /usr/share/fonts/dejavu/DejaVuSans-BoldOblique.ttf
syslinux.spec:BuildRequires: /usr/include/gnu/stubs-32.h
tix.spec:Requires: /etc/ld.so.conf.d
totpcgi.spec:BuildRequires: /usr/share/selinux/devel/policyhelp
sng.spec:BuildRequires:  /usr/share/X11/rgb.txt
sng.spec:Requires:       /usr/share/X11/rgb.txt
wraplinux.spec:BuildRequires:  /usr/include/gnu/stubs-32.h
xblast.spec:Requires:       /usr/share/fonts/dejavu/DejaVuSans.ttf
xen.spec:BuildRequires: /usr/include/gnu/stubs-32.h
tarantool.spec:Requires: /etc/protocols
tarantool.spec:Requires: /etc/services

(excluding things like /bin, /usr/bin, /usr/sbin, /sbin and /usr/lib)

/etc is fine, so we have fonts required by filename and a few header files required by name. I don't really consider this enough to vote against, given "(03:08:49 PM) mls: (OTOH there is this memory consumption issue when converting the filelist data)".

Another important thing to note: once we get rid of filelists, DNF would be ~10 seconds faster on rawhide when loading a new repo (e.g. after refreshing the cache).

With all due respect, Matthew, I could not disagree more with what you say.

File deps play an important role in packaging, because they
a) are more flexible than package deps and do not break when a file changes owner
b) reflect reality of what is being used inside of packages.
...

Of course this adds time to dnf, but for a reason. If dnf has a performance issue with this, dnf has a design problem.

@corsepiu how many examples (for real packages) do you have? Can you share them?

@ignatenkobrain With all due respect, I think Michael Schroeder is wrong about file deps. And the only reason SUSE doesn't use them as much as they ordinarily would is because perl-BSSolv does not generate file dependency information by default. Thus, every file dependency must be manually specified in the OBS project configuration, and they're ignored in OBS dependency resolution in some cases (which is actually a bug).

It's the singular weakness of OBS and something that's actually in his power to fix, if he actually wanted to.

@ngompa the point is that for rawhide it takes ~10 seconds to create the cache for files, and each file lookup is incredibly slow. It is useful for things like repoquery, but for the solver it is actually a nightmare.

As you've seen above, only a small number of packages use file deps. I'm pretty sure that if we add way more file deps to packages, dnf will slow down.

Personally I don't see a use case for requiring some random file on the filesystem. Do you? (I mean real use cases.)

In rawhide today, there are 490 different file reqs. Of these, 79 are in /bin or /sbin, 323 are in /usr/bin or /usr/sbin, and 25 are in /etc. The remaining 64 are:

/usr/lib/libc.so
/usr/lib/node_modules/require-cs/cs.js
/usr/lib/node_modules/requirejs/bin/r.js
/usr/lib/ocf/resource.d
/usr/lib/udev/rules.d
/usr/lib64/httpd/modules/mod_lcgdm_dav.so
/usr/lib64/libnssckbi.so
/usr/libexec/koschei/koschei-admin
/usr/libexec/platform-python
/usr/libexec/platform-python3.6
/usr/libexec/rpm-ostreed
/usr/libexec/strongswan/charon-nm
/usr/libexec/system-python
/usr/share/X11/rgb.txt
/usr/share/aclocal
/usr/share/dict/words
/usr/share/fonts/bitstream-vera/Vera.ttf
/usr/share/fonts/bitstream-vera/VeraBI.ttf
/usr/share/fonts/bitstream-vera/VeraBd.ttf
/usr/share/fonts/bitstream-vera/VeraIt.ttf
/usr/share/fonts/bitstream-vera/VeraMoBI.ttf
/usr/share/fonts/bitstream-vera/VeraMoBd.ttf
/usr/share/fonts/bitstream-vera/VeraMoIt.ttf
/usr/share/fonts/bitstream-vera/VeraMono.ttf
/usr/share/fonts/bitstream-vera/VeraSe.ttf
/usr/share/fonts/bitstream-vera/VeraSeBd.ttf
/usr/share/fonts/dejavu
/usr/share/fonts/dejavu/DejaVuSans-Bold.ttf
/usr/share/fonts/dejavu/DejaVuSans-BoldOblique.ttf
/usr/share/fonts/dejavu/DejaVuSans-Oblique.ttf
/usr/share/fonts/dejavu/DejaVuSans.ttf
/usr/share/fonts/google-droid/DroidSans.ttf
/usr/share/fonts/google-droid/DroidSansFallback.ttf
/usr/share/fonts/liberation/LiberationMono-Bold.ttf
/usr/share/fonts/liberation/LiberationMono-BoldItalic.ttf
/usr/share/fonts/liberation/LiberationMono-Italic.ttf
/usr/share/fonts/liberation/LiberationMono-Regular.ttf
/usr/share/fonts/liberation/LiberationSans-Bold.ttf
/usr/share/fonts/liberation/LiberationSans-BoldItalic.ttf
/usr/share/fonts/liberation/LiberationSans-Italic.ttf
/usr/share/fonts/liberation/LiberationSans-Regular.ttf
/usr/share/fonts/liberation/LiberationSerif-Bold.ttf
/usr/share/fonts/liberation/LiberationSerif-BoldItalic.ttf
/usr/share/fonts/liberation/LiberationSerif-Italic.ttf
/usr/share/fonts/liberation/LiberationSerif-Regular.ttf
/usr/share/jasmine/jasmine.js
/usr/share/javascript/coffee-script/coffee-script.js
/usr/share/lightsquid/common.pl
/usr/share/sqlninja/backscan.pl
/usr/share/sqlninja/bruteforce.pl
/usr/share/sqlninja/dirshell.pl
/usr/share/sqlninja/dns.pl
/usr/share/sqlninja/escalation.pl
/usr/share/sqlninja/fingerprint.pl
/usr/share/sqlninja/getdata.pl
/usr/share/sqlninja/icmp.pl
/usr/share/sqlninja/metasploit.pl
/usr/share/sqlninja/resurrectxp.pl
/usr/share/sqlninja/revshell.pl
/usr/share/sqlninja/session.pl
/usr/share/sqlninja/sqlcmd.pl
/usr/share/sqlninja/test.pl
/usr/share/sqlninja/upload.pl
/usr/share/sqlninja/utils.pl

These are requested by 31 different packages. I am super-skeptical that these usages are actually taking advantage of any possible flexibility the feature grants.
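
The breakdown above can be reproduced from any list of file requirements with a few lines of Python. This is only a sketch: the function names and the sample paths are illustrative assumptions, not the script that produced the numbers in this comment.

```python
from collections import Counter

# Buckets follow the grouping used in the comment above; the function names
# and the sample paths below are illustrative assumptions.
def bucket(path):
    """Return the bucket a file requirement falls into."""
    if path.startswith(("/bin/", "/sbin/")):
        return "/bin or /sbin"
    if path.startswith(("/usr/bin/", "/usr/sbin/")):
        return "/usr/bin or /usr/sbin"
    if path == "/etc" or path.startswith("/etc/"):
        return "/etc"
    return "other"

def tally(file_reqs):
    """Count unique file requirements per bucket."""
    return Counter(bucket(p) for p in set(file_reqs))

sample = [
    "/bin/sh",
    "/usr/bin/python3",
    "/usr/sbin/sendmail",
    "/etc/mime.types",
    "/usr/share/dict/words",
    "/usr/share/X11/rgb.txt",
    "/usr/share/X11/rgb.txt",  # duplicate requirement, counted once
]
print(tally(sample))
```

Everything that lands in "other" is a requirement that can only be answered from the full file lists.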

(I'd love to see examples from /bin and friends, too.)

For this, we are adding over 8 million (unique) data points to the data set. Specifically, 8087419 file entries total, with 8040659 if we exclude the bin dirs and /etc. This seems an excessive price to pay for theoretical flexibility.

For comparison, the regular provides (including explicit file provides I'm too lazy to filter out) are only 376796.
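
To put those figures side by side, here is the arithmetic as a short Python sketch. It uses only the three numbers quoted above; the variable names are mine.

```python
total_files = 8087419        # all file entries (from the comment above)
outside_primary = 8040659    # excluding the bin dirs and /etc
regular_provides = 376796    # regular provides, for comparison

in_primary = total_files - outside_primary
outside_share = outside_primary / total_files
ratio_to_provides = total_files / regular_provides

print(f"entries in bin dirs and /etc: {in_primary}")
print(f"share outside those paths:    {outside_share:.1%}")
print(f"file entries per provide:     {ratio_to_provides:.1f}")
```

With the quoted figures this works out to 46760 entries in the bin dirs and /etc, about 99.4% of entries outside them, and roughly 21.5 file entries for every regular provide.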

I don't think deltas really solve this effectively; we'd have to generate a very large number of potential deltas to account for non-daily updates, and it does nothing for initial speed.

I generally come down on the side of getting rid of file dependencies. I recall how they worked in yum and the useful optimization it gave.

However, one thing strikes me: if it's really only those 64 file deps needed by only 31 packages, then why is the common case ever taking any kind of hit for this? I recall back during the cutover from yum to dnf that there was no plan to ever implement the "don't download the file lists unless necessary" optimization. So what would any of this actually save unless dnf is going to change?

Basically, please, let's get some hard numbers on what making this change would actually give us. Because to me it seems like that's going to be pretty small. But if there's a firm commitment for dnf to download filelists only when needed, and we would save some nontrivial CPU time and significant bandwidth, then it's certainly worth talking about.

For me the most important thing is the 65MB of metadata I have to download (after dnf clean all) just to see if there's anything for this F26 machine to update. It's annoying over crap hotel wireless but just imagine what that costs when you're not in a world of free and plentiful bandwidth. We owe it to people in those parts of the world to consider such things even though we have quad core CPUs and minimum 100Mbps to our homes, even if it means that, yeah, you have to fix up a package when a file moves from one subpackage to another.

Metadata Update from @tibbs:
- Issue tagged with: meeting

2 years ago

> I recall back during the cutover from yum to dnf that there was no plan to ever implement the "don't download the file lists unless necessary" optimization. So what would any of this actually save unless dnf is going to change?

There has never been any pressure to find a way to do this. It is technically possible to download select metadata (we do this for PackageKit where we additionally request AppStream metadata if it exists in the repodata), but the difference is that we don't (currently) have a way to issue a metadata callback to librepo when we encounter a file dep that isn't in primary.xml. And after said callback, the sack has to be reset and the solver must re-run. Not necessarily impossible, but not trivial either.
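
The "reset and re-run" flow described above can be sketched as a toy resolver in Python. To be clear, this is not DNF or librepo code: `resolve`, `load_filelists`, and the dict-based "metadata" are hypothetical stand-ins, and a real SAT solver is far more involved.

```python
def resolve(requirements, primary_provides, load_filelists):
    """Map each requirement to a providing package.

    primary_provides: dict of capability -> package, standing in for primary.xml.
    load_filelists:   hypothetical callback returning a dict of path -> package,
                      standing in for the filelists.xml download.
    Returns (solution, filelists_loaded).
    """
    provides = dict(primary_provides)
    filelists_loaded = False
    while True:
        missing = [r for r in requirements if r not in provides]
        if not missing:
            return {r: provides[r] for r in requirements}, filelists_loaded
        unresolved_paths = [r for r in missing if r.startswith("/")]
        if filelists_loaded or not unresolved_paths:
            raise LookupError(f"nothing provides {missing}")
        # A file dep cannot be answered from primary data: fetch the full
        # file lists, then start the whole resolution over.
        for path, pkg in load_filelists().items():
            provides.setdefault(path, pkg)
        filelists_loaded = True


primary = {"inkscape": "inkscape", "/usr/bin/python3": "python3"}
filelists = {"/usr/share/dict/words": "words"}

print(resolve(["inkscape", "/usr/bin/python3"], primary, lambda: filelists))
print(resolve(["/usr/share/dict/words"], primary, lambda: filelists))
```

The first call never touches the file lists; only the second, which needs a path outside the primary data, pays for the extra download and the second pass.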

So right now, we download primary.xml.gz and filelists.xml.gz, which totals about 53MB right now in gzip compressed form. A simple optimization to reduce the size would be to switch from gzip to bzip2 or xz compression, which cuts it nearly in half, last I checked. We could also elect to implement support for one of the new compression algorithms (brotli, zstd, etc.) that makes it faster to do better compression. If we cut the production of the SQLite metadata, we can allocate some of that time towards heavier compression options.
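
If someone wants to check the compression claim on their own data, a quick comparison with the Python standard library looks like this. The sample below is synthetic (an assumption standing in for real filelists XML), so the ratios it prints say nothing definite about actual Fedora metadata.

```python
import bz2
import gzip
import lzma

def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes of data under each codec."""
    return {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }

# Synthetic stand-in for filelists-style XML; real metadata will behave differently.
sample = "".join(
    f"<file>/usr/share/fonts/example/Font{i}.ttf</file>\n" for i in range(5000)
).encode()

for codec, size in compressed_sizes(sample).items():
    print(f"{codec:6s} {size:8d} bytes ({size / len(sample):.1%} of original)")
```

Swapping `sample` for the bytes of a decompressed filelists.xml would give numbers that are actually comparable to the ones discussed here.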

There are also optimization techniques at build time we can do (auto-remove Requires that match Provides in the same package) that would reduce the size of the metadata, and simplify solution solving.
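
As a rough illustration of that build-time pruning, assuming dependencies are compared as exact strings (versioned and rich dependencies would need real RPM comparison, which this sketch does not attempt):

```python
def prune_self_provided(requires, provides):
    """Drop requirements that the same package already provides, keeping order."""
    provided = set(provides)
    return [req for req in requires if req not in provided]

# Hypothetical package metadata, for illustration only.
requires = ["libfoo.so.1()(64bit)", "/usr/bin/foo-helper", "bash"]
provides = ["foo", "libfoo.so.1()(64bit)", "/usr/bin/foo-helper"]
print(prune_self_provided(requires, provides))
```

Only `bash` survives here, since the other two requirements are satisfied by the package itself.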

I do not want the usage of file dependencies forbidden, as they are very handy, but it would make sense to explore ways to improve the experience. Even @ignatenkobrain's comment is more or less just asking for the filelists.xml.gz file to be downloaded on demand. I firmly believe it is possible to correct this deficiency if it's really desired.

That said, we should also have delta repodata downloads, too. :)

> There has never been any pressure to find a way to do this.

That's mildly disingenuous given the discussion at the time. But people were told it wasn't going to happen and so I guess everyone learned to live with it. I'm not sure how much needless bandwidth consumption is required before it's considered "pressure", but maybe there's just nobody who really cares about that any more.

But in any case, my point is that we can talk about file dependencies all day long but even if we had none of them at all, it doesn't seem as if that would make the tiniest bit of difference. It seems to me to be a bit out of order to change guidelines and packages unless there's dnf changes which would make them relevant at least exist in some form.

And yeah, I'm sure dnf could get some other optimizations. If they're easy and haven't been done then I guess that's simply a reflection of how little concern there is for those with limited bandwidth.

From what I know, implementing downloading filelists.xml on-demand is not easy + might affect depsolving (Provides: /usr/share/foo vs such file in filelists.xml). So DNF team decided to download everything at once. If we decide to prohibit dependencies on files from filelists.xml and only allow file dependencies from primary.xml -- I will switch DNF to not download filelists.xml at all (and download for repoquery and stuff like that).

> From what I know, implementing downloading filelists.xml on-demand is not easy + might affect depsolving (Provides: /usr/share/foo vs such file in filelists.xml). So DNF team decided to download everything at once. If we decide to prohibit dependencies on files from filelists.xml and only allow file dependencies from primary.xml -- I will switch DNF to not download filelists.xml at all (and download for repoquery and stuff like that).

You can't do that anyway, because that would break third parties. The reality is that there's more than Fedora that would rely on DNF, and the on-demand thing does make sense. The main problem is getting DNF to reset itself and start over when it encounters a file dependency that isn't in primary.xml.

> It seems to me to be a bit out of order to change guidelines and packages unless there's dnf changes which would make them relevant at least exist in some form.

I think we've got kind of a chicken-and-egg problem. What you say makes sense, but then the DNF team says "well, as long as there are non-primary file deps in the main Fedora repo, changes to DNF wouldn't be relevant either". So, maybe we can decide that we'll update the packaging guidelines and the DNF team will work on support for only adding filelists when necessary?

Revised proposal:

RPM and our higher-level packaging tools support arbitrary file-level dependencies. For example: Requires: /usr/share/inkscape/tutorials/tux.png — since that file is provided by Inkscape, the inkscape package will be brought in to fill the requirement.

This is convenient, but has significant metadata overhead. Our tooling has special casing to reduce this for packages in /usr/bin, /usr/sbin, /usr/libexec, and /etc. Packages must not use file dependencies outside of these paths.

Even within the allowed paths, packages should simply use package-level dependencies (Requires: inkscape) wherever possible; in cases where it is not possible, please make a note as to the problem, which may be solved by packaging changes in the required package (contact the maintainer of that package to see what can be worked out) or in the future with rich Boolean dependencies.

Note that if file-based dependencies are unavoidable, /usr/bin and /usr/sbin must be used instead of /bin and /sbin, due to UsrMove. In some cases, this will require coordinated changes with packages which have not yet been updated to use the /usr paths. Often, using a package-name dependency instead will simply make this go away; in other cases, it is acceptable to use the /bin/ or /sbin paths temporarily until the coordinated change is complete. (There must be a link to a related bugzilla entry in a comment in the spec file.)

We should also add /usr/libexec to the special-casing because of Platform Python...

> We should also add /usr/libexec to the special-casing because of Platform Python...

Is it already in primary.xml? I'll edit that into the proposal above.

I don't remember if it is. Someone should check to see if createrepo_c does do that already... If not, it should be added.

You can't just add directories without changing the type of repodata you are serving. Say /usr/libexec/foo doesn't exist in primary ... how do you know whether it doesn't actually exist, or whether the repodata was created with a createrepo from before the change where libexec was added to primary?

This is the yum code, which should be easy enough to follow:

https://github.com/rpm-software-management/yum/blob/master/yum/misc.py#L133
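
For readers who don't want to click through, the idea is roughly the following. This is a paraphrase, not a copy: the function name is mine, and the two patterns (anything under a "bin" directory, plus everything in /etc) are an assumption based on this discussion, so check them against the linked source.

```python
import re

# Assumed patterns: anything under a "bin" directory, plus everything in /etc.
# Verify against the linked yum source before relying on them.
_PRIMARY_PATTERNS = (
    re.compile(r".*bin/.*"),
    re.compile(r"^/etc/.*"),
)

def is_primary_filename(path):
    """Return True if path would be listed in primary.xml under these patterns."""
    return any(p.match(path) for p in _PRIMARY_PATTERNS)

for path in ("/usr/bin/python3", "/sbin/ldconfig", "/etc/mime.types",
             "/usr/libexec/platform-python", "/usr/share/dict/words"):
    print(path, is_primary_filename(path))
```

Under these patterns /usr/libexec/platform-python is not a primary filename, which is exactly the compatibility problem: a client cannot tell "not in primary because it doesn't exist" from "not in primary because the repodata predates the new pattern".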

...IMNSHO if you are going to change the repo format, don't just do a tiny change like this.

We discussed this at this week's meeting (http://meetbot.fedoraproject.org/fedora-meeting-1/2017-09-14/fpc.2017-09-14-16.00.txt):

  • #714 let's kill file deps! (geppetto, 16:47:32)
  • There is little chance that we as FPC can just ban file deps.
    outright, so that leaves fixing the major problems with them.
    (geppetto, 17:01:11)
  • DNF currently doesn't implement the yum optimization, so our current
    wording of asking people to avoid non-main file deps. does little
    and increasing the severity of that wording will do nothing.
    (geppetto, 17:01:56)
  • ACTION: mattdm So speak to the DNF team and see if they can fix the
    optimisation compatibility, or maybe FESCO to have a flamewar about
    killing filedeps entirely. (geppetto, 17:03:09)

Okay, so since the current design doesn't include libexec, I'll leave that out of the proposal. It is my understanding that the DNF team is willing to work on the optimization if it's matched by the guidelines.

I'm willing to bring it to FESCo if that's helpful.

> It is my understanding that the DNF team is willing to work on the optimization if it's matched by the guidelines.

This is the part I don't understand, though. Fedora can ban file deps all it wants but why does that matter for whether the dnf team will implement the optimization? They will always have to handle looking up file deps in the full metadata because there's no guarantee that such a package will never appear in anything that uses dnf. There are other repositories besides ours and it's not really an option to say "that perfectly valid RPM dependency no longer works at all". We don't really want to prevent some random site from putting an interpreter into /opt/bin/whatever and then putting some script using that interpreter into a package, do we?

So I simply can't understand why dnf activity would gate on changing a single "SHOULD" to "MUST" in the packaging guidelines. Either the optimization can be implemented without loss of functionality or it can't. If it can then, well, we want it as badly as we want the common case to not have to download that extra 50MB of metadata just to update.

The issue is "SHOULD" versus "MUST", I guess. As far as I can tell that's the essence of this proposal. Once it's "MUST" then maybe DNF could then simply refuse to consider any file dependency outside of the limited set of directories and thus not bother ever downloading the additional metadata for regular dependency resolution.

I just have my doubts that is even remotely reasonable.

I'd rather just have the DNF folks implement the optimization technique, and then look at extending the format (because apparently we need to do that to add a directory to the primary.xml data?) so that /usr/libexec content can be used as primary.xml file Provides/Requires.

Starting by working forward from facts, which Matt noted rather than positions or opinions:

> In rawhide today, there are 490 different file reqs. Of these, 79 are in /bin or /sbin, 
> 323 are in  /usr/bin or /usr/sbin, and 25 are in /etc. The remaining 64 are: 

[RPH:  I have moved and trimmed them,  and reclassified, by inspection so that groupings appear]

> /usr/lib/udev/rules.d
> /usr/libexec/platform-python
> /usr/libexec/platform-python3.6
> /usr/libexec/system-python
> /usr/share/aclocal

> /usr/share/X11/rgb.txt

> /usr/share/dict/words

> /usr/share/fonts/bitstream-vera/Vera.ttf
> /usr/share/fonts/bitstream-vera/VeraBI.ttf
> /usr/share/fonts/bitstream-vera/VeraBd.ttf
> /usr/share/fonts/bitstream-vera/VeraIt.ttf
> /usr/share/fonts/bitstream-vera/VeraMoBI.ttf
> /usr/share/fonts/bitstream-vera/VeraMoBd.ttf
> /usr/share/fonts/bitstream-vera/VeraMoIt.ttf
> /usr/share/fonts/bitstream-vera/VeraMono.ttf
> /usr/share/fonts/bitstream-vera/VeraSe.ttf
> /usr/share/fonts/bitstream-vera/VeraSeBd.ttf
> /usr/share/fonts/dejavu
> /usr/share/fonts/dejavu/DejaVuSans-Bold.ttf
> /usr/share/fonts/dejavu/DejaVuSans-BoldOblique.ttf
> /usr/share/fonts/dejavu/DejaVuSans-Oblique.ttf
> /usr/share/fonts/dejavu/DejaVuSans.ttf
> /usr/share/fonts/google-droid/DroidSans.ttf
> /usr/share/fonts/google-droid/DroidSansFallback.ttf
> /usr/share/fonts/liberation/LiberationMono-Bold.ttf
> /usr/share/fonts/liberation/LiberationMono-BoldItalic.ttf
> /usr/share/fonts/liberation/LiberationMono-Italic.ttf
> /usr/share/fonts/liberation/LiberationMono-Regular.ttf
> /usr/share/fonts/liberation/LiberationSans-Bold.ttf
> /usr/share/fonts/liberation/LiberationSans-BoldItalic.ttf
> /usr/share/fonts/liberation/LiberationSans-Italic.ttf
> /usr/share/fonts/liberation/LiberationSans-Regular.ttf
> /usr/share/fonts/liberation/LiberationSerif-Bold.ttf
> /usr/share/fonts/liberation/LiberationSerif-BoldItalic.ttf
> /usr/share/fonts/liberation/LiberationSerif-Italic.ttf
> /usr/share/fonts/liberation/LiberationSerif-Regular.ttf

> /usr/share/sqlninja/backscan.pl
> /usr/share/sqlninja/bruteforce.pl
> /usr/share/sqlninja/dirshell.pl
> /usr/share/sqlninja/dns.pl
> /usr/share/sqlninja/escalation.pl
> /usr/share/sqlninja/fingerprint.pl
> /usr/share/sqlninja/getdata.pl
> /usr/share/sqlninja/icmp.pl
> /usr/share/sqlninja/metasploit.pl
> /usr/share/sqlninja/resurrectxp.pl
> /usr/share/sqlninja/revshell.pl
> /usr/share/sqlninja/session.pl
> /usr/share/sqlninja/sqlcmd.pl
> /usr/share/sqlninja/test.pl
> /usr/share/sqlninja/upload.pl
> /usr/share/sqlninja/utils.pl

> These are requested by 31 different packages. I am super-skeptical that these usages 
> are actually taking advantage of any possible flexibility the feature grants.

Actually it looks like:

  • a few directory dependencies,

  • a pile of font dependencies probably from a mis-packaging, and

  • a couple of remaining packages that are probably trivially refactored to get rid of them

Alternative:

  1. Why not attack the '31 different packages', get them cleaned up, and run some performance stats?

  2. As the root issue seems to be the performance of rebuilding repodata, why not attack that problem rather than a symptom? Refactoring createrepo run times and the unpacking / rebuilding issues really calls for caching the 99.99 pct of data that is unchanged and, most of the time, simply invalidating and rebuilding the small subset that has changed, along with an option to 'wipe the world' and rebuild (which seems to be the present, incredibly naive approach). As I recall Seth's approach to repodata 'back in the day', it was intentionally a cheap Proof of Concept rather than a thoughtful design: the repodata format was considered, but the implementation detail and 'easy wins' in caching have not been thoughtfully approached, so far as I can tell.
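
The caching idea in item 2 could look something like this in Python. It is a sketch of the general technique only: `refresh_metadata` and `read_metadata` are hypothetical names, and nothing here reflects how createrepo or createrepo_c is actually implemented.

```python
def refresh_metadata(packages, cache, read_metadata):
    """Rebuild repodata entries, re-reading only new or changed packages.

    packages:      dict of filename -> (size, mtime) for the current repo contents.
    cache:         dict of filename -> ((size, mtime), metadata) from the last run.
    read_metadata: hypothetical, expensive callback that parses one package.
    Returns (new_cache, list of filenames that were re-read).
    """
    new_cache = {}
    reread = []
    for name, stamp in packages.items():
        cached = cache.get(name)
        if cached is not None and cached[0] == stamp:
            new_cache[name] = cached          # unchanged: reuse as-is
        else:
            new_cache[name] = (stamp, read_metadata(name))
            reread.append(name)
    # Packages no longer in the repo are dropped simply by not being copied.
    return new_cache, reread

def read_stub(name):
    """Stand-in for the expensive per-package parse."""
    return {"name": name}

first, reread1 = refresh_metadata({"a.rpm": (10, 1), "b.rpm": (20, 1)}, {}, read_stub)
second, reread2 = refresh_metadata({"a.rpm": (10, 1), "b.rpm": (25, 2)}, first, read_stub)
print(reread1, reread2)
```

Passing an empty cache is the 'wipe the world' case; on the second run only the changed package is re-read.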

TL;DR proposal:

I would 'table' this proposal and put it on hold for N months,

  • get the package cleanups done, and

  • get a formal study of the implementation of cache invalidation done, and THEN see if this seems well advised.

I suspect the urgency for this proposal will fade away, and there will be bigger fish to fry. Also, though out of scope for 'traditional' Fedora per se: cross-fertilization and 'skinnying down' installations (container-based ones come to mind) by breaking huge dependency chains is possible with file dependencies, and almost impossible at the package dependency level.

> (I'd love to see examples from /bin and friends, too.)

/usr/sbin/sendmail is the one example I know where this is required, primarily due to history and assumptions:

dnf repoquery --whatprovides /usr/sbin/sendmail

Last metadata expiration check: 0:01:38 ago on Sun 04 Mar 2018 09:42:48 GMT.
esmtp-0:1.2-10.fc28.armv7hl
exim-0:4.90.1-3.fc28.armv7hl
opensmtpd-0:6.0.3p1-2.fc28.armv7hl
postfix-2:3.2.5-4.fc28.armv7hl
sendmail-0:8.15.2-23.fc28.armv7hl
ssmtp-0:2.64-20.fc28.armv7hl

So rawhide has dnf3 and it's all going to be yum again… maybe. Are there any changes there or on the horizon which might have some bearing on this ticket?

Otherwise, I can't see that there's any consensus for either getting rid of file deps (either some classes of them or all of them) or embracing them more strongly. I guess we could have another discussion if there's time in a meeting.

@tibbs I think we should ban usage of any file dependencies on non-default locations. Everything else is fine.

In future, DNF could implement lazy loading of filelists.

A survey ten months ago turned up a handful of packages still using /bin/ and /sbin/.

If this were important, bugs to fix it would have been filed, and the offending obsolete matter updated.

Were they?

If not, there is an obvious win if one cared about speeding things up.

I just don't see that this particular proposal is 'ripe', and would reject it pending getting the cleanups done, so one could pull off meaningful stats.

I don't understand why we don't instead implement incremental downloads of
(at least this kind of filelist) metadata. Are there some links? This proposal seems
to be a workaround.

Here you go: https://fedoraproject.org/wiki/Changes/Zchunk_Metadata

That isn't an alternative: if you open a Docker container and run a dnf command, Zchunk doesn't help you.

Also, the current Fedora 28 updates metadata is:

7.4K prestodelta.xml.gz
438K comps-Everything.x86_64.xml.gz
15M filelists.xml.gz
4.3M primary.xml.gz
1.5M updateinfo.xml.xz

So the unneeded filelists file is more than twice the size of everything else combined.
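
For reference, the ratio can be worked out directly from the sizes listed above. This is only a back-of-the-envelope sketch: it uses the rounded figures as quoted and assumes "K" and "M" are binary units (1M = 1024K). With those assumptions filelists comes out at roughly 2.4 times the rest, or about 71% of the total download.

```python
# Sizes as quoted in the listing above, converted to KiB.
# Assumption: "K" and "M" are binary units, so 1M = 1024K.
FILELISTS = "filelists.xml.gz"
sizes_kib = {
    "prestodelta.xml.gz": 7.4,
    "comps-Everything.x86_64.xml.gz": 438,
    FILELISTS: 15 * 1024,
    "primary.xml.gz": 4.3 * 1024,
    "updateinfo.xml.xz": 1.5 * 1024,
}


def filelists_ratio(sizes):
    """Return the size of filelists divided by the sum of all other files."""
    others = sum(size for name, size in sizes.items() if name != FILELISTS)
    return sizes[FILELISTS] / others


ratio = filelists_ratio(sizes_kib)
share = sizes_kib[FILELISTS] / sum(sizes_kib.values())
print(f"filelists is {ratio:.1f}x the rest, {share:.0%} of the total")
```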

Similarly to mock, any other container scenario might as well do dnf metadata caching
on the host, no matter whether it's 5 or 20M.

I'd really dislike it if container scenarios, at least at this point in time, drove such an
important decision for the whole of Fedora.

I seem to remember yum used to retrieve filelists.xml.gz only as necessary rather than every time.

@ignatenkobrain can help me with the details here.... see his comment from four days ago. It's my understanding that DNF could do "lazy" loading of the filelist too, but is effectively impeded from doing so by the fact that packages in the set actually use them.

Yum's "as necessary" meant "as the kind of strange heuristic got to the point where it thought it might help". As I understand it, the SAT solver approach used by DNF isn't as amenable to this — since some packages in the set currently do use file deps, the file deps list is always needed.

But as noted in the FPC ticket, it's really very few packages. So, I think we could get rid of those by policy, and then DNF could be adjusted to never consider file deps outside of the primary list unless a) given an option to do so or b) explicitly given an install command starting with /. (@ignatenkobrain, is that right?)

It's my understanding that DNF could do "lazy" loading of the filelist too, but is effectively impeded from doing so by the fact that packages in the set actually use them.

Correct. The problem is that ignatenkobrain is missing the fact that his approach is ill-formed. File deps are a mighty tooling with much greater strength. That said, if you guys want to minimize deps, give up the separation of "file-deps" vs. "other deps" and minimize the resulting deps.

Well, they might be theoretically a mighty tooling, but in practice, we're not using them in a mighty way (see investigation above). So we're paying a huge practical price for something we're not actually benefiting from.

So… libsolv has loadcallback and if you set it - it knows that "there is more data available". So when resolving happens and libsolv can't find some dependency - it will start loading filelists.

Just hook it up to the libdnf/hawkey/dnf codebase and it's there. But I didn't do so while I was a DNF team member because the architecture of DNF doesn't "support" this. So if we really want that in dnf, big changes need to happen in libdnf.

Well, they might be theoretically a mighty tooling, but in practice, we're not using them in a mighty way (see investigation above). So we're paying a huge practical price for something we're not actually benefiting from.

You're not, but I am (as a third party packager). File deps don't benefit you as much as they benefit the ecosystem. So if you screw us over on file deps, that's going to make the ecosystem rather upset. Most of us aren't going to just target one distro family (or even a single distro release within a family). We want to support a broad range across a large set of things.

And Igor is wrong about the SAT solver blocking that optimization technique. It's actually possible to implement it, especially since the latest libdnf now lets you set librepo to fetch it or not (courtesy of @walters). The basic strategy would be: attempt to solve without the filelists, and if it leads to an unresolvable due to file path, then download and retry. If it is still unresolvable, then die.
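
The retry strategy described above can be sketched with a toy resolver. This is not DNF, libdnf, or libsolv code: `solve`, `solve_lazily`, `Unresolvable`, the `fetch_filelists` callable, and the package name `vlc-plugin-fluidsynth` are all hypothetical names made up for illustration. The sketch assumes a requirement is a file dependency when it starts with `/`.

```python
class Unresolvable(Exception):
    """Raised by the toy solver; `missing` is the requirement nobody provides."""

    def __init__(self, missing):
        super().__init__(missing)
        self.missing = missing


def solve(package, repo, provides):
    """Toy resolver: return the set of packages needed to install `package`."""
    needed, todo = set(), [package]
    while todo:
        pkg = todo.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        for dep in repo[pkg]:
            if dep not in provides:
                raise Unresolvable(dep)
            todo.append(provides[dep])
    return needed


def solve_lazily(package, repo, primary, fetch_filelists):
    """Solve with primary metadata only; fetch filelists only on a file-path miss."""
    try:
        return solve(package, repo, primary), False
    except Unresolvable as err:
        if not err.missing.startswith("/"):
            raise  # not a file dep, so filelists cannot help
    full = {**primary, **fetch_filelists()}
    return solve(package, repo, full), True  # may still raise: "then die"


# Hypothetical data: package -> requirements, and requirement -> provider.
REPO = {
    "myapp": ["vlc", "/usr/lib64/vlc/plugins/codec/libfluidsynth_plugin.so"],
    "vlc": [],
    "vlc-plugin-fluidsynth": ["vlc"],
}
PRIMARY = {"vlc": "vlc", "/usr/bin/vlc": "vlc"}
FILELISTS = {
    "/usr/lib64/vlc/plugins/codec/libfluidsynth_plugin.so": "vlc-plugin-fluidsynth",
}

needed, fetched = solve_lazily("myapp", REPO, PRIMARY, lambda: FILELISTS)
print(sorted(needed), "filelists fetched:", fetched)
```

The point of the design is that a transaction touching only package-level or primary-listed dependencies never triggers the second download.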

BTW is this ticket just about Requires or is it about BuildRequires as well?

@mattdm that's what yum used to do, and the general packaging policy was to avoid the usage where possible. I do think they serve a use case, so removing them entirely isn't useful, but we used to actively kill them where possible to ensure yum never had to pull the data down in the standard use case of updating/installing. But every time it's come up in the past with dnf, the dnf team said it wasn't possible to do it like that. So if that problem has now been resolved, I'm all for minimising the need for it in standard usage. It's possible that since dnf became standard the packaging has become sloppy, but that should be easy enough to fix. Let's get the lazy retrieval in dnf in place and then go through and fix the broken packages.

@ngompa Are you using file deps outside of the "primary" whitelist? Are virtual provides really not an option? I'd love to hear more about the use case.

BTW is this ticket just about Requires or is it about BuildRequires as well?

I was just thinking about runtime, but I suppose there's a build-time advantage as well (speeding up mass rebuilds)?

Well, I doubt there's any difference in dnf's code between the two when it comes to resolving file deps. As far as it's concerned it's the same thing, i.e. translating a path into an actual RPM.

Also, @ngompa has a point, especially for things like Chrome/Skype and other such third party packages: they often package an RPM with file dependencies so they don't need to care about package naming per distro (or whether the files are moved between releases or even NVRs).

I don't see how we can just kill them off in the short term. I think we need the "don't download the filelists.xml.gz by default and fall back as necessary" approach. Killing them off would also be a problem in the EL ecosystem (not that we focus on that directly).

@ngompa Are you using file deps outside of the "primary" whitelist? Are virtual provides really not an option? I'd love to hear more about the use case.

Yes. I regularly pull in things from /usr/libexec and /usr/lib64 (and often subdirectories from there), for example. Virtual Provides only work if everyone (i.e. Red Hat, Fedora, and [open]SUSE) actually agrees to have common virtual provides. Things like plugins and helper binaries can be, and often are, packaged differently. And they have to exist first. I can't even rely on this for Python modules yet because I'm still trying to convince openSUSE to finally turn on the pythondistdeps generator that I contributed from Mageia/Mandriva to RPM.

In your happy little world where you can more or less control everything (within the distribution), every problem is relatively easy to tackle. The broader ecosystem doesn't get this benefit, and as both a first party and a third party packager, I feel both sides of this on a regular basis.

It's already difficult to convince people to support Fedora, don't make life harder by making it so spec files have to be ugly as sin across every single Fedora release as people rename, change, split, recombine, etc. packages.

Also, @ngompa has a point, especially for things like Chrome/Skype and other such third party packages: they often package an RPM with file dependencies so they don't need to care about package naming per distro (or whether the files are moved between releases or even NVRs).

$ rpm -qR google-chrome-stable |grep /|sort -u
/bin/sh
/usr/bin/lsb_release
/usr/sbin/update-alternatives

and the skypeforlinux RPM only uses /bin/sh. So those'd be covered by the primary whitelist.

@ngompa We have /usr/libexec in primary, so it'd really just be /usr/lib64 that's the concern.

If we can make DNF only pull down the extended filelist when actually needed, I can live with that, although I really do remain a bit dubious without some concrete examples. Isn't some random /usr/lib64/whatever/foo/ more likely to be inconsistent between distros and releases?

No. This is actually a common thing for library plugins. For example, if I needed VLC with the fluidsynth plugin, I would do this:

Requires: vlc
Requires: /usr/lib64/vlc/plugins/codec/libfluidsynth_plugin.so

That allows me to be distro-agnostic while enumerating my exact requirements.

@ngompa We have /usr/libexec in primary.

Per @james, we actually don't.

I generally believe James, but per "there are /usr/libexec <file> entries in an actual primary.xml file I'm looking at right now", I'm not so sure in this case. :)

Anyway, if we can get DNF to do only-as-needed pulls of the filelists file, I think we should still strengthen the guidelines to forbid non-primary file deps within the Fedora package set, or some subset thereof. Right now it's only "SHOULD", with a lot of leeway. Then, your third-party packages will still work, and we won't need the filelists file within just Fedora packages at least.

Mattdm's bare URL confused me

This was cross-filed at:

https://pagure.io/fesco/issue/1955

I'd like to note that I probably use file deps a lot in my QA work. I very often need to know which (not-installed) packages contain a certain file, so I regularly run commands like dnf provides /usr/bin/7z or dnf provides **/fedora.py (not real world examples). It's useful, for example, when an interpreted program crashes due to a missing file and you need to find the library package containing that file, and in similar situations. I assume this is using the same file metadata that you're trying to eliminate here.

I don't really care whether file deps are banned in spec files, or whether dnf is able to download file metadata only when needed (of course, that would be nice), but I'd very much like the dnf provides command to keep working in the future. So perhaps reduce the file deps usage so that the metadata are not usually needed, but please don't kill the functionality completely.

Which files are in a package and files-as-dependencies are a separate issue. One can easily imagine a DNF plugin which handles listing files by package or packages by file, and which can install packages which contain certain files, all without injecting that as provides.
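
As a rough illustration of that separation, a file-to-package lookup can be a plain index that never touches dependency resolution. The sketch below is not a DNF plugin and uses no DNF API; `build_file_index`, `who_ships`, and the file lists in `PACKAGES` are invented for illustration (the package names are borrowed from the sendmail listing earlier in the thread, but which files each one ships is assumed).

```python
from fnmatch import fnmatchcase


def build_file_index(packages):
    """Map each file path to the sorted list of packages that ship it."""
    index = {}
    for name, files in packages.items():
        for path in files:
            index.setdefault(path, []).append(name)
    return {path: sorted(names) for path, names in index.items()}


def who_ships(index, pattern):
    """Return packages shipping a file matching an exact path or a glob."""
    hits = set()
    for path, names in index.items():
        if fnmatchcase(path, pattern):
            hits.update(names)
    return sorted(hits)


# Toy data: the file lists are assumptions, not real package contents.
PACKAGES = {
    "postfix": ["/usr/sbin/sendmail", "/usr/sbin/postfix"],
    "exim": ["/usr/sbin/sendmail", "/usr/sbin/exim"],
    "sendmail": ["/usr/sbin/sendmail"],
}

index = build_file_index(PACKAGES)
print(who_ships(index, "/usr/sbin/sendmail"))
print(who_ships(index, "*/post*"))
```

A lookup like this answers "which package has this file?" without any package having to declare the file as a provide.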

But if we can do the split thing, and then reduce packages which are using file deps erroneously or unnecessarily, that gets me to what I'm really looking for. I do think that a policy of avoiding these within the Fedora package set is desirable, because then the filelist will never need to be downloaded in normal operation, and as a user I'm not penalized for unintentionally installing a package which has decided to pull in that metadata.

Just wanted to reiterate that we already have a policy of avoiding these within the Fedora package set. https://fedoraproject.org/wiki/Packaging:Guidelines#File_and_Directory_Dependencies

That policy has been there for years. And FPC has certainly offered (at some point in this protracted discussion) to tighten up the language if that's what's desired. It comes down to this:

  • We already have a guideline saying that you shouldn't use file dependencies.
  • An outright ban on file dependencies ignores the fact that they are extremely useful in some specific situations.
  • FPC was willing to consider an outright ban on at least some classes of file dependencies (those outside of a fixed set of directories) but this would give close to zero upside against a nonzero downside because DNF still downloads all of the metadata anyway.
  • FPC has no power to tell the DNF developers to devote time to rectifying that. We did ask and were told it was either difficult or impossible and wasn't in the cards.

And so now the cold potato has been reheated and tossed over to FESCo. But the underlying situation is the same, except that FESCo has perhaps more means to steer DNF development. A complete ban on file dependencies would be a bad thing. A ban of those outside a fixed set of directories is still reasonable but as of today it still doesn't make any actual difference to the amount of downloaded metadata.

So I think the bottom line is the same and the short term outcome is still the same unless FESCo wants to override FPC's reasoning that measurable downside outweighs negligible upside and ban file dependencies anyway.

Another thing that should be addressed is the idea that these should be banned now in order to clear the way for DNF to implement optimizations. But that's misguided as well, because we don't actually know what optimizations are possible given that we have already established that general file dependencies are necessary if only for third party repositories. So we need to know what is possible, and then tailor the guidelines (and any possible options for configuration of DNF and the metadata creation process) for the best balance between the needs of packagers and benefit to the end users.

That very well may end up coming down to banning the use of file deps outside of a specific set of directories. But we don't even know at this point the optimal way to construct that set. Surely we can guess, but banning anything before we even have a trial DNF and metadata creation implementation seems to be doing things out of order.

What I'd personally like to see happen:

  • Pointless file dependencies get cleaned up. They are already violating packaging policy, so fixing them doesn't require a decision from any committee. I'm certainly happy to spend a couple of hours doing cleanup.
  • FESCo asks DNF developers to invest some effort into the split metadata scheme.
  • Once an implementation exists, we look at tuning the set of permissible directories with an eye towards minimizing the amount of downloaded metadata.
  • Only then do we adjust the packaging guidelines and force-fix the remainder of the packages (if there are any).

We might however discuss what file dependencies are good.

For example, I consider this good:

BuildRequires:  /usr/bin/tox

Or this:

BuildRequires:  /usr/bin/iconv

And I consider this bad:

Requires:  /usr/share/inkscape/tutorials/tux.png

I feel like there is some kind of consensus over this:

  • we would like to have certain kinds of file deps considered OK; dnf can (most likely) be optimized for those
  • we would like to ban all the other kinds of file deps in the guidelines because they are weird, but we need to keep them working for 3rd party repos, backwards compatibility and users; it's OK if dnf is not optimized for those

We are in a circle here. So let's just see what we consider OK or what we consider not OK and start from there? Whether we start in the guidelines or in dnf/createrepo is not important. Or am I missing something here?

I'm something of an idiot and posted my huge screed in this ticket when I thought it was the FESCo ticket. Oh, well.

FPC has agreed (+5, 0, -0) to change the first paragraph of https://fedoraproject.org/wiki/Packaging:Guidelines#File_and_Directory_Dependencies to the following:

RPM gives you the ability to depend on arbitrary files or directories instead of packages.
Packages SHOULD NOT include file dependencies outside of the following directories:

  • /usr/bin
  • /usr/sbin
  • /etc
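
A minimal sketch of how a spec-file linter might apply the directory list above. The function names are hypothetical, and the sketch assumes that any requirement starting with `/` is a file dependency and that the list is matched literally (so `/bin/sh` would be flagged, in line with the earlier advice to prefer `/usr/bin` over `/bin`).

```python
import posixpath

# Directories named in the revised guideline text above.
PERMITTED_DIRS = ("/usr/bin", "/usr/sbin", "/etc")


def is_file_dep(requirement):
    """Assumption: a requirement is a file dependency if it is an absolute path."""
    return requirement.strip().startswith("/")


def is_permitted(requirement):
    """True for package-level deps and for file deps under a permitted directory."""
    if not is_file_dep(requirement):
        return True
    path = posixpath.normpath(requirement.strip())
    return any(path.startswith(d + "/") for d in PERMITTED_DIRS)


for req in (
    "/usr/bin/tox",
    "/usr/bin/iconv",
    "inkscape",
    "/usr/share/inkscape/tutorials/tux.png",
    "/usr/lib64/vlc/plugins/codec/libfluidsynth_plugin.so",
):
    verdict = "ok" if is_permitted(req) else "SHOULD NOT"
    print(f"{verdict:10} {req}")
```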

Announcement text for that change:

The section on file and directory dependencies has been rewritten to be clearer and to explicitly indicate which file dependencies are permissible and which are not.

Metadata Update from @tibbs:
- Issue tagged with: announce

11 months ago

Thanks. I'm happy with this at this point. I'll follow up with the DNF team, and with individual package bugs.

Metadata Update from @tibbs:
- Issue untagged with: announce, meeting
- Issue close_status updated to: accepted
- Issue status updated to: Closed (was: Open)

10 months ago
