Hi!
I've recently been bitten by the old "huge filelists.xml.gz repodata download and the resulting slowness" issue when I was implementing a CI system (using GitHub Actions) where each step in the pipeline specifies a Dockerfile that is used to dynamically create a container in which the step's script is run.
I've read the discussion in FPC 714 and FESCo #1955, but the sad situation is that more than a year after the FESCo decision was made:
This was discussed in yesterday's FESCo meeting (2018-08-20): AGREED: ask dnf folks to put lazy loading (or reduced repodata loads) on their roadmap, close ticket (+1: 5, -1: 0, +0: 0)
The FPC ticket https://pagure.io/packaging-committee/issue/714 was accepted a few days ago. Nirik filed https://bugzilla.redhat.com/show_bug.cgi?id=1619368 for the dnf RFE for lazy loading.
nothing (except for a single Bugzilla comment) appears to be happening on the DNF side.
My question is, would FESCo be willing to reach out to the DNF team again on behalf of Fedora users and ask them if they could put lazy loading (or any other solution that reduces the time needed by the initial dnf command to run) on their roadmap for Fedora 32?
/cc @jmracek @dmach
This is a tricky request. How could dnf know that filelists are not required? Filelists are used not only for dependency resolution but also by users who want to install or discover a package that provides a certain path.
Some examples:

    dnf install foo

The foo package is found, but it requires foo1, which requires /etc/dnf/dnf.conf. Some package could still provide that file, but at this point resolution fails. DNF then tries to download the filelists for all repositories and put them into the sack (the data for the solver), but one filelist cannot be downloaded because new metadata are available. Now it gets really messy, because dnf would have to drop the sack, re-download the problematic repository, load the sack again, and resolve all arguments again.
Another example:

    dnf makecache                        # no filelists downloaded
    dnf repoquery -C /etc/dnf/dnf.conf

In cache-only mode nothing is found. Should dnf fail, or what should it provide?
Most commands will have similar problems:
1. We cannot predict in advance whether filelists will be required.
2. We don't know whether downloading filelists will help to resolve the request.
3. A failed filelist download could result in re-downloading the complete metadata from the repository and recreating the sack, which uses a lot of CPU, and then querying all arguments again (starting from the beginning).
What it brings:
1. less downloading
2. lower disk requirements
3. faster resolution and dnf loading
The cost:
1. more downloads (zchunk)
2. slower resolution and dnf loading (loading twice)
3. The disk space saved by skipping filelists cannot be used for anything else, because it may be needed for them anyway.
4. Working in cache-only mode gets really ugly or unreliable.
Short version:
For the feature to make sense, no Fedora RPM may contain file dependencies on files that are not listed in primary.xml (AFAIK there's no packaging policy for that).
There is one, but people are not following it: https://docs.fedoraproject.org/en-US/packaging-guidelines/#_file_and_directory_dependencies
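The guideline can be checked mechanically. The Python sketch below flags file dependencies that primary.xml alone would not satisfy. It assumes that createrepo_c copies into primary.xml only paths under /etc/, paths containing bin/, and /usr/lib/sendmail; that rule is my reading of createrepo_c's behaviour, not something stated in this thread, so verify it against the createrepo_c source before relying on it. The name `needs_filelists` is hypothetical.

```python
import re

# ASSUMPTION: the set of paths createrepo_c puts into primary.xml
# (paths under /etc/, anything containing "bin/", and /usr/lib/sendmail).
PRIMARY_FILE_RE = re.compile(r"^(/etc/.*|.*bin/.*|/usr/lib/sendmail)$")


def needs_filelists(requires):
    """Return the file dependencies (absolute paths) not covered by
    primary.xml, i.e. the ones that would force filelists.xml to be loaded."""
    return [
        req for req in requires
        if req.startswith("/") and not PRIMARY_FILE_RE.match(req)
    ]


if __name__ == "__main__":
    reqs = ["python3-requests", "/usr/bin/python3", "/etc/dnf/dnf.conf",
            "/usr/share/licenses/foo/LICENSE"]
    print(needs_filelists(reqs))  # ['/usr/share/licenses/foo/LICENSE']
```

Non-path requirements are ignored because they are resolved from package provides, which primary.xml always contains.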
Yes, I understand that implementing lazy loading for all the possible use cases would be hard and tricky.
The thing that motivates me is that currently, Fedora is practically unusable as an ephemeral container/VM for quickly running CI/CD jobs where one needs to install some additional packages.
Perhaps this is best illustrated with a simple example. Let's say I have a CI script that needs 3 Python packages, namely bump2version, PyGithub and PyYAML.
If I use pip3 to install them from PyPI, the whole process takes ~20s:
    $ time docker run --rm -it fedora:30 /bin/bash -c "pip3 install bump2version==0.5.8 pygithub==1.43.8 pyyaml==5.1"
    WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
    Collecting bump2version==0.5.8
      Downloading https://files.pythonhosted.org/packages/16/a5/5d8e4fc4e2217cb422d4ad63c92921bc8679fae01b5c4a09d51dd5841f13/bump2version-0.5.8-py2.py3-none-any.whl
    Collecting pygithub==1.43.8
      Downloading https://files.pythonhosted.org/packages/13/66/5c510242526d162708ffc0c82fd5ba647886f07d7abcae8587adeec86411/PyGithub-1.43.8.tar.gz (108kB)
        100% |████████████████████████████████| 112kB 4.7MB/s
    Collecting pyyaml==5.1
      Downloading https://files.pythonhosted.org/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz (274kB)
        100% |████████████████████████████████| 276kB 9.5MB/s
    Collecting deprecated (from pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/88/0e/9d5a1a8cd7130c49334cce7b8167ceda63d6a329c8ea65b626116bc9e9e6/Deprecated-1.2.6-py2.py3-none-any.whl
    Collecting pyjwt (from pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl
    Collecting requests>=2.14.0 (from pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
        100% |████████████████████████████████| 61kB 19.2MB/s
    Collecting wrapt<2,>=1.10 (from deprecated->pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/23/84/323c2415280bc4fc880ac5050dddfb3c8062c2552b34c2e512eb4aa68f79/wrapt-1.11.2.tar.gz
    Collecting certifi>=2017.4.17 (from requests>=2.14.0->pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/18/b0/8146a4f8dd402f60744fa380bc73ca47303cccf8b9190fd16a827281eac2/certifi-2019.9.11-py2.py3-none-any.whl (154kB)
        100% |████████████████████████████████| 163kB 5.5MB/s
    Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.14.0->pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
        100% |████████████████████████████████| 143kB 1.4MB/s
    Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests>=2.14.0->pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/e6/60/247f23a7121ae632d62811ba7f273d0e58972d75e58a94d329d51550a47d/urllib3-1.25.3-py2.py3-none-any.whl (150kB)
        100% |████████████████████████████████| 153kB 4.9MB/s
    Collecting idna<2.9,>=2.5 (from requests>=2.14.0->pygithub==1.43.8)
      Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
        100% |████████████████████████████████| 61kB 9.5MB/s
    Installing collected packages: bump2version, wrapt, deprecated, pyjwt, certifi, chardet, urllib3, idna, requests, pygithub, pyyaml
      Running setup.py install for wrapt ... done
      Running setup.py install for pygithub ... done
      Running setup.py install for pyyaml ... done
    Successfully installed bump2version-0.5.8 certifi-2019.9.11 chardet-3.0.4 deprecated-1.2.6 idna-2.8 pygithub-1.43.8 pyjwt-1.7.1 pyyaml-5.1 requests-2.22.0 urllib3-1.25.3 wrapt-1.11.2

    real    0m19.464s
    user    0m0.103s
    sys     0m0.030s
If I use dnf to install them, the whole process takes ~3.5 minutes, the biggest time sink being the download of repodata that is (very probably) never used:
    $ time docker run --rm -it fedora:30 /bin/bash -c "dnf -y install bumpversion python3-pygithub python3-pyyaml"
    Fedora Modular 30 - x86_64                             777 kB/s | 1.9 MB     00:02
    Fedora Modular 30 - x86_64 - Updates                   458 kB/s | 2.9 MB     00:06
    Fedora 30 - x86_64 - Updates                           683 kB/s |  24 MB     00:35
    Fedora 30 - x86_64                                     504 kB/s |  61 MB     02:03
    Last metadata expiration check: 0:00:02 ago on Wed Sep 18 20:48:53 2019.
    Dependencies resolved.
    ======================================================================
     Package                Architecture  Version          Repository Size
    ======================================================================
    Installing:
     python3-pygithub       noarch        1.43.8-1.fc30    updates   229 k
     bumpversion            noarch        0.5.8-4.fc30     fedora     41 k
     python3-pyyaml         x86_64        5.1-1.fc30       fedora    201 k
    Installing dependencies:
     python3-deprecated     noarch        1.2.6-3.fc30     updates    19 k
     python3-urllib3        noarch        1.24.3-2.fc30    updates   170 k
     python3-asn1crypto     noarch        0.24.0-6.fc30    fedora    180 k
     python3-cffi           x86_64        1.11.5-7.fc30    fedora    229 k
     python3-chardet        noarch        3.0.4-9.fc30     fedora    191 k
     python3-cryptography   x86_64        2.6.1-1.fc30     fedora    518 k
     python3-idna           noarch        2.7-4.fc30       fedora     83 k
     python3-jwt            noarch        1.7.1-2.fc30     fedora     42 k
     python3-ply            noarch        3.11-2.fc30      fedora    107 k
     python3-pycparser      noarch        2.14-18.fc30     fedora    147 k
     python3-pysocks        noarch        1.6.8-7.fc30     fedora     33 k
     python3-requests       noarch        2.21.0-2.fc30    fedora    115 k
     python3-six            noarch        1.12.0-1.fc30    fedora     35 k
     python3-wrapt          x86_64        1.11.1-1.fc30    fedora     50 k

    Transaction Summary
    ======================================================================
    Install  17 Packages

    Total download size: 2.3 M
    Installed size: 11 M
    Downloading Packages:
    (1/17): python3-deprecated-1.2.6-3.fc30.noarch.rpm      51 kB/s |  19 kB     00:00
    (2/17): python3-pygithub-1.43.8-1.fc30.noarch.rpm      331 kB/s | 229 kB     00:00
    (3/17): python3-urllib3-1.24.3-2.fc30.noarch.rpm       191 kB/s | 170 kB     00:00
    (4/17): bumpversion-0.5.8-4.fc30.noarch.rpm             46 kB/s |  41 kB     00:00
    (5/17): python3-asn1crypto-0.24.0-6.fc30.noarch.rpm    106 kB/s | 180 kB     00:01
    (6/17): python3-chardet-3.0.4-9.fc30.noarch.rpm         88 kB/s | 191 kB     00:02
    (7/17): python3-cffi-1.11.5-7.fc30.x86_64.rpm           83 kB/s | 229 kB     00:02
    (8/17): python3-jwt-1.7.1-2.fc30.noarch.rpm             26 kB/s |  42 kB     00:01
    (9/17): python3-idna-2.7-4.fc30.noarch.rpm              45 kB/s |  83 kB     00:01
    (10/17): python3-cryptography-2.6.1-1.fc30.x86_64.rpm  132 kB/s | 518 kB     00:03
    (11/17): python3-pycparser-2.14-18.fc30.noarch.rpm      96 kB/s | 147 kB     00:01
    (12/17): python3-pysocks-1.6.8-7.fc30.noarch.rpm        42 kB/s |  33 kB     00:00
    (13/17): python3-ply-3.11-2.fc30.noarch.rpm             56 kB/s | 107 kB     00:01
    (14/17): python3-pyyaml-5.1-1.fc30.x86_64.rpm          219 kB/s | 201 kB     00:00
    (15/17): python3-six-1.12.0-1.fc30.noarch.rpm           56 kB/s |  35 kB     00:00
    (16/17): python3-requests-2.21.0-2.fc30.noarch.rpm     153 kB/s | 115 kB     00:00
    (17/17): python3-wrapt-1.11.1-1.fc30.x86_64.rpm         86 kB/s |  50 kB     00:00
    ----------------------------------------------------------------------------------
    Total                                                  253 kB/s | 2.3 MB     00:09
    Running transaction check
    Transaction check succeeded.
    ... output trimmed ...
    Installed:
      python3-pygithub-1.43.8-1.fc30.noarch
      bumpversion-0.5.8-4.fc30.noarch
      python3-pyyaml-5.1-1.fc30.x86_64
      python3-deprecated-1.2.6-3.fc30.noarch
      python3-urllib3-1.24.3-2.fc30.noarch
      python3-asn1crypto-0.24.0-6.fc30.noarch
      python3-cffi-1.11.5-7.fc30.x86_64
      python3-chardet-3.0.4-9.fc30.noarch
      python3-cryptography-2.6.1-1.fc30.x86_64
      python3-idna-2.7-4.fc30.noarch
      python3-jwt-1.7.1-2.fc30.noarch
      python3-ply-3.11-2.fc30.noarch
      python3-pycparser-2.14-18.fc30.noarch
      python3-pysocks-1.6.8-7.fc30.noarch
      python3-requests-2.21.0-2.fc30.noarch
      python3-six-1.12.0-1.fc30.noarch
      python3-wrapt-1.11.1-1.fc30.x86_64

    Complete!

    real    3m34.990s
    user    0m0.165s
    sys     0m0.112s
That is just too slow to be usable (in such a CI scenario).
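The durations printed in the dnf output above can be summed to see how much of the 3.5 minutes goes to repodata. The short Python calculation below uses only figures copied from that output: the four repository metadata downloads, the package download, and the `real` time reported by `time`.

```python
# Figures copied from the dnf run above: (size in MB, duration as MM:SS).
metadata = {
    "Fedora Modular 30 - x86_64": (1.9, "00:02"),
    "Fedora Modular 30 - x86_64 - Updates": (2.9, "00:06"),
    "Fedora 30 - x86_64 - Updates": (24, "00:35"),
    "Fedora 30 - x86_64": (61, "02:03"),
}
package_mb, package_time = 2.3, "00:09"  # "Total 253 kB/s | 2.3 MB 00:09"
real_seconds = 3 * 60 + 34.990           # "real 3m34.990s"


def to_seconds(mmss):
    """Convert an 'MM:SS' duration into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


metadata_mb = sum(size for size, _ in metadata.values())
metadata_seconds = sum(to_seconds(t) for _, t in metadata.values())
share = metadata_seconds / real_seconds

print(f"repodata: {metadata_mb:.1f} MB in {metadata_seconds} s "
      f"({share:.0%} of {real_seconds:.0f} s total)")
print(f"packages: {package_mb} MB in {to_seconds(package_time)} s")
```

With these figures, about 166 s of the roughly 215 s run (around 77%) is spent downloading 89.8 MB of repodata, versus 9 s for the 2.3 MB of packages that are actually installed.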
we're not planning this feature for Fedora 32 and 33
Although this is quite disappointing, thanks for sharing this information.
I originally filed this issue as a revival of the FESCo 1955 issue since it was discussed there at length and a decision to ask the DNF team to put lazy loading on their roadmap was made at the Aug 20, 2018 FESCo meeting.
However, perhaps I could simplify the request from lazy loading of filelists.xml to an opt-in for not loading it at all, which would let dnf fail if the dependencies couldn't be satisfied by primary.xml. Would that be easier to implement?
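One way to picture the proposed opt-in is a resolver step that uses only what primary.xml provides and stops with an explicit error instead of fetching filelists.xml. The toy Python sketch below assumes a flat mapping of capabilities and file paths to package names; `resolve_primary_only` and `FilelistsRequiredError` are hypothetical names, not part of dnf or libsolv, and real dependency resolution involves far more (versions, architectures, conflicts).

```python
class FilelistsRequiredError(Exception):
    """A requirement could only be satisfied with data from filelists.xml."""


def resolve_primary_only(requires, primary_provides):
    """Resolve requirements against primary.xml data only.

    primary_provides: dict of capability or file path -> package name.
    Instead of fetching filelists.xml, unresolved file paths raise
    FilelistsRequiredError so the user can decide what to do.
    """
    resolved, missing_files = {}, []
    for req in requires:
        if req in primary_provides:
            resolved[req] = primary_provides[req]
        elif req.startswith("/"):
            # Only a file path could still turn up in filelists.xml.
            missing_files.append(req)
        else:
            # filelists.xml would not help with a missing package capability.
            raise LookupError(f"nothing provides {req}")
    if missing_files:
        raise FilelistsRequiredError(
            "not resolvable from primary.xml (filelists loading disabled): "
            + ", ".join(missing_files)
        )
    return resolved
```

The design choice here is to separate "filelists.xml might help" from "nothing can help", so the error message tells the user whether turning the opt-in off would make a difference.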
Well, libsolv will tell you. It will call a function you've provided for downloading/loading filelists.
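libsolv does have a C-level hook for this kind of on-demand loading (`pool_setloadcallback()`), but the sketch below does not use libsolv at all. It is a toy Python model, with hypothetical names (`LazyFilePool`, `whatprovides`, `load_filelists`), of the idea described above: primary.xml data is loaded up front, and a caller-supplied function is invoked only when a file path is looked up that primary.xml does not cover.

```python
class LazyFilePool:
    """Toy index: primary.xml data is loaded eagerly, filelists.xml data
    only on the first lookup of a file path that primary does not cover."""

    def __init__(self, primary_provides, load_filelists):
        # primary_provides: dict of capability or file path -> package name
        # load_filelists: callable returning dict of file path -> package name
        self.provides = dict(primary_provides)
        self._load_filelists = load_filelists
        self.filelists_loaded = False

    def whatprovides(self, capability):
        """Return the providing package name, or None if nothing provides it."""
        if capability in self.provides:
            return self.provides[capability]
        if capability.startswith("/") and not self.filelists_loaded:
            # First miss on a file path: load filelists into this same index.
            # Entries already known from primary.xml are kept as they are.
            for path, pkg in self._load_filelists().items():
                self.provides.setdefault(path, pkg)
            self.filelists_loaded = True
            return self.provides.get(capability)
        return None


if __name__ == "__main__":
    def fake_loader():
        print("loading filelists...")
        return {"/usr/share/doc/foo/README": "foo-doc"}

    pool = LazyFilePool({"foo": "foo", "/usr/bin/foo": "foo"}, fake_loader)
    print(pool.whatprovides("/usr/bin/foo"))               # foo, no loading
    print(pool.whatprovides("/usr/share/doc/foo/README"))  # loads, then foo-doc
```

In this model the question "will filelists be needed?" never has to be answered in advance: the first lookup that actually needs them triggers the load, and lookups served by primary.xml never do.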
Some examples:

    dnf install foo

The foo package is found, but it requires foo1, which requires /etc/dnf/dnf.conf. Some package could still provide that file, but at this point resolution fails. DNF then tries to download the filelists for all repositories and put them into the sack (the data for the solver), but one filelist cannot be downloaded because new metadata are available. Now it gets really messy, because dnf would have to drop the sack, re-download the problematic repository, load the sack again, and resolve all arguments again.
No, it will retry downloads according to the standard configuration (as it does now when a download fails).
It should report an error saying that it would need to download filelists.xml, but --cacheonly is set.
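The cache-only behaviour described above could look roughly like the following Python sketch. The cache layout (`<cachedir>/<repo_id>/filelists.xml.gz`), the function `filelists_path` and the `CacheOnlyError` class are all hypothetical; dnf's real cache directory structure is different.

```python
import os


class CacheOnlyError(Exception):
    """Filelists are needed but may not be downloaded in cache-only mode."""


def filelists_path(cachedir, repo_id, cacheonly, download):
    """Return the local path of a repository's filelists.xml.gz.

    The file is fetched with download(repo_id, path) only if it is not
    cached yet and cache-only mode is off; otherwise an error is raised.
    """
    # HYPOTHETICAL cache layout, chosen only for this illustration.
    path = os.path.join(cachedir, repo_id, "filelists.xml.gz")
    if os.path.exists(path):
        return path
    if cacheonly:
        raise CacheOnlyError(
            f"{repo_id}: filelists.xml would have to be downloaded, "
            "but --cacheonly is set"
        )
    download(repo_id, path)
    return path
```

The point is that cache-only mode does not need to guess: it either has the file or reports exactly which repository's filelists it would have had to fetch.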
Most commands will have similar problems:
1. We cannot predict in advance whether filelists will be required.
Do you really have to?
We don't know whether downloading of filelist will help to resolve the request
This is fine.
A failed filelist download could result in re-downloading the complete metadata from the repository and recreating the sack, which uses a lot of CPU, and then querying all arguments again (starting from the beginning).
Why would it? You only need to download filelists.xml, nothing more. You don't need to re-create the pool; you just need to load additional metadata into it.
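A failed filelists download can be handled locally, as described above: retry only that one file and, on success, merge its entries into the index that already holds the primary.xml data. The Python sketch below illustrates this with a plain dict standing in for the solver pool; `load_filelists_into`, its `retries` and `delay` parameters, and the use of `OSError` as the failure signal are hypothetical choices for the example, not dnf's actual retry logic.

```python
import time


def load_filelists_into(index, fetch, retries=3, delay=0.0):
    """Fetch filelists data with a bounded number of retries and merge it
    into an existing index, without rebuilding what came from primary.xml.

    index: dict of capability/path -> package, already filled from primary.xml
    fetch: callable returning dict of path -> package; may raise OSError
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            filelists = fetch()
        except OSError as err:
            last_error = err
            time.sleep(delay * attempt)  # simple linear backoff between tries
            continue
        for path, pkg in filelists.items():
            index.setdefault(path, pkg)  # keep entries already known from primary
        return index
    raise RuntimeError(
        f"filelists download failed after {retries} attempts"
    ) from last_error
```

Nothing already loaded is thrown away on failure, so a retry costs one more download of a single file rather than a rebuild of the whole sack.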
By the way, FUS (the solver used to create the hybrid repos that RHEL uses) has had this functionality for more than a year: https://github.com/fedora-modularity/fus/commit/9f9e712bab91af97f9503950363484fe903726ae
Since this was a request for FESCo to reach out to the dnf team, and we just did, I'm not sure whether this requires more action from FESCo. I'd like to have this too, but FESCo has no superpowers to make it happen.
I understand. Thanks for reaching out to the dnf team on FESCo's behalf.
The question is where to continue this discussion and, hopefully, persuade the dnf team to improve the sad situation I described in my previous comment. Perhaps in dnf's lazy loading RFE?
@ignatenkobrain, thanks for your remarks. I think it would be useful to copy them to dnf's lazy loading RFE or wherever we continue this discussion.
Metadata Update from @sgallagh: - Issue close_status updated to: Accepted - Issue status updated to: Closed (was: Open)