#2229 Could FESCo ask the DNF team to put lazy loading (or reduced repodata loads) on the roadmap for Fedora 32?
Closed: Accepted 4 years ago by sgallagh. Opened 4 years ago by tadej.

Hi!

I've recently been bitten by the old "huge filelists.xml.gz repodata download and the resulting slowness" issue when I was implementing a CI system (using GitHub Actions) where each step in the pipeline specifies a Dockerfile that is used to dynamically create a container in which the step's script is run.

I've read the discussion in FPC 714 and FESCo #1955, but the sad situation is that more than a year after the FESCo decision was made:

This was discussed in yesterday's FESCo meeting (2018-08-20):
AGREED: ask dnf folks to put lazy loading (or reduced repodata loads) on their roadmap, close ticket (+1: 5, -1: 0, +0: 0)

The FPC ticket https://pagure.io/packaging-committee/issue/714 was accepted a few days ago.
Nirik filed https://bugzilla.redhat.com/show_bug.cgi?id=1619368 for the dnf RFE for lazy loading.

nothing (except for a single Bugzilla comment) appears to be happening on the DNF side.

My question is, would FESCo be willing to reach out to the DNF team again on behalf of Fedora users and ask them if they could put lazy loading (or any other solution that reduces the time needed by the initial dnf command to run) on their roadmap for Fedora 32?


This is a tricky request. How could dnf know that the filelists are not required? Filelists are used not only for dependency resolution but also by users who want to install or discover a package that provides a certain path.

Some examples:
dnf install foo
# The foo package is found, but it requires foo1, which requires /etc/dnf/dnf.conf. Some package could still provide that path, but at this phase resolution fails. DNF tries to download the filelists for all repositories and put them into the sack (the data for the solver), but one filelist cannot be downloaded because new metadata is available. Now it gets really messy, because dnf would drop the sack, re-download the problematic repository, load the sack again, and resolve all arguments again.

Another example:
dnf makecache # no file list downloaded
dnf repoquery -C /etc/dnf/dnf.conf # with cache only: nothing found, fail, or what should dnf provide?

Most commands will have similar problems:
1. We cannot predict in advance whether the filelists will be required.
2. We don't know whether downloading the filelists will help to resolve the request.
3. A failed filelists download could result in re-downloading the complete metadata from the repository and recreating the sack (which uses a lot of CPU), then querying all arguments again (starting from the beginning).

What it brings:
1. less download
2. lower disk requirements
3. faster resolution and dnf loading

The cost:
1. more downloads - zchunk
2. slower resolution and dnf loading (load 2x)
3. The space that was saved by skipping the filelists cannot be used anyway, because the filelists could still be required.
4. Working in cache only mode gets really ugly or unreliable.

Short version:

  • we're not planning this feature for Fedora 32 and 33
  • for the feature to make sense, no Fedora RPM may contain file dependencies outside the files provided by primary.xml (AFAIK there's no packaging policy for that)

for the feature to make sense, no Fedora RPM may contain file dependencies outside the files provided by primary.xml (AFAIK there's no packaging policy for that)

There is one, but people are not following it: https://docs.fedoraproject.org/en-US/packaging-guidelines/#_file_and_directory_dependencies
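
To gauge how far the distribution is from that state, one could classify each file dependency by whether primary.xml would already carry the path. This is a minimal sketch, assuming the createrepo convention that primary.xml lists files under /etc/, files whose path contains a bin/ component, and /usr/lib/sendmail. The function name and the sample paths are hypothetical and not taken from dnf.

```python
import re

# Assumption: these mirror the patterns createrepo uses to decide which
# file paths are copied into primary.xml; verify against your tooling.
PRIMARY_FILE_PATTERNS = [
    re.compile(r"^/etc/"),
    re.compile(r"bin/"),
    re.compile(r"^/usr/lib/sendmail$"),
]


def needs_filelists(file_dep: str) -> bool:
    """Return True if a file dependency cannot be resolved from primary.xml alone."""
    return not any(p.search(file_dep) for p in PRIMARY_FILE_PATTERNS)


if __name__ == "__main__":
    # Hypothetical sample dependencies, for illustration only.
    for dep in ["/usr/bin/python3", "/etc/mime.types", "/usr/share/fonts/default"]:
        verdict = "needs filelists" if needs_filelists(dep) else "primary.xml is enough"
        print(f"{dep}: {verdict}")
```

Running such a check over every Requires entry that starts with "/" would give a rough list of packages that currently force the filelists download.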

This is a tricky request. How could dnf know that the filelists are not required? Filelists are used not only for dependency resolution but also by users who want to install or discover a package that provides a certain path.

Yes, I understand that implementing lazy loading for all the possible use cases would be hard and tricky.

The thing that motivates me is that currently, Fedora is practically unusable as an ephemeral container/VM for quickly running CI/CD jobs where one needs to install some additional packages.

Perhaps this is best illustrated with a simple example. Let's say I have a CI script that needs 3 Python packages, namely bump2version, PyGithub and PyYAML.

If I use pip3 to install them from PyPI, the whole process takes ~20s:

$ time docker run --rm -it fedora:30 /bin/bash -c "pip3 install bump2version==0.5.8 pygithub==1.43.8 pyyaml==5.1"
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting bump2version==0.5.8
  Downloading https://files.pythonhosted.org/packages/16/a5/5d8e4fc4e2217cb422d4ad63c92921bc8679fae01b5c4a09d51dd5841f13/bump2version-0.5.8-py2.py3-none-any.whl
Collecting pygithub==1.43.8
  Downloading https://files.pythonhosted.org/packages/13/66/5c510242526d162708ffc0c82fd5ba647886f07d7abcae8587adeec86411/PyGithub-1.43.8.tar.gz (108kB)
    100% |████████████████████████████████| 112kB 4.7MB/s 
Collecting pyyaml==5.1
  Downloading https://files.pythonhosted.org/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz (274kB)
    100% |████████████████████████████████| 276kB 9.5MB/s 
Collecting deprecated (from pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/88/0e/9d5a1a8cd7130c49334cce7b8167ceda63d6a329c8ea65b626116bc9e9e6/Deprecated-1.2.6-py2.py3-none-any.whl
Collecting pyjwt (from pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl
Collecting requests>=2.14.0 (from pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
    100% |████████████████████████████████| 61kB 19.2MB/s 
Collecting wrapt<2,>=1.10 (from deprecated->pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/23/84/323c2415280bc4fc880ac5050dddfb3c8062c2552b34c2e512eb4aa68f79/wrapt-1.11.2.tar.gz
Collecting certifi>=2017.4.17 (from requests>=2.14.0->pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/18/b0/8146a4f8dd402f60744fa380bc73ca47303cccf8b9190fd16a827281eac2/certifi-2019.9.11-py2.py3-none-any.whl (154kB)
    100% |████████████████████████████████| 163kB 5.5MB/s 
Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.14.0->pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 1.4MB/s 
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests>=2.14.0->pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/e6/60/247f23a7121ae632d62811ba7f273d0e58972d75e58a94d329d51550a47d/urllib3-1.25.3-py2.py3-none-any.whl (150kB)
    100% |████████████████████████████████| 153kB 4.9MB/s 
Collecting idna<2.9,>=2.5 (from requests>=2.14.0->pygithub==1.43.8)
  Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
    100% |████████████████████████████████| 61kB 9.5MB/s 
Installing collected packages: bump2version, wrapt, deprecated, pyjwt, certifi, chardet, urllib3, idna, requests, pygithub, pyyaml
  Running setup.py install for wrapt ... done
  Running setup.py install for pygithub ... done
  Running setup.py install for pyyaml ... done
Successfully installed bump2version-0.5.8 certifi-2019.9.11 chardet-3.0.4 deprecated-1.2.6 idna-2.8 pygithub-1.43.8 pyjwt-1.7.1 pyyaml-5.1 requests-2.22.0 urllib3-1.25.3 wrapt-1.11.2

real    0m19.464s
user    0m0.103s
sys     0m0.030s

If I use dnf to install them, the whole process takes ~3.5 minutes, the biggest time sink being downloading of repodata that is (very probably) never used:

$ time docker run --rm -it fedora:30 /bin/bash -c "dnf -y install bumpversion python3-pygithub python3-pyyaml"
Fedora Modular 30 - x86_64                                                                                                                                    777 kB/s | 1.9 MB     00:02    
Fedora Modular 30 - x86_64 - Updates                                                                                                                          458 kB/s | 2.9 MB     00:06    
Fedora 30 - x86_64 - Updates                                                                                                                                  683 kB/s |  24 MB     00:35    
Fedora 30 - x86_64                                                                                                                                            504 kB/s |  61 MB     02:03    
Last metadata expiration check: 0:00:02 ago on Wed Sep 18 20:48:53 2019.
Dependencies resolved.
==============================================================================================================================================================================================
 Package                                               Architecture                            Version                                         Repository                                Size
==============================================================================================================================================================================================
Installing:
 python3-pygithub                                      noarch                                  1.43.8-1.fc30                                   updates                                  229 k
 bumpversion                                           noarch                                  0.5.8-4.fc30                                    fedora                                    41 k
 python3-pyyaml                                        x86_64                                  5.1-1.fc30                                      fedora                                   201 k
Installing dependencies:
 python3-deprecated                                    noarch                                  1.2.6-3.fc30                                    updates                                   19 k
 python3-urllib3                                       noarch                                  1.24.3-2.fc30                                   updates                                  170 k
 python3-asn1crypto                                    noarch                                  0.24.0-6.fc30                                   fedora                                   180 k
 python3-cffi                                          x86_64                                  1.11.5-7.fc30                                   fedora                                   229 k
 python3-chardet                                       noarch                                  3.0.4-9.fc30                                    fedora                                   191 k
 python3-cryptography                                  x86_64                                  2.6.1-1.fc30                                    fedora                                   518 k
 python3-idna                                          noarch                                  2.7-4.fc30                                      fedora                                    83 k
 python3-jwt                                           noarch                                  1.7.1-2.fc30                                    fedora                                    42 k
 python3-ply                                           noarch                                  3.11-2.fc30                                     fedora                                   107 k
 python3-pycparser                                     noarch                                  2.14-18.fc30                                    fedora                                   147 k
 python3-pysocks                                       noarch                                  1.6.8-7.fc30                                    fedora                                    33 k
 python3-requests                                      noarch                                  2.21.0-2.fc30                                   fedora                                   115 k
 python3-six                                           noarch                                  1.12.0-1.fc30                                   fedora                                    35 k
 python3-wrapt                                         x86_64                                  1.11.1-1.fc30                                   fedora                                    50 k

Transaction Summary
==============================================================================================================================================================================================
Install  17 Packages

Total download size: 2.3 M
Installed size: 11 M
Downloading Packages:
(1/17): python3-deprecated-1.2.6-3.fc30.noarch.rpm                                                                                                             51 kB/s |  19 kB     00:00    
(2/17): python3-pygithub-1.43.8-1.fc30.noarch.rpm                                                                                                             331 kB/s | 229 kB     00:00    
(3/17): python3-urllib3-1.24.3-2.fc30.noarch.rpm                                                                                                              191 kB/s | 170 kB     00:00    
(4/17): bumpversion-0.5.8-4.fc30.noarch.rpm                                                                                                                    46 kB/s |  41 kB     00:00    
(5/17): python3-asn1crypto-0.24.0-6.fc30.noarch.rpm                                                                                                           106 kB/s | 180 kB     00:01    
(6/17): python3-chardet-3.0.4-9.fc30.noarch.rpm                                                                                                                88 kB/s | 191 kB     00:02    
(7/17): python3-cffi-1.11.5-7.fc30.x86_64.rpm                                                                                                                  83 kB/s | 229 kB     00:02    
(8/17): python3-jwt-1.7.1-2.fc30.noarch.rpm                                                                                                                    26 kB/s |  42 kB     00:01    
(9/17): python3-idna-2.7-4.fc30.noarch.rpm                                                                                                                     45 kB/s |  83 kB     00:01    
(10/17): python3-cryptography-2.6.1-1.fc30.x86_64.rpm                                                                                                         132 kB/s | 518 kB     00:03    
(11/17): python3-pycparser-2.14-18.fc30.noarch.rpm                                                                                                             96 kB/s | 147 kB     00:01    
(12/17): python3-pysocks-1.6.8-7.fc30.noarch.rpm                                                                                                               42 kB/s |  33 kB     00:00    
(13/17): python3-ply-3.11-2.fc30.noarch.rpm                                                                                                                    56 kB/s | 107 kB     00:01    
(14/17): python3-pyyaml-5.1-1.fc30.x86_64.rpm                                                                                                                 219 kB/s | 201 kB     00:00    
(15/17): python3-six-1.12.0-1.fc30.noarch.rpm                                                                                                                  56 kB/s |  35 kB     00:00    
(16/17): python3-requests-2.21.0-2.fc30.noarch.rpm                                                                                                            153 kB/s | 115 kB     00:00    
(17/17): python3-wrapt-1.11.1-1.fc30.x86_64.rpm                                                                                                                86 kB/s |  50 kB     00:00    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                         253 kB/s | 2.3 MB     00:09     
Running transaction check
Transaction check succeeded.

... output trimmed ... 

Installed:
  python3-pygithub-1.43.8-1.fc30.noarch           bumpversion-0.5.8-4.fc30.noarch                python3-pyyaml-5.1-1.fc30.x86_64             python3-deprecated-1.2.6-3.fc30.noarch       
  python3-urllib3-1.24.3-2.fc30.noarch            python3-asn1crypto-0.24.0-6.fc30.noarch        python3-cffi-1.11.5-7.fc30.x86_64            python3-chardet-3.0.4-9.fc30.noarch          
  python3-cryptography-2.6.1-1.fc30.x86_64        python3-idna-2.7-4.fc30.noarch                 python3-jwt-1.7.1-2.fc30.noarch              python3-ply-3.11-2.fc30.noarch               
  python3-pycparser-2.14-18.fc30.noarch           python3-pysocks-1.6.8-7.fc30.noarch            python3-requests-2.21.0-2.fc30.noarch        python3-six-1.12.0-1.fc30.noarch             
  python3-wrapt-1.11.1-1.fc30.x86_64             

Complete!

real    3m34.990s
user    0m0.165s
sys     0m0.112s

That is just too slow to be usable (in such a CI scenario).
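
To put a number on that, the figures from the dnf transcript above can be added up. The sketch below only re-uses the times and sizes printed by dnf (which are rounded, so the result is approximate); the variable names are my own.

```python
# (size in MB, download time as mm:ss) per repository, copied from the dnf output.
metadata_downloads = {
    "Fedora Modular 30": (1.9, "00:02"),
    "Fedora Modular 30 - Updates": (2.9, "00:06"),
    "Fedora 30 - Updates": (24.0, "00:35"),
    "Fedora 30": (61.0, "02:03"),
}
PACKAGE_MB = 2.3                       # "Total download size: 2.3 M"
PACKAGE_DOWNLOAD = "00:09"             # the "Total" line for the 17 RPMs
WALL_CLOCK_SECONDS = 3 * 60 + 34.990   # "real 3m34.990s"


def to_seconds(mmss: str) -> int:
    """Convert a dnf-style mm:ss duration to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


metadata_seconds = sum(to_seconds(t) for _, t in metadata_downloads.values())
metadata_mb = round(sum(mb for mb, _ in metadata_downloads.values()), 1)
package_seconds = to_seconds(PACKAGE_DOWNLOAD)
metadata_share = metadata_seconds / WALL_CLOCK_SECONDS

print(f"metadata: {metadata_mb} MB in {metadata_seconds}s")
print(f"packages: {PACKAGE_MB} MB in {package_seconds}s")
print(f"metadata share of wall-clock time: {metadata_share:.0%}")
```

By these figures, roughly 166 of the 215 seconds (about 77%) went into fetching about 90 MB of repository metadata in order to install 2.3 MB of packages. Note that this total covers all repository metadata, not only the filelists.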

we're not planning this feature for Fedora 32 and 33

Although this is quite disappointing, thanks for sharing this information.


I originally filed this issue as a revival of the FESCo 1955 issue since it was discussed there at length and a decision to ask the DNF team to put lazy loading on their roadmap was made at the Aug 20, 2018 FESCo meeting.

However, perhaps I could simplify the request from lazy loading of filelists.xml to an opt-in "no loading" mode, which would let dnf fail if the dependencies couldn't be satisfied by primary.xml. Would that be easier to implement?

This is a tricky request. How could dnf know that the filelists are not required? Filelists are used not only for dependency resolution but also by users who want to install or discover a package that provides a certain path.

Well, libsolv will tell you. It will call the function you've provided for downloading/loading filelists.

Some examples:
dnf install foo
The foo package is found, but it requires foo1, which requires /etc/dnf/dnf.conf. Some package could still provide that path, but at this phase resolution fails. DNF tries to download the filelists for all repositories and put them into the sack (the data for the solver), but one filelist cannot be downloaded because new metadata is available. Now it gets really messy, because dnf would drop the sack, re-download the problematic repository, load the sack again, and resolve all arguments again.

No, it will retry downloads according to the standard configuration (as it does now when a download fails).

Another example:
dnf makecache # no file list downloaded
dnf repoquery -C /etc/dnf/dnf.conf # with cache only: nothing found, fail, or what should dnf provide?

It should report an error saying that it would need to download filelists.xml, but --cacheonly is set.

Most commands will have similar problems:
1. We cannot predict in advance whether the filelists will be required.

Do you really have to?

2. We don't know whether downloading the filelists will help to resolve the request.

This is fine.

3. A failed filelists download could result in re-downloading the complete metadata from the repository and recreating the sack (which uses a lot of CPU), then querying all arguments again (starting from the beginning).

Why would it? You need to download only filelists.xml, nothing more. You don't need to re-create pool, you just need to load additional metadata into it.
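
The behaviour argued for in these replies could look roughly like the toy model below. It is not libsolv's or dnf's API: ToyPool, CacheOnlyError and fake_filelists_download are hypothetical names, and the dictionaries stand in for parsed repodata. The point is only to show the shape of the flow: answer from primary data first, call a caller-supplied function when a file path is missing, extend the existing pool in place, and fail clearly in cache-only mode.

```python
class CacheOnlyError(RuntimeError):
    """Raised when filelists are needed but downloads are disallowed."""


class ToyPool:
    """Hypothetical stand-in for a solver pool; not the libsolv or dnf API."""

    def __init__(self, primary_provides, load_filelists, cacheonly=False):
        # Maps a capability or file path to the package that provides it.
        self.provides = dict(primary_provides)
        self._load_filelists = load_filelists  # callback supplied by the caller
        self._filelists_loaded = False
        self.cacheonly = cacheonly

    def who_provides(self, dep):
        if dep in self.provides:
            return self.provides[dep]
        # Only file paths can be satisfied by filelists, and they load at most once.
        if dep.startswith("/") and not self._filelists_loaded:
            if self.cacheonly:
                raise CacheOnlyError(
                    f"resolving {dep} would download filelists, but cache-only mode is set"
                )
            # Extend the existing pool in place; nothing is dropped or rebuilt.
            self.provides.update(self._load_filelists())
            self._filelists_loaded = True
            return self.provides.get(dep)
        return None


def fake_filelists_download():
    # Hypothetical data standing in for a parsed filelists.xml.
    return {"/usr/share/foo/data.db": "foo-data"}


pool = ToyPool({"foo": "foo", "/usr/bin/foo": "foo"}, fake_filelists_download)
print(pool.who_provides("/usr/bin/foo"))            # answered from primary data
print(pool.who_provides("/usr/share/foo/data.db"))  # triggers the callback once
```

In this model a request that never mentions a path outside the primary data never pays for the filelists download, which is the case the CI scenario in this ticket cares about.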


By the way, FUS (the solver used to create the hybrid repos that RHEL uses) has had this functionality for more than a year: https://github.com/fedora-modularity/fus/commit/9f9e712bab91af97f9503950363484fe903726ae

Since this is a request for FESCo to reach out to dnf team and we just did, I have no idea whether this requires more action from FESCo. I'd like to have this too, but it's not like FESCo can use some kind of superpowers to make it happen.

Since this is a request for FESCo to reach out to dnf team and we just did, I have no idea whether this requires more action from FESCo. I'd like to have this too, but it's not like FESCo can use some kind of superpowers to make it happen.

I understand. Thanks for reaching out to the dnf team on FESCo's behalf.

The question is, where to continue this discussion and hopefully persuade the dnf team to improve the sad situation I described in my previous comment?
Perhaps in dnf's lazy loading RFE?

@ignatenkobrain, thanks for your remarks. I think it would be useful to copy them to dnf's lazy loading RFE or wherever we decide to continue this discussion.

Metadata Update from @sgallagh:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)
