#915 RFE: Support configurable prefixes for additional splitting in dist-repos
Opened 5 years ago by ngompa. Modified 4 years ago

In #914, Koji changed the dist-repo layout so that packages are placed in subdirectories named after the first letter of the package name.

However, in some cases this is still not enough to spread packages out comfortably. For example, here are the ten largest first-letter groups in Mageia Cauldron:

$ cat cauldron-rpms | cut -c 1|sort|uniq -c|sort -n|tail
    715 o
    833 j
    899 n
    954 a
   1008 r
   1356 s
   1452 g
   1500 m
   6859 l
   7340 p

For l, this is because Mageia's policy is to ship each library in its own package, so there are many lib- and lib64-prefixed packages. Some distributions also have libx32-prefixed packages.

For p, this is largely because of Perl and Python packages: there are a lot of perl-, python-, python2-, and python3-prefixed packages.

Further alphabetical splits for these prefixes would make partial syncs, scanning and downloading individual packages, and similar tasks easier.

But because distributions and projects tend to have different requirements for further splits, it likely makes sense to offer a way to configure additional prefixes for splitting and organizing packages.
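A minimal sketch of what a configurable prefix split might look like, in Python. `SPLIT_PREFIXES` and `subdir_for()` are hypothetical names, not existing Koji options, and the sketch assumes the current layout uses the lowercased first letter of the package name:

```python
from collections.abc import Sequence

# Hypothetical per-distribution configuration: prefixes that get their own
# second-level split. Matching is by plain string prefix.
SPLIT_PREFIXES: tuple[str, ...] = (
    "lib64", "libx32", "lib",
    "perl-", "python2-", "python3-", "python-",
)


def subdir_for(name: str, prefixes: Sequence[str] = SPLIT_PREFIXES) -> str:
    """Return the relative directory a package would be placed in.

    Names matching a configured prefix go to <prefix>/<next letter>;
    everything else keeps the first-letter layout from #914.
    """
    lowered = name.lower()
    # Try longer prefixes first so "lib64" wins over "lib".
    for prefix in sorted(prefixes, key=len, reverse=True):
        if lowered.startswith(prefix) and len(lowered) > len(prefix):
            return f"{prefix.rstrip('-')}/{lowered[len(prefix)]}"
    return lowered[0]


if __name__ == "__main__":
    for pkg in ("lib64ssl1.1", "libssl1.1", "perl-Moose", "python3-requests", "gcc"):
        print(f"{pkg:20} -> {subdir_for(pkg)}")
```

Plain prefix matching is predictable but blunt: `libreoffice` would land in `lib/r` next to the actual libraries, which is the kind of per-distribution tuning that argues for making the list configurable rather than hard-coded.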

Note that this doesn't affect the repodata much, since the format records each package's location as a path relative to the repo root.


Metadata Update from @mikem:
- Issue tagged with: discussion

5 years ago

From #914:

This probably needs to be either 1) a configurable regex, or 2) dynamically split based on the content in the repo.

The first letter layout is used here and elsewhere because it is simple and predictable.

Option 1 is pretty simple to code, but requires human maintenance. Option 2 is more complicated to code but does not.

Either method makes it harder for a 3rd party script to predict the path of an rpm. However, that is not really a requirement here (I don't think). 3rd party scripts should be using the repo metadata.
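Scripts never need to guess a path, because the primary metadata lists each package's `location href` relative to the repo root. A minimal Python sketch using only the standard library, assuming an uncompressed or gzip-compressed `primary.xml` (the actual file name and compression are listed in `repodata/repomd.xml`); the helper names are illustrative:

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace used by the <package> entries in rpm-md primary.xml.
COMMON = "{http://linux.duke.edu/metadata/common}"


def rpm_locations(primary_xml: bytes) -> dict[str, list[str]]:
    """Map each package name to the relative path(s) of its RPM(s)."""
    root = ET.fromstring(primary_xml)
    locations: dict[str, list[str]] = {}
    for pkg in root.iter(f"{COMMON}package"):
        name = pkg.findtext(f"{COMMON}name")
        loc = pkg.find(f"{COMMON}location")
        if name is not None and loc is not None:
            # The same name can appear several times (versions, arches).
            locations.setdefault(name, []).append(loc.get("href"))
    return locations


def load_primary(path: str) -> bytes:
    """Read primary.xml, handling the .gz variant."""
    if path.endswith(".gz"):
        with gzip.open(path, "rb") as fh:
            return fh.read()
    with open(path, "rb") as fh:
        return fh.read()
```

Because the lookup goes through the metadata, any directory layout (first letter, configured prefixes, or dynamic buckets) is transparent to such a script.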


Yeah, this is about humans interacting with the file tree directly.

@ngompa Can you comment on what you mean by "partial syncs"? That doesn't sound like something we should specifically support.

@mikem: People individually select ranges of packages to download via rsync for various reasons. In Mageia, people do this to fetch specific packages from our updates_testing repo.

Some people also want to fetch specific source packages for mass-rebuilding on older distribution releases or on other distributions entirely.

I think if we address this, it should be option 2. No need to add yet more obscure config options.
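Option 2 could look roughly like the following Python sketch: start from first-letter buckets and lengthen the prefix of any bucket that is still too large. The function name and the `max_size` default of 1000 are illustrative assumptions, not anything Koji implements:

```python
from collections import defaultdict
from collections.abc import Iterable


def dynamic_buckets(names: Iterable[str], max_size: int = 1000,
                    depth: int = 1) -> dict[str, list[str]]:
    """Split package names into prefix buckets of at most max_size names.

    Each bucket key (a lowercased name prefix) would become a directory.
    Oversized buckets are split again using one more character of prefix.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name[:depth].lower()].append(name)

    result: dict[str, list[str]] = {}
    for prefix, members in groups.items():
        # Stop when the bucket is small enough, or when no member is longer
        # than the current prefix (a longer prefix could not split it).
        if len(members) <= max_size or all(len(m) <= depth for m in members):
            result[prefix] = sorted(members)
        else:
            result.update(dynamic_buckets(members, max_size, depth + 1))
    return result
```

No configuration is needed, but in this sketch bucket boundaries depend on the repo's contents: adding packages can split a bucket and move existing packages to a new directory, so paths are only stable for a given repo generation.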

@ngompa please describe the use cases a bit more. This feels odd to me as I'm having trouble picturing someone manually browsing a repo with thousands of packages.

Metadata Update from @dgregor:
- Custom field Size adjusted to None

4 years ago

@dgregor I described this earlier in this thread, but sure...

It's very common for folks in the Mageia community to manually browse through a repo and pluck out source packages for rebuilding or individual binary packages for testing. Some of this could probably be alleviated if repoview still worked (it needs porting from yum) and were a feature of dist-repos...

There's also the problem of directories holding so many files that enumerating them can cause trouble on the system. I've had my share of problems with that before, and it'd be nice to avoid it with further splits...
