| |
@@ -265,7 +265,7 @@
|
| |
for arch in arch_list:
|
| |
self.rpms_by_arch.setdefault(arch, [])
|
| |
for i in other.rpms_by_arch.get(arch, []):
|
| |
- if i.file_path in self.file_cache:
|
| |
+ if i.file_path in self.file_cache.file_cache:
|
| |
# TODO: test if it really works
|
| |
continue
|
| |
if inherit_to_noarch and exclusivearch_list and arch == "noarch":
|
| |
The pkgset phase takes around 35 minutes in current composes.
Around 20 minutes of that is spent creating these per-arch
subsets of the global package set. In a rather roundabout way
(see #1794 ), I figured out that almost all of this time is
spent in this cache check, which is broken for a subtle reason.
Python's
in
keyword works by first attempting to call thecontainer's magic
__contains__
method. If the container doesnot implement
__contains__
, it falls back to iteration - ittries to iterate over the container until it either hits what
it's looking for, or runs out. (If the container implements
neither, you get an error).
The FileCache instance's
file_cache
is a plain Python dict.dicts have a very efficient
__contains__
implementation, sodoing
foo in (somedict)
is basically always very fast no matterhow huge the dict is. FileCache itself, though, implements
__iter__
by returning an iterator over thefile_cache
dict'skeys, but it does not implement
__contains__
. So when we dofoo in self.file_cache
, Python has to iterate over every keyin the dict until it hits foo or runs out. This is massively
slower than
foo in self.file_cache.file_cache
, which uses theefficient
__contains__
method.Because these package sets are so huge, and we're looping over
one huge set and checking each package from it against the cache
of another, increasingly huge, set, this effect becomes massive.
To make it even worse, I ran a few tests where I added a debug log
if we ever hit the cache, and it looks like we never actually do -
so every check has to iterate through the entire dict.
We could probably remove this entirely, but changing it to check
the dict instead of the FileCache instance makes it just about as
fast as taking it out, so I figured let's go with that in case
there's some unusual scenario in which the cache does work here.
Signed-off-by: Adam Williamson awilliam@redhat.com