#80 Build an in-memory cache for whatprovides, issue #51
Closed 5 years ago by nphilipp. Opened 5 years ago by karsten.
From fork karsten/fedmod, branch inmemorywpcache, into master

file modified
+17 -12
@@ -81,21 +81,26 @@
 def _get_dependency_details(pool, transaction):
     candq = transaction.newpackages()
     result = {}
+    cache = {}
     for p in candq:
         pkg_details = {}
         for dep in _iterate_all_requires(p):
-            matches = set(s for s in candq if s.matchesdep(solv.SOLVABLE_PROVIDES, dep))
-            if not matches and str(dep).startswith("/"):
-                # Append provides by files
-                # TODO: use Dataiterator for getting filelist
-                matches = set(s for s in pool.select(str(dep), solv.Selection.SELECTION_FILELIST).solvables() if s in candq)
-            # It was possible to resolve set, so something is wrong here
-            assert matches
-            # While multiple packages providing the same thing is rare, it's
-            # the kind of duplication we want fedmod to be able to help find.
-            # So we always return a list here, even though it will normally
-            # only have one entry in it
-            pkg_details[str(dep)] = sorted(str(m) for m in matches)
+            if dep in cache:
+                matches = cache[dep]
+            else:
+                matches = set(s for s in candq if s.matchesdep(solv.SOLVABLE_PROVIDES, dep))
+                if not matches and str(dep).startswith("/"):
+                    # Append provides by files
+                    # TODO: use Dataiterator for getting filelist
+                    matches = set(s for s in pool.select(str(dep), solv.Selection.SELECTION_FILELIST).solvables() if s in candq)
+                # It was possible to resolve set, so something is wrong here
+                assert matches
+                cache[dep] = matches
+                # While multiple packages providing the same thing is rare, it's
+                # the kind of duplication we want fedmod to be able to help find.
+                # So we always return a list here, even though it will normally
+                # only have one entry in it
+                pkg_details[str(dep)] = sorted(str(m) for m in matches)

This should be outside the indented block.

         result[str(p)] = pkg_details
 
     return result
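
To make the inline review comment concrete: as written, a cache hit skips the pkg_details assignment entirely, so a dependency that was already looked up for an earlier package is silently missing from the current package's details. A minimal sketch of the loop with that assignment dedented out of the else branch, reusing the names from the diff above:

    for p in candq:
        pkg_details = {}
        for dep in _iterate_all_requires(p):
            if dep in cache:
                matches = cache[dep]
            else:
                matches = set(s for s in candq if s.matchesdep(solv.SOLVABLE_PROVIDES, dep))
                if not matches and str(dep).startswith("/"):
                    # Fall back to file-path provides via the pool's filelist
                    matches = set(s for s in pool.select(str(dep), solv.Selection.SELECTION_FILELIST).solvables() if s in candq)
                # The transaction resolved, so an empty match set means a bug
                assert matches
                cache[dep] = matches
            # Outside the else: runs for cache hits and misses alike
            pkg_details[str(dep)] = sorted(str(m) for m in matches)
        result[str(p)] = pkg_details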

build an in-memory cache similar to https://github.com/fedora-modularity/depchase/pull/17
Speeds up an rpm2module run with 2800 packages from 2'15'' to 1'15''.
I don't think cache invalidation is necessary; the repodata does not change during the execution of fedmod.
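
The approach borrowed from depchase is what the comments below call the lru_cache version. A minimal sketch of that shape, assuming a memoized per-run lookup function (make_lookup and providers are invented names, not the actual depchase or fedmod code):

    import functools

    import solv

    def make_lookup(candq):
        # Memoize provider lookups per dependency object; dep objects are
        # hashable (the diff above already uses them as dict keys), and no
        # invalidation is needed since the repodata is static for the run.
        @functools.lru_cache(maxsize=None)
        def providers(dep):
            return frozenset(s for s in candq if s.matchesdep(solv.SOLVABLE_PROVIDES, dep))
        return providers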

The code looks good as far as my knowledge of libsolv goes (not very far) - and looks the same as what was done for depchase.

For my usage, the cache produces a noticeable, but not extreme, slowdown - for a test of 247 packages:

time fedmod resolve-deps --json mesa-libGL vulkan xcb-util-image flac-libs libvdpau mpfr dbus-libs geoclue2-libs grep libSM p11-kit-trust libpng glibc-common util-linux curl libgcrypt SDL2_net librsvg2 xz-libs cups-libs compat-libicu57 gstreamer1-plugins-bad-free libglvnd-glx libnsl file SDL2 gobject-introspection gtk-update-icon-cache popt compat-libvpx4 xz alsa-lib at-spi2-atk libwayland-cursor libthai sqlite-libs at-spi2-core procps-ng bash libXinerama ncurses libXcursor python3-libproxy pcre-cpp llvm-libs libatomic libXrandr pulseaudio-libs bzip2 fontconfig librsvg2-tools libxml2 rpcgen which libXfixes SDL2_mixer atk krb5-workstation libtasn1 compat-readline6 libffi xz-lzma-compat less libXtst mlocate xcb-util-renderutil libcom_err nss-tools acl expat elfutils-libs libXft python2 glibc harfbuzz libXpm gstreamer1 bzip2-libs libXt libksba libXdamage coreutils libarchive SDL2_image libmount p11-kit python2-libproxy libacl libproxy gnupg2-smime SDL2_ttf elfutils libjpeg-turbo libxshmfence mesa-libEGL eosrei-emojione-fonts libglvnd npth python2-libs turbojpeg dejavu-sans-mono-fonts libattr nss-softokn libcroco openal-soft libepoxy ibus-libs libXv libcurl compat-gdbm libgcab1 libgcc krb5-server libxkbcommon-x11 nss-softokn-freebl libvorbis pcre2-utf16 xdg-utils libxslt libXScrnSaver libsndfile python3 tar libverto libdatrie gawk mesa-libgbm libstdc++ libXrender libxcrypt dbus cairo-gobject libglvnd-opengl libXext libXau cpio gmp libdrm mesa-libglapi ocl-icd libX11-xcb libgomp elfutils-libelf libX11 freetype gdbm-libs findutils libXdmcp libtheora xcb-util-cursor nss-util libXi gnu-free-serif-fonts libsoup unzip libwayland-client libseccomp libXcomposite xcb-util-wm pcre2 dbus-x11 cairo libwebp dejavu-sans-fonts python2-libxml2 libglvnd-egl libmpc xcb-util-keysyms pango compat-giflib glib2 hunspell gtk3 google-crosextra-carlito-fonts hyphen libcap json-glib libglvnd-gles pcre speexdsp libblkid krb5-libs zlib cyrus-sasl-lib mesa-vulkan-drivers libXxf86vm zip dejavu-serif-fonts libsamplerate libkadm5 orc dconf speex cracklib nss gnutls file-libs liberation-mono-fonts python2-setuptools libidn libgpg-error graphite2 pulseaudio-libs-glib2 pcre2-utf32 mpg123-libs libuuid liberation-sans-fonts libogg openssl sed pixman attr gpgme libxkbcommon liberation-serif-fonts lcms2 gnupg2 lcms2-utils libwayland-server ncurses-compat-libs pinentry zenity xcb-util gstreamer1-plugins-base gnu-free-mono-fonts libtiff harfbuzz-icu pulseaudio-utils libpciaccess libwayland-egl gzip gnu-free-sans-fonts libexif xdg-user-dirs mythes nettle gdk-pixbuf2 aspell libxcb gnupg libtool-ltdl google-crosextra-caladea-fonts libassuan ncompress nspr libICE libappstream-glib  

The timing goes from around 5s to around 7s, and the overall process of regenerating the flatpak-runtime package set goes from 45s to 55s. It seems that in some cases, the cost of looking across the entire pool for requirements exceeds the savings when the same requirement is looked up twice. Is the 2800 package case actually typical?

Since _get_dependency_details() is called only once for 'resolve-deps --json', I get much better results when I use a local cache:

        cache = {}
[...]
            if dep in cache:
                matches = cache[dep]
            else:
                matches = set(s for s in candq if s.matchesdep(solv.SOLVABLE_PROVIDES, dep))
                if not matches and str(dep).startswith("/"):
                    # Append provides by files
                    # TODO: use Dataiterator for getting filelist
                    matches = set(s for s in pool.select(str(dep), solv.Selection.SELECTION_FILELIST).solvables() if s in candq)
                # It was possible to resolve set, so something is wrong here
                assert matches
                cache[dep] = matches

That drops the 247 package case from 5 seconds to 2.5 seconds. I suspect it's likely to do better with your test case as well.

The global cache might have been a win if you ran module generation with verbose logging before https://pagure.io/modularity/fedmod/c/8e4fb14d40ec81863867e9fc7e0a97afb88e6c98?branch=master - where there were multiple calls to _get_dependency_details().
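
One subtlety if the cache were hoisted to survive multiple calls: in the diff above, the cached matches are already filtered against candq, which is per-transaction, so a cross-call cache would have to store the unfiltered pool-wide result and re-filter on each use. A sketch under that assumption, using pool.whatprovides from the libsolv bindings (_pool_providers and _providers are invented names; the file-path fallback is omitted for brevity):

    # Module-level map from dep to its pool-wide provider set; safe across
    # calls because the repodata never changes within one fedmod run.
    _pool_providers = {}

    def _providers(pool, candq, dep):
        pool_matches = _pool_providers.get(dep)
        if pool_matches is None:
            pool_matches = frozenset(pool.whatprovides(dep))
            _pool_providers[dep] = pool_matches
        # Filter against the current transaction's candidate queue per call.
        return {s for s in pool_matches if s in candq}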

I can reproduce this with your package list: real and user times increase slightly, and system time seems to be a little lower with lru_cache.

Your version is much faster with a huge number of packages.
The normal version takes 2'15'', the lru_cache version 1'15'', and your local cache takes a mere 17 seconds.
I'll prepare a new patch with your version, thanks a lot for that suggestion!

1 new commit added

  • use local cache (otaylor)
5 years ago

2 new commits added

  • use local cache (otaylor)
  • Build an in-memory cache for whatprovides, issue #51
5 years ago

use local cache (otaylor)

@karsten, @otaylor, is Owen the author of the commit in question? If so, git lets you make commits on behalf of the actual author with, say, git commit --author='Owen W. Taylor <otaylor@fishsoup.net>' instead of putting the attribution into the log comment.

Besides, it looks as if Owen's change largely undoes your implementation, so do you want me to squash the two commits into one?

Owen is the author; he didn't create a PR but provided the code in a comment right here.
Please squash, we don't need that other commit.


I'm not fussed about the attribution of a couple of lines of code, but if it would be easier, I can certainly stick a branch somewhere with a commit on it.

Thanks! This is solved in commit 4d8917a in the master branch.

Pull-Request has been closed by nphilipp

5 years ago