#259 Lookaside cache for binaries forces fragmentation and duplication
Opened 6 months ago by malmond. Modified 6 months ago

I'm working on the Hyperscale SIG. I have access to CentOS build infra and part of the puzzle is providing binaries to complement sources.

The namespacing for artifacts includes the package name, and even the branch. I'm testing out using forks and PRs to propose changes to the SIG. Example:

https://git.centos.org/rpms/unbound/pull-request/1

To build this during test, I've found I need to satisfy the following URL:

https://git.centos.org/sources/unbound/simple_bump_1.13.1/561522b06943f6d1c33bd78132db1f7020fc4fd1
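For readers following along, the URL above appears to follow a package/branch/sha1 layout. Here is a minimal Python sketch of that layout, inferred only from the URLs quoted in this ticket. The helper names (`sha1_of_bytes`, `lookaside_url`) are hypothetical and are not part of any CentOS tool.

```python
import hashlib

# Base URL as it appears in the URLs quoted in this ticket.
BASE_URL = "https://git.centos.org/sources"


def sha1_of_bytes(data: bytes) -> str:
    """Return the hex SHA-1 digest of the data (the upload log reports a sha1sum)."""
    return hashlib.sha1(data).hexdigest()


def lookaside_url(package: str, branch: str, sha1: str, base_url: str = BASE_URL) -> str:
    """Build a lookaside URL, assuming the layout <base>/<package>/<branch>/<sha1>."""
    return f"{base_url}/{package}/{branch}/{sha1}"


if __name__ == "__main__":
    # Values taken from the example in this ticket.
    print(lookaside_url(
        "unbound",
        "simple_bump_1.13.1",
        "561522b06943f6d1c33bd78132db1f7020fc4fd1",
    ))
```

Note that the branch is part of the path, so the same tarball gets a different URL on every branch.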

I have permission to upload to existing bookmarks, e.g. c8s-sig-hyperscale-experimental but if I try: lookaside_upload -f SOURCES/unbound-1.13.1.tar.gz -n unbound -b simple_bump_1.13.1 I get a 403:

[+] CentOS Lookaside upload tool -> Checking if file already uploaded
[+] CentOS Lookaside upload tool -> Initialing new upload to lookaside
[+] CentOS Lookaside upload tool -> URL : https://git.centos.org
[+] CentOS Lookaside upload tool -> Source to upload : SOURCES/unbound-1.13.1.tar.gz
[+] CentOS Lookaside upload tool -> Package name: unbound
[+] CentOS Lookaside upload tool -> sha1sum: 561522b06943f6d1c33bd78132db1f7020fc4fd1
[+] CentOS Lookaside upload tool -> Remote branch: simple_bump_1.13.1
[+] CentOS Lookaside upload tool ->  ====== Trying to upload =======

########################################################################################################################################################################################################################## 100.0%
curl: (22) The requested URL returned error: 403 Forbidden

[+] CentOS Lookaside upload tool -> [ERROR] Something didn't work to push to https://git.centos.org/sources/unbound/simple_bump_1.13.1/561522b06943f6d1c33bd78132db1f7020fc4fd1
[+] CentOS Lookaside upload tool -> [ERROR] Verify at the server side

I am okay with this not working. I would like to see the URL space squashed down. The worst part is the 'branch name', because it means I can't run scratch builds for a PR using the fork.

Even before using the PR workflow, I find I have to repeatedly upload sources for each branch, e.g. https://git.centos.org/sources/rpm/c8s-sig-hyperscale-experimental/3f8c3ef08f93eaeef12008055a43f6872306f8a2 and https://git.centos.org/sources/rpm/c8s-sig-hyperscale/3f8c3ef08f93eaeef12008055a43f6872306f8a2 even if https://git.centos.org/sources/rpm/c8s/3f8c3ef08f93eaeef12008055a43f6872306f8a2 exists.


As described in the SIGGuide (https://wiki.centos.org/SIGGuide#SIGGuide.2FContent.2FImport.Pushing_first_to_lookaside_cache), yes (and as you found out), you have to specify the branch (-b) and of course be a member of the SIG group for that branch.
So yes (an initial and legacy decision, I guess), that means that even if you don't push new sources, they still need to exist in different directories, so they need to be pushed to each branch.

So apart from the need to push the same source to different branches, can you explain what the real issue is? Based on your feedback, I'll tag this as either an issue or just an RFE.

Metadata Update from @arrfab:
- Issue priority set to: Waiting on Reporter (was: Needs Review)

6 months ago

Metadata Update from @arrfab:
- Issue tagged with: need-more-info

6 months ago

The main issue is that I can't use arbitrary names for branches in forks, e.g.

cbs build --scratch hyperscale8s-packages-experimental-el8 git+https://git.centos.org/forks/malmond/rpms/unbound.git#c4c3b84c448b362a89388520d59d81dc036c4f33

(sample job https://cbs.centos.org/koji/taskinfo?taskID=2145464)

I know there are other problems with using PRs (see #228), but even if that isn't fixed, being able to use CBS to do scratch builds remains useful as part of a development workflow. It'd be useful to fall back on serving any content within the package with the right hash when the specific branch doesn't explicitly exist. That would make testing .spec-only changes simpler.

I guess what I'm suggesting is that either the HTTPS backend does this transparently, or the client (get_sources.sh) does. The main question in both cases is "which branch": the choices are either the default branch or the tracked branch for the current one.
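To make the client-side variant concrete, here is a rough Python sketch of the fallback order: try the current branch first, then the fallback branches, and take the first URL that exists. This is not how get_sources.sh works today. The function names are hypothetical, the URL layout is assumed from the URLs in this ticket, and the `exists` callback stands in for whatever HTTP check a real client would do.

```python
from typing import Callable, Iterable, List, Optional

# Base URL as it appears in the URLs quoted in this ticket.
BASE_URL = "https://git.centos.org/sources"


def candidate_urls(package: str, sha1: str, branches: Iterable[str],
                   base_url: str = BASE_URL) -> List[str]:
    """Return lookaside URLs in fallback order, skipping repeated branches."""
    seen = set()
    urls = []
    for branch in branches:
        if branch in seen:
            continue
        seen.add(branch)
        urls.append(f"{base_url}/{package}/{branch}/{sha1}")
    return urls


def first_available(urls: Iterable[str],
                    exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which exists(url) is true, or None."""
    for url in urls:
        if exists(url):
            return url
    return None


if __name__ == "__main__":
    # Current (fork) branch first, then a tracked branch, then a default.
    # Which branches to fall back to is exactly the open question above.
    urls = candidate_urls(
        "rpm",
        "3f8c3ef08f93eaeef12008055a43f6872306f8a2",
        ["my-fork-branch", "c8s-sig-hyperscale", "c8s"],
    )
    # Stand-in for a real HTTP check: pretend only the c8s copy exists.
    print(first_available(urls, lambda url: "/c8s/" in url))
```

The same ordering could live in the backend instead. The sketch only shows that the lookup itself is small once the branch list is decided.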

Longer term: would you consider merging the storage of binaries? You could still honor the URL contracts, which would provide usage data. The only downside, which the current system shares, is that if caches are used, you'd potentially cache the same data under different URLs.
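As a toy illustration of the merged-storage idea, here is an in-memory Python sketch: blobs are stored once, keyed by SHA-1, while the per-package, per-branch URL tuples are kept as aliases so the existing URL contract and per-branch usage counts survive. The class and method names are hypothetical, and nothing here reflects how the real lookaside backend is implemented.

```python
import hashlib
from typing import Dict, Set, Tuple


class MergedLookaside:
    """Toy store: one blob per hash, many (package, branch, sha1) aliases."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}               # sha1 -> content
        self._aliases: Set[Tuple[str, str, str]] = set()  # known URL tuples
        self.hits: Dict[Tuple[str, str], int] = {}        # (package, branch) -> fetches

    def upload(self, package: str, branch: str, data: bytes) -> str:
        """Register data under package/branch; store the bytes only once."""
        sha1 = hashlib.sha1(data).hexdigest()
        self._blobs.setdefault(sha1, data)
        self._aliases.add((package, branch, sha1))
        return sha1

    def fetch(self, package: str, branch: str, sha1: str) -> bytes:
        """Serve content only via a registered alias, counting the access."""
        if (package, branch, sha1) not in self._aliases:
            raise KeyError(f"{package}/{branch}/{sha1} was never uploaded")
        key = (package, branch)
        self.hits[key] = self.hits.get(key, 0) + 1
        return self._blobs[sha1]

    def blob_count(self) -> int:
        """Number of distinct blobs actually stored."""
        return len(self._blobs)


if __name__ == "__main__":
    store = MergedLookaside()
    tarball = b"pretend this is a source tarball"
    digest = store.upload("rpm", "c8s", tarball)
    store.upload("rpm", "c8s-sig-hyperscale", tarball)
    store.fetch("rpm", "c8s-sig-hyperscale", digest)
    print(store.blob_count(), store.hits)
```

Uploading the same tarball to a second branch then only adds an alias, not a second copy.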

Well, cbs/koji was not designed to build from forked projects either (you're the first one trying that, TBH). SIGs (for --scratch builds) submit a src.rpm directly.
Once it's correctly pushed to the main project, I guess it works?
As said to @dcavalca, it would be worth discussing this in a real meeting on IRC rather than on the issue tracker. Unfortunately, with all the moving targets in the infra, I don't have time for this now. Let's try to schedule that in a few weeks?

Thanks for the suggestion. Building a src.rpm from my local sources and rebuilding in Koji worked fine, so I'm largely unblocked.

I'm keen on keeping this issue open because I think the logical end-game for this is to simply use content identity as the primary key. I looked at the equivalent mechanism in Fedora: fedpkg sources and fedpkg new-sources. They don't appear to need to key on branch or possibly even package name. I imagine the only reason you'd care is to do accounting and garbage collection, right?
