#550 Have koji store the git hash
Closed: Fixed 6 years ago Opened 6 years ago by pingou.

A build in koji can be triggered from an scm_url that can follow many different formats, for example:
- git://pkgs.fedoraproject.org/<namespace>/<package>?#<git hash>
- git://pkgs.fedoraproject.org/<namespace>/<package>?#<branch>
- git://pkgs.fedoraproject.org/<namespace>/<package>
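
To illustrate why the URL alone is not enough, here is a minimal sketch that sorts the three forms above by what their fragment contains. `classify_scm_url` is a hypothetical helper, not part of the Koji API, and it assumes a full 40-character SHA-1; an abbreviated hash cannot be told apart from a branch or tag name this way.

```python
import re

# Hypothetical helper, not part of the Koji API.
# Assumption: a "git hash" means a full 40-character lowercase SHA-1.
_FULL_SHA1 = re.compile(r'[0-9a-f]{40}')


def classify_scm_url(scm_url):
    """Return 'commit', 'ref', or 'none' depending on the URL fragment."""
    # Everything after the first '#' is the fragment (empty if there is none).
    _, _, fragment = scm_url.partition('#')
    if not fragment:
        return 'none'    # no ref given at all
    if _FULL_SHA1.fullmatch(fragment):
        return 'commit'  # pins an exact commit
    return 'ref'         # branch, tag, or abbreviated hash: ambiguous
```

Only the first form pins a commit; for the other two, the commit that was actually built has to be recorded somewhere else.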

For auditability and reproducibility purposes, could we make koji store (and give access to) the commit hash that corresponds to a given build?

This would also allow bridging the work done in the Fedora CI pipeline, which triggers on dist-git commits and thus has this information, and systems like bodhi that only know about NVRA.

Thanks! :)


Koji already stores the full scm_url as part of the build task. So it looks like you need a way to obtain that hash through API calls?

The issue is that the scm_url doesn't necessarily tell you which commit was used in the build, cf the examples given above.

I think for auditability and reproducibility we should store the commit hash and adjust the scm_url to ensure it always contains it.

This is now considered a blocker for the Fedora Atomic CI integration, as without a clear way to map a build to a git hash we cannot reliably re-trigger a test run.

This was discussed in: https://github.com/CentOS-PaaS-SIG/ci-pipeline/issues/382

It also influences the integration of these results in bodhi, which only knows the koji build, while the Atomic CI results are linked to a git hash.

Rather than investing engineering effort here and on Bodhi, why don't we do the work to get CI results reported into PRs on Pagure? If we do this ticket and make the related changes in Bodhi, the overall CI system still won't make sense - telling users there were CI failures in Bodhi is far too late in the process - we should tell them that proposed patches will lead to CI failures instead. IMO, we would be investing engineering resources into a system that doesn't meet the needs if we go this way, because the end result would not be a CI system.

Pagure is also a much more natural fit because it already knows commit hashes and because it's the point in the workflow where developers should be informed about failures due to their patches.

The Atomic CI effort is one side of the argument and because of the way things are currently designed/being worked on, this ticket has become a blocker.

However, I still think this change is good for koji regardless of the Atomic CI initiative, simply for being able to precisely link a build to a commit and thus improve the auditability and reproducibility aspects :)

The information is saved in checkout.log for every SRPM generated from git. So yes, the information is there, and no, there is no solid API for getting it.

(Very) crude proof of concept :-)

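import koji

s = koji.ClientSession('https://koji.fedoraproject.org/kojihub')
build = s.getBuild(980126)
# Find the child task that checked the sources out of the SCM
tasks = s.getTaskChildren(build['task_id'])
task = [t['id'] for t in tasks if t['method'] == 'buildSRPMFromSCM'][0]
# checkout.log holds git's output, including "HEAD is now at <hash> ..."
log = s.downloadTaskOutput(task, 'checkout.log')
if isinstance(log, bytes):  # the log may come back as bytes on Python 3
    log = log.decode('utf-8', 'replace')
line = [x for x in log.split('\n') if 'HEAD is now' in x][0]
print(line.split()[4])  # fifth word of the line is the hash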
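
The log-parsing step of this proof of concept can be pulled out into a small function that does not fail when the line is missing. This is only a sketch: `commit_from_checkout_log` is a hypothetical name, and it assumes the log contains git's usual "HEAD is now at <hash> <subject>" message, which normally carries an abbreviated hash.

```python
import re

# Matches git's "HEAD is now at <hash> <subject>" message.
# Assumption: the hash is lowercase hex, abbreviated or full.
_HEAD_LINE = re.compile(r'HEAD is now at ([0-9a-f]{4,40})\b')


def commit_from_checkout_log(log):
    """Return the hash found in the log text, or None if there is none."""
    if isinstance(log, bytes):
        log = log.decode('utf-8', errors='replace')
    match = _HEAD_LINE.search(log)
    return match.group(1) if match else None
```

Returning None instead of raising lets the caller decide what to do for builds whose checkout.log is gone or was produced by another SCM.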

Alright that does work, I double-checked with a build against a branch rather than a commit: https://koji.fedoraproject.org/koji/buildinfo?buildID=980129 (cf the Source line in https://koji.fedoraproject.org/koji/taskinfo?taskID=22264235 )

Basically, my hope was to see the git hash appear in the Source line there (ie: replace #master by #2e89115 for this build), even more so after finding out that this checkout.log doesn't seem to be kept :(

Yep, it is just a quick hack in case the data is needed now.

I don't want to modify that line - as it is the original argument with which it was called. I was thinking about storing it maybe in build's 'extra' information, so it would be easily retrieved.
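
If the hash ended up in the build's 'extra' information, retrieving it could look roughly like the sketch below. The key name `scm_commit` is purely hypothetical (nothing in this ticket defines it), and the function only assumes that the build info is a dict whose 'extra' entry may be missing or None.

```python
def commit_from_build_info(build, key='scm_commit'):
    """Return the commit stored under build['extra'][key], or None.

    `key` is a hypothetical name used for illustration only.
    """
    extra = build.get('extra') or {}  # 'extra' may be absent or None
    return extra.get(key)
```

With a client session this would be called on the dict returned by getBuild, for example `commit_from_build_info(s.getBuild(980126))`.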

@mikem ideas?

I was thinking about storing it maybe in build's 'extra' information, so it would be easily retrieved.

Cool :)

We have to consider that Koji supports multiple SCMs, even though git is clearly the best :stuck_out_tongue_winking_eye:
We also have to consider non-rpm builds.

I have wanted to do something more with the source for a long time, but this is a tricky thing to get right.

I concur that we need to preserve the source url that was specified for the build, and of course we want the actual ref for git sources.

Ultimately, we're going to have to grow the ability to record more complicated data about the source. Cramming this data in build.extra may work in the short term, but it doesn't seem correct.

I will have to think more on it.

AIUI this issue has become a blocker for the folks working on integrating the CI pipeline with Fedora infra. Hopefully @mikem has some idea on how to move forward here?

I think we should be able to get some sort of solution into 1.15.

Metadata Update from @mikem:
- Issue set to the milestone: 1.15

6 years ago

Cool, that's promising. What's the ETA for 1.15?

Expecting 1.15 in December

@pingou -- is that workable from the Atomic/CI side?

@pfrields, we'll make it work, there are enough pieces to work on :)

Question from PR: I'm now storing the git url for rpms coming from git (the same way OSBS does). What about the others? Does it make sense to store the srpm nvr/filename, or its hash, in the case of an uploaded srpm?
