#7076 Brainstorming about possible git checkout seed improvements
Closed: Will Not/Can Not fix 5 years ago. Opened 5 years ago by tibbs.

The make-git-checkout-seed.sh script (https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/git/make_checkout_seed/files/make-git-checkout-seed.sh) runs daily at 0200 to create the git checkout seed and the specfile tarball (https://src.fedoraproject.org/repo/git-seed-latest.tar.xz and https://src.fedoraproject.org/repo/rpm-specs-latest.tar.xz).

Currently the git seed is nearly 10 GB compressed (more than 24 GB unpacked) and contains nearly six million files and directories.

Since this is large enough to be impractical for many people, I was wondering whether providing a shallow clone would be better. A depth 1 clone appears to be about a quarter of the size, but since it contains no history and only the default branch, it might not be useful enough.

So my basic question is whether it is reasonably possible to generate a checkout which:

  • has at least a little history
  • has all of the live branches
  • is at least somewhat smaller
  • isn't too much slower or more difficult to generate than the current checkout

The clone is generated at line 54 of the script. It should be fairly trivial to pass --depth 5 --no-single-branch to that command to get a somewhat smaller checkout.

Things I'm thinking about:

  • --depth N --no-single-branch (the simple case)
  • Using --shallow-since and asking for a year or so of history. If no commit is recent enough, current git will fetch all history; in Git 2.19 (not yet released) this will error out instead, so it would need to be handled (just fall back to --depth).
  • Using --depth N and manually fetching each live branch. But how can we get a list of live branches in a shell script?
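The second and third options can be sketched roughly as follows, assuming the seed script could call git from Python (the real script is shell). Everything named here is hypothetical: the function names, the one-year window, the default depth of 5, and the `live` set of branch names. Deciding which branches are "live" needs an external source, such as the list of active releases, which is not modeled. The listing part uses `git ls-remote --heads`, which prints one line per branch head on the remote.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Sequence

# Runs a git argv and returns its exit status; swappable for a fake in tests.
Runner = Callable[[Sequence[str]], int]


def run_git(argv: Sequence[str]) -> int:
    return subprocess.run(list(argv)).returncode


def clone_args(url: str, dest: str, since: Optional[str] = None,
               depth: Optional[int] = None) -> List[str]:
    # --depth and --shallow-since both imply --single-branch, so
    # --no-single-branch is needed to keep the tips of all branches.
    argv = ["git", "clone", "--no-single-branch"]
    if since is not None:
        argv.append(f"--shallow-since={since}")
    if depth is not None:
        argv.append(f"--depth={depth}")
    return argv + [url, dest]


def shallow_clone(url: str, dest: str, since: str = "1 year ago",
                  depth: int = 5, run: Runner = run_git) -> str:
    # Older git fetches full history when no commit is newer than `since`;
    # newer git errors instead, so fall back to a fixed depth on failure.
    # git clone normally deletes a destination directory it created when
    # the clone fails, so the retry can reuse `dest`.
    if run(clone_args(url, dest, since=since)) == 0:
        return "since"
    if run(clone_args(url, dest, depth=depth)) == 0:
        return "depth"
    raise RuntimeError(f"shallow clone of {url} failed")


def parse_heads(ls_remote_output: str) -> List[str]:
    # `git ls-remote --heads` prints "<sha>\trefs/heads/<branch>" per line.
    prefix = "refs/heads/"
    names = []
    for line in ls_remote_output.splitlines():
        _sha, _, ref = line.partition("\t")
        if ref.startswith(prefix):
            names.append(ref[len(prefix):])
    return names


def list_remote_branches(url: str) -> List[str]:
    out = subprocess.run(["git", "ls-remote", "--heads", url],
                         capture_output=True, text=True, check=True).stdout
    return parse_heads(out)


def fetch_live_branches(repo_dir: str, branches: Iterable[str],
                        live: Iterable[str], depth: int = 5,
                        run: Runner = run_git) -> List[str]:
    # `live` is a hypothetical, externally supplied set of active branches.
    live_set = set(live)
    wanted = [b for b in branches if b in live_set]
    for b in wanted:
        # Fetch only the newest `depth` commits of this branch into its
        # remote-tracking ref.
        refspec = f"+refs/heads/{b}:refs/remotes/origin/{b}"
        if run(["git", "-C", repo_dir, "fetch", f"--depth={depth}",
                "origin", refspec]) != 0:
            raise RuntimeError(f"fetching {b} failed")
    return wanted
```

For the third option, the initial clone would be a plain single-branch `git clone --depth N`, followed by `fetch_live_branches(dest, list_remote_branches(url), live)`. The runner parameter exists only so the fallback logic can be exercised without touching a real remote.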

Can you easily convert a shallow clone to a full one?

I am not sure what uses people have for this; we just provided it in the hope that it would be helpful.

We do have grokmirror now too, but that still needs a large upfront download and time commitment to get copies of everything, so the seed is still likely useful.

I have added the meeting keyword, so if we don't decide anything here we can talk about it at the next meeting.

Metadata Update from @kevin:
- Issue priority set to: Next Meeting (was: Needs Review)

5 years ago

Metadata Update from @kevin:
- Issue priority set to: Waiting on External (was: Next Meeting)

5 years ago

@tibbs - is this still needed? Have you looked at grokmirror as a way to fulfill your request?

So the idea was to help with the case where you want to do cleanup on a bunch of packages. The checkout seed is huge and takes forever to unpack, while the spec tarball is fast but not useful for this because you can't even run 'fedpkg prep' with it.

But the fact that you can't push from a shallow clone means that you have to do git fetch --unshallow for the packages which need changes. Which... isn't much different from just getting the spec tarball and then doing a clone when you need to. And if you just want to keep the full checkout, you might as well just run grokmirror.
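Converting a shallow checkout into a full one before making changes might look roughly like this, in the same hypothetical Python style (the function names are illustrative). `git rev-parse --is-shallow-repository` (git 2.15 and later) reports whether the clone is shallow, and `git fetch --unshallow` fetches the missing history. If the clone was also single-branch, `git remote set-branches origin '*'` first widens the fetch refspec so the other branches come along.

```python
import subprocess
from typing import List


def git(repo_dir: str, *args: str) -> List[str]:
    # Build a git argv that operates on repo_dir.
    return ["git", "-C", repo_dir, *args]


def unshallow_commands(repo_dir: str, all_branches: bool = True) -> List[List[str]]:
    cmds = []
    if all_branches:
        # Replace a single-branch refspec with +refs/heads/*:refs/remotes/origin/*.
        cmds.append(git(repo_dir, "remote", "set-branches", "origin", "*"))
    # Fetch the history that the shallow clone left out.
    cmds.append(git(repo_dir, "fetch", "--unshallow", "origin"))
    return cmds


def is_shallow(repo_dir: str) -> bool:
    out = subprocess.run(git(repo_dir, "rev-parse", "--is-shallow-repository"),
                         capture_output=True, text=True, check=True).stdout
    return out.strip() == "true"


def unshallow(repo_dir: str, all_branches: bool = True) -> None:
    # `git fetch --unshallow` refuses to run on a complete repository,
    # so check first.
    if not is_shallow(repo_dir):
        return
    for cmd in unshallow_commands(repo_dir, all_branches):
        subprocess.run(cmd, check=True)
```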

So, I guess I'll just close this.

Metadata Update from @tibbs:
- Issue close_status updated to: Will Not/Can Not fix
- Issue status updated to: Closed (was: Open)

5 years ago

