#7265 blender git repo contains large files causing git checkout to timeout during builds
Closed: Fixed 3 years ago by pingou. Opened 6 years ago by hobbes1069.

From checkout.log:
$ git clone -n https://src.fedoraproject.org/rpms/blender.git /var/lib/mock/f28-build-11036148-838686/root/tmp/scmroot/blender
Cloning into '/var/lib/mock/f28-build-11036148-838686/root/tmp/scmroot/blender'...
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header
$ git rev-list --objects --all | grep "\.rpm\|\.gz"
6115a22040643818e89404f301faa1d4a4ce4a56 results_blender/2.77a/1.fc23/blender-2.77a-1.fc24.src.rpm
2340a78e377e89c3a3b751a997ce0a563a9714ff results_blender/2.77a/1.fc23/blender-2.77a-1.fc24.x86_64.rpm
7966c177500299784bcaa963381f14b74eddc108 results_blender/2.77a/1.fc23/blender-debuginfo-2.77a-1.fc24.x86_64.rpm
014423f48c5697e0c256cc93c789823800883d85 results_blender/2.77a/1.fc23/blender-rpm-macros-2.77a-1.fc24.noarch.rpm
e2e943ce223ac62904fde4717ceac8e4f3387018 results_blender/2.77a/1.fc23/blenderplayer-2.77a-1.fc24.x86_64.rpm
e3c53ce853edd37b76bc5b4d422f85b14a1a10c5 results_blender/2.77a/1.fc23/fonts-blender-2.77a-1.fc24.noarch.rpm
5b6716c806f9ef18beaeed55beb07b2c162179b6 results_blender/2.77a/1.fc26/blender-2.77a-1.fc26.src.rpm
12a5eee6eb79f3ec8c257cc4b02a16e1c0774966 results_blender/2.77a/1.fc26/blender-2.77a-1.fc26.x86_64.rpm
8a7e6d7dc970f5b1fdd66e9a9e57e814db4de9a2 results_blender/2.77a/1.fc26/blender-debuginfo-2.77a-1.fc26.x86_64.rpm
ac7b7d731e06bf30deb8cfcbbf2a7002b3cef7c4 results_blender/2.77a/1.fc26/blender-rpm-macros-2.77a-1.fc26.noarch.rpm
9b783965dbe33cdc8a670a70d5212e62442094ec results_blender/2.77a/1.fc26/blenderplayer-2.77a-1.fc26.x86_64.rpm
dee39b967a93774562690fc7a898013bbdd831e5 results_blender/2.77a/1.fc26/fonts-blender-2.77a-1.fc26.noarch.rpm
7bffb5068eb55df30d688d8608cd9264a77c8cf3 results_blender/2.78/1.fc26/blender-2.78-1.fc26.src.rpm
c89795526e03033ff0255495f94ced42cea86271 results_blender/2.78/1.fc26/blender-2.78-1.fc26.x86_64.rpm
12907ae1397483d284498820e3fac62ad7e17c07 results_blender/2.78/1.fc26/blender-debuginfo-2.78-1.fc26.x86_64.rpm
b518a731b7f38961433cbf8f6b2b9c4716fce785 results_blender/2.78/1.fc26/blender-rpm-macros-2.78-1.fc26.noarch.rpm
4cd2e44e3562eb77ddf4c3193e80f69389469090 results_blender/2.78/1.fc26/blenderplayer-2.78-1.fc26.x86_64.rpm
6459a4e4cf1ac226117b091ddd7a59bec6cfc6b5 results_blender/2.78/1.fc26/fonts-blender-2.78-1.fc26.noarch.rpm
30bb773c1b4242fdb48b33788e53374a4a900e14 blender-2.77a-1.el7.src.rpm
a6c477b254d54ebe07ed11df6b2ec41e132ddded blender-2.77a-1.fc23.src.rpm
a34c5c49de2fcd378e3d7f74cecfc1dcf77082af blender-2.77a-1.fc26.src.rpm
e17f27a4415c31a747f76e10a68e592bf5d25b47 blender-2.78-1.el7.src.rpm
0e7cb8ad4bf7571ac1123c853e5636854e622124 blender-2.78-1.fc26.src.rpm
a4a9f0f6a24e622799cfefccf1804cae48cdfcae blender-2.78.tar.gz

Can these files be removed?
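
To see which of these objects actually dominate the repository size, the listing above can be extended with object sizes. This is a minimal sketch, assuming a POSIX shell and a reasonably recent git; the function name `largest_blobs` is hypothetical and not part of any Fedora tooling.

```shell
#!/bin/sh
# Sketch: print the N largest blobs reachable from any ref, largest first.
# Output columns: size in bytes, object id, path.
# `largest_blobs` is a hypothetical helper name, not an existing tool.
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob" { print substr($0, 6) }' |   # drop the leading "blob "
        sort -k1,1nr |
        head -n "${1:-10}"
}
```

Run `largest_blobs 20` inside a clone (or the bare repo) to get the twenty largest files ever committed.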


In order to delete anything we need to have approval from FESCo (who has not wanted to do so in the past).

I did a 'git gc' on the repo and also set postBuffer = 10485760 (10 MiB instead of the default of 1 MiB).

Can you see if that's enough to get it building?
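
For reference, the two tweaks described above would look roughly like this. It is only a sketch: the helper name `tune_repo` is hypothetical, and it assumes "postBuffer" refers to git's `http.postBuffer` setting.

```shell
#!/bin/sh
# Sketch: repack a repository and raise http.postBuffer to 10 MiB.
# `tune_repo` is a hypothetical helper; pass the path of the (bare) repo.
tune_repo() {
    repo=$1
    git -C "$repo" gc --quiet
    git -C "$repo" config http.postBuffer "$((10 * 1024 * 1024))"   # 10485760
    git -C "$repo" config --get http.postBuffer
}
```

This only works around the symptom; the large objects stay in the repository.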

I'll give it a shot but these files really have no business being in git regardless.

We'll see if it builds, but the git checkout succeeded.

Forgot to come back and update the ticket but the build was successful.

We have to leave them there; we cannot go and rewrite the history of the git repos.

FESCo discussed this issue on 2018-03-02 and gave the green light:

AGREED: Move branches with huge commits to refs/archive/ and create new sanitized branches (+6, 0, 0)

The idea is to move the bad heads to refs/archive/, so that they will remain in the repo, will not be garbage collected by git, but will not be cloned by git by default. I.e. the repo will still be large, but only on pagure, and cloning will be fast.
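
The effect can be tried out on a throwaway repository. The sketch below is only an illustration under assumptions: all paths, the branch names, and the file names are made up, and it uses `git update-ref` to move the head. It builds a tiny "server" repo, moves a branch carrying a big file to refs/archive/, and then checks what a fresh clone receives.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Toy "server" repository with one clean branch and one branch carrying a big file.
git init -q --bare "$work/origin.git"
git -C "$work/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
echo "Name: demo" > demo.spec
git add demo.spec
git commit -q -m "Add spec"
git checkout -q -b f23
head -c 1000000 /dev/urandom > demo.tar.gz
git add demo.tar.gz
git commit -q -m "Accidentally add tarball"
big_blob=$(git rev-parse HEAD:demo.tar.gz)
git push -q "$work/origin.git" main f23

# On the server: keep the old head reachable, but outside refs/heads/.
cd "$work/origin.git"
git update-ref refs/archive/f23 "$(git rev-parse refs/heads/f23)"
git update-ref -d refs/heads/f23

# A fresh clone over a real transport only asks for branches and tags.
git clone -q "file://$work/origin.git" "$work/clone"
cloned_refs=$(git -C "$work/clone" for-each-ref --format='%(refname)')
echo "$cloned_refs"
if git -C "$work/clone" cat-file -e "$big_blob" 2>/dev/null; then
    echo "big blob was transferred"
else
    echo "big blob was not transferred"
fi
```

The `file://` URL matters in this demo: cloning from a plain local path may hardlink the whole object store, which would hide the difference.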

Can we get this implemented?

Metadata Update from @syeghiay:
- Issue assigned to mohanboddu

5 years ago

I got hit by this again when trying to fix a FTBFS for blender. Can we please get this done?

What needs to be done here? How can I help? Working with the blender package is tedious.

@hobbes1069 , apologies that this is still an issue. It will be on this week's Releng IRC meeting agenda.

@syeghiay Assuming all went as planned, could you please update us on the outcome of that discussion?

@churchyard @syeghiay @ferdnyc This was reported for blender more than a year ago indeed.

Metadata Update from @mohanboddu:
- Issue tagged with: meeting

3 years ago

From FESCo meeting on 2018-03-02

16:12:38 <zbyszek> #topic #1848 Request to Authorize Removal of Blender Source Tarballs from Incorrect Place in Repository
16:12:42 <zbyszek> .fesco 1848
16:12:44 <zodbot> zbyszek: Issue #1848: Request to Authorize Removal of Blender Source Tarballs from Incorrect Place in Repository - fesco - Pagure - https://pagure.io/fesco/issue/1848
16:12:44 <zbyszek> https://pagure.io/fesco/issue/1848
16:13:43 <bowlofeggs> i'm +1 as long as we archive the old repo
16:14:09 <zbyszek> +1 to the same
16:14:10 <sgallagh> Perhaps a better solution here would be for us to rename the git repo and make a shallow copy of the HEAD without the tarballs in it
16:14:18 <puiterwijk> Just a random idea: we can just rename the current refs to something that does not live in refs/heads, so won't be cloned automatically
16:14:19 <jsmith> I'm +1
16:14:29 <sgallagh> Essentially, restart the repo history from today.
16:14:39 <puiterwijk> Which means that we don't have to remove anything, and people can still "git fetch <commithash>" from the same location
16:14:51 <zbyszek> sgallagh: That'd have the same issue, that you cannot use an old commit hash
16:14:55 <nirik> I guess I'm a weak +1 if we archive the old repo and document where we archive such things and have a SOP for this process.
16:14:59 <puiterwijk> So basically set tags for the current state of things, and then rewrite the branches.
16:15:17 <bowlofeggs> puiterwijk: that's a good idea - i didn't realize git clone didn't pull down things not referenced by HEAD
16:15:18 <zbyszek> puiterwijk: I checked that, but if we make a tag, tags are cloned by default
16:15:22 <sgallagh> Wouldn't they still get everything in a clone
16:15:28 <puiterwijk> zbyszek: right. I meant a tag outside of refs/heads
16:15:28 <sgallagh> So a ridiculously-large checkout?
16:15:35 <puiterwijk> sgallagh: not if you stuff it in refs/archive/
16:15:41 <sgallagh> interesting...
16:15:46 <puiterwijk> git has a clone default of refs/heads/* and refs/tags/* if I recall correctly
16:15:47 <tyll_> +1 to puiterwijk's idea
16:15:56 <puiterwijk> That's how Github does their automagic refs/pull/$somenumber
16:16:03 <tyll_> also sounds like a good idea for all old branches IMHO
16:16:05 <puiterwijk> (and for those who didn't know: that even works with Pagure!)
16:16:14 <tyll_> or at least ancient branches
16:16:16 <jsmith> I'm +1 for puiterwijk's suggestion
16:16:37 <zbyszek> That's also acceptable
16:16:40 <bowlofeggs> yeah i'm +1 to puiterwijk's suggestion (and educated by it too!)
16:16:54 <puiterwijk> bowlofeggs: it doesn't depend on HEAD, it depends on refs/heads/ and refs/tags/ being the default clone refs :)
16:16:56 <nirik> I don't think we want to do it for all old branches tho
16:17:16 <bowlofeggs> yeah i think we just need it for this case
16:17:18 <puiterwijk> Yeah, I don't know about all old branches, I'd keep that for another discussion.
16:17:36 <tyll_> yes, it should be another discussion
16:18:18 <zbyszek> So the nice thing is that if we decide to drop the old history, we can always still do that, this time by dropping the ref in refs/archive/.
16:18:32 <zbyszek> So, I'm moving my +1 to puiterwijk's proposal
16:19:13 <jkosciel> a
16:19:46 <sgallagh> +1 puiterwijk
16:20:19 <zbyszek> nirik: ?
16:20:40 <nirik> +1 yes for this case
16:20:52 <nirik> and we should document this (releng sops? wiki?)
16:23:18 <puiterwijk> For those who want to learn more: git refspec is the term you're looking for (https://git-scm.com/book/en/v2/Git-Internals-The-Refspec), which defaults to "refs/heads/*:refs/remotes/origin/*"
16:23:25 <zbyszek> #agree Move branches with huge commits to refs/archive/ and create new sanitized branches (+6, 0, 0)
16:23:56 <zbyszek> OK?
16:24:41 <bowlofeggs> yeah
16:24:50 <nirik> yep
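
As the log above notes, the default fetch refspec covers refs/heads/*, so anything parked under refs/archive/ has to be requested by name. A minimal sketch of how someone could still get at an archived head afterwards; the helper names and the local `refs/remotes/<remote>/archive/` layout are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: list and fetch heads that were moved to refs/archive/ on the remote.
# Both function names are hypothetical; <remote> is a configured remote name
# such as "origin", not a URL.
list_archived() {
    git ls-remote "$1" 'refs/archive/*'
}

fetch_archived() {
    remote=$1
    name=$2
    # Explicit refspec: remote refs/archive/<name> -> local tracking ref.
    git fetch -q "$remote" "refs/archive/$name:refs/remotes/$remote/archive/$name"
}
```

After `fetch_archived origin f23`, the old history is available locally as `origin/archive/f23`, for example via `git log origin/archive/f23`.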

@churchyard Do you still want to help on this?

I do, but I have no powers.

@churchyard what powers do you need? Access to the bare repo / files?

Are there minutes from the February weekly Releng meeting that @syeghiay mentioned? (Whichever meeting was the next one after 2020-02-25, I guess.)

It sounded like that's where an actual plan to move this forward was going to be discussed.

Access to the bare repo should do. Preferably on staging first, not to screw up.

@churchyard I can create a tarball of the git repo where you can practice, once ready, I'm happy to apply the command you give me (in staging) to validate this and then we can look at prod

Note: we could try to address https://pagure.io/fedora-infrastructure/issue/7467 while at it

Never mind, this is actually the same issue :)

Please, do. Thanks. It's big, probably upload it somewhere?

It's big and it's in your home folder on fedorapeople :)

Do I still have quota for this? :)

BTW, would it be worthwhile having a git hook to prevent this? It's probably debatable what "too big" means but the line is probably somewhere between the size of the kernel spec file and that of the blender tarball. :wink:

Patrick wrote such a hook and we proposed it to fesco and it was rejected. :( I can look up details, but I think people worried about corner cases.

Ideally you'd want an "Are you sure?" type interactive prompt, rather than a hard rejection. (Or at least some option to force the action, from one of those corner cases.) But git isn't really designed for interactive pushes; that's what pull requests are for.

Not to mention, blocking during push is kind of too late. The best time to prevent this would be the point at which the bad commit is initially created in the local history.
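
A local pre-commit hook is one way to catch this at commit time, with an escape hatch for the corner cases. The following is only a sketch and not the hook Patrick wrote: the installer name `install_size_hook`, the 1 MiB default, and the `MAX_BYTES` / `ALLOW_LARGE_FILES` variables are all assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch: install a pre-commit hook that refuses staged files above a size limit.
# `install_size_hook`, MAX_BYTES and ALLOW_LARGE_FILES are hypothetical names.
install_size_hook() {
    hooks_dir="$1/.git/hooks"
    mkdir -p "$hooks_dir"
    cat > "$hooks_dir/pre-commit" <<'EOF'
#!/bin/sh
# Limit in bytes; defaults to 1 MiB (an arbitrary choice for this sketch).
MAX_BYTES=${MAX_BYTES:-1048576}

# Explicit override for legitimate corner cases.
[ "${ALLOW_LARGE_FILES:-0}" = "1" ] && exit 0

# Check every added, copied, or modified path in the index.
git diff --cached --name-only --diff-filter=ACM |
{
    status=0
    while IFS= read -r path; do
        # Size of the staged blob; skip entries that are not blobs.
        size=$(git cat-file -s ":$path" 2>/dev/null) || continue
        if [ "$size" -gt "$MAX_BYTES" ]; then
            echo "refusing to commit $path: $size bytes exceeds $MAX_BYTES" >&2
            status=1
        fi
    done
    exit "$status"
}
EOF
    chmod +x "$hooks_dir/pre-commit"
}
```

Usage: `install_size_hook /path/to/clone`, then commit as usual. `ALLOW_LARGE_FILES=1 git commit ...` (or git's own `--no-verify`) forces the commit through. Being a client-side hook, it only helps packagers who install it, and paths with unusual characters may need extra care.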

Turns out the files have only ever been added to the f23 branch.

[blender.git (BARE:master)]$ pwd
/srv/git/repositories/rpms/blender.git
[blender.git (BARE:master)]$ FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter  'git rm -r --cached --ignore-unmatch results_blender *.src.rpm *.tar.gz' --original refs/archive -- f23
[blender.git (BARE:master)]$ mv refs/archive/refs/heads/f23 refs/archive/f23   # no idea why this is not the default
[blender.git (BARE:master)]$ rm refs/archive/refs/ -r

Metadata Update from @cverna:
- Assignee reset

3 years ago

Hi @churchyard, did you ever figure out the commands that we need to run on the git repo to archive the large files?

Yes, see my latest comment from 8 months ago.

https://pagure.io/releng/issue/7265#comment-652554

So I am confused on the status here. Do we need to have someone run those commands on the bare repo on pkgs01?
Or did @pingou already do that?
Or did @pingou take the repo @churchyard modified and put it in place?
Or did @churchyard just do this on the bare repo and it's done?

Do we need to have someone run those commands on the bare repo on pkgs01?

Yes.

...

No. (You can execute the reproducer from the initial comment. The issue is still there.)

[blender.git (BARE:master)]$ rm refs/archive/refs/ -r

We want to remove all archive/refs?

8 months ago, this was only created by the command above.

OK, I've run these commands and here is my test:

$  git clone -n https://src.fedoraproject.org/rpms/blender.git
Cloning into 'blender'...
remote: Enumerating objects: 3190, done.
remote: Counting objects: 100% (3190/3190), done.
remote: Compressing objects: 100% (1855/1855), done.
remote: Total 3190 (delta 1803), reused 2383 (delta 1298), pack-reused 0
Receiving objects: 100% (3190/3190), 602.01 KiB | 1.11 MiB/s, done.
Resolving deltas: 100% (1803/1803), done.
$ cd blender 
$ git rev-list --objects --all | grep "\.rpm\|\.gz"
e1bab3c012eba64ae7b92a0e14c428567f53e7d7 .rpmlint

Many thanks @churchyard !

Going to close this one as fixed, please let us know if you think otherwise.

Metadata Update from @pingou:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
