git.centos.org is looking to move to the Fedora Pagure. Currently Pagure doesn't have an API that lets me find only the CentOS sources. If I could rsync down the repositories and lookaside sources, I could query locally and avoid overloading the Fedora infrastructure.
Before CentOS moves to the Fedora Pagure
I would find this useful as long as CentOS sources are hosted at Fedora
Without the ability to create a local target, there will be a very large number of queries against the Fedora infrastructure from just the rebuild folks.
I am worried that rsyncing N thousand git repositories would kill us as much as you doing a lot of git clones. Would the git seeds we provide help more?
These are built daily with this script
That comes fairly close. My overall hope was to set up a local mirror and just sync up changes.
Right now I'm querying the CentOS source API every 3 hours to avoid too long a lag on important errata.
I do not know if repospanner can be used for this, but it is what we are using between the two infrastructures to deal with the Fedora/CentOS move.
Unfortunately, we will not be able to provide rsync due to the backend systems using a different storage format than normal Git repositories.
As mentioned, we currently provide git seeds.
Do note that with the move to Pagure, there might be an option to get messages published to a message bus, so that you can monitor those.
For Fedora, we already publish that via fedmsg; for CentOS you'd have to ask the CentOS administrators.
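A minimal sketch of how a mirror could react to bus messages instead of polling, assuming each message has already been decoded into a Python dict. The topic suffix `.git.receive` and the `msg.commit.repo` field layout are assumptions for illustration, not a confirmed schema, and `repos_to_refresh` is a hypothetical helper name.

```python
# Assumed topic suffix for push notifications; verify against the real bus.
PUSH_TOPIC_SUFFIX = ".git.receive"


def repos_to_refresh(messages, topic_suffix=PUSH_TOPIC_SUFFIX):
    """Return the set of repository names mentioned in push messages.

    Each message is assumed to look like
    {"topic": str, "msg": {"commit": {"repo": str, ...}}}.
    Messages with another topic or without a repo name are skipped.
    """
    repos = set()
    for message in messages:
        if not message.get("topic", "").endswith(topic_suffix):
            continue
        repo = message.get("msg", {}).get("commit", {}).get("repo")
        if repo:
            repos.add(repo)
    return repos
```

Only the repositories returned here would need a fetch, so an idle repository costs the server nothing.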
Does anyone have an ETA on that move?
Metadata Update from @smooge:
- Issue assigned to smooge
@puiterwijk I'm a bit confused by "Unfortunately, we will not be able to provide rsync due to the backend systems using a different storage format than normal Git repositories."
The scripts that make the git seeds look like they are just doing a tar archive....
Metadata Update from @bowlofeggs:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: src.fp.o
What repospanner keeps on the filesystem isn't really the same format as a raw git repo. I'm honestly not sure how the data get out of repospanner to be tarred up to make the nightly seeds, but you are right that something useful must be written to the filesystem at some point. It's possible that's done on some storage which can't just be opened up to rsync. Honestly rsync is quite terrible for this kind of thing anyway because every rsync run requires a full filesystem stat and I'm not sure the full tree is quiescent often enough for an rsync run to reliably complete.
It would be kind of neat if you could have read-only, transient, unprivileged repospanner nodes that could be used for local git mirrors. I could have used that more than once.
To cover the items on this ticket.
The git seeds are not based on the combined repospanner mode that CentOS/Fedora will work towards using in the coming months. They are using the older gitolite backend format which can be tarred up. A new method will have to be invented.
As tibbs points out, rsync is horrible for git. There are millions of little files which would each need to be stat'ed on every rsync run, and more are constantly being added. Thus there would not be any resource savings over using git.
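To get a feel for that cost, the sketch below counts the filesystem entries under a directory, which is roughly the number of entries a full tree scan has to examine on every run. It is only an illustration; `count_entries` is a hypothetical helper and the figure it returns is a lower bound, not a measurement of rsync itself.

```python
import os


def count_entries(root):
    """Count files and directories below root.

    A full scan of the tree has to look at each of these at least once,
    whether or not anything changed since the previous run.
    """
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total
```

Running it over a directory holding a few thousand unpacked repositories shows how quickly the count grows, even when very few of them changed.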
At this point we will not be offering rsync on src.fedoraproject.org and will have to rely on git to mirror the repository. I really wish I had a better answer on this.
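A minimal sketch of a git-based mirror, assuming one bare mirror directory per repository: the first run does `git clone --mirror`, later runs do `git remote update --prune` so that only new objects are transferred. The helper names `mirror_command` and `sync` are hypothetical, and the URL and path in the usage note are only examples.

```python
import os
import subprocess


def mirror_command(url, dest):
    """Return the git command that creates or refreshes a bare mirror at dest."""
    if os.path.isdir(dest):
        # Existing mirror: fetch new objects and drop refs deleted upstream.
        return ["git", "-C", dest, "remote", "update", "--prune"]
    # First run: bare clone that copies every ref from the remote.
    return ["git", "clone", "--mirror", url, dest]


def sync(url, dest):
    """Run the chosen command and raise CalledProcessError if git fails."""
    subprocess.run(mirror_command(url, dest), check=True)
```

For example, `sync("https://src.fedoraproject.org/rpms/bash.git", "/srv/mirror/rpms/bash.git")` would clone on the first call and fetch incrementally afterwards.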
Metadata Update from @smooge:
- Issue close_status updated to: Will Not/Can Not fix
- Issue status updated to: Closed (was: Open)