Tracks fedora-updates repodata in plain-text (XML) format, downloaded and committed roughly every 1-2 days, along with some scripts to inspect it easily.
The main purpose of this repo is to help me see how much data actually changes when metadata is updated on the mirrors. Keeping regular snapshots in git makes it easy to view textual diffs and keeps the archive as small as possible (thanks to git's packing abilities).
My goal is to use this information (or let others use it) to design an optimal delta format for Fedora repodata that could be implemented in DNF and the Fedora infrastructure. Right now, DNF downloads the whole metadata every time, which is inefficient.
Several people have proposed ideas so far, the latest (and IMO most detailed) being this one by Jonathan Dieter.
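As a rough illustration of the kind of measurement these snapshots make possible, here is a minimal sketch that counts how many lines were added and removed between two versions of a metadata file. It is not one of the scripts shipped in this repo: the function name `changed_line_stats` and the tiny sample inputs are made up for the example, and a line-based count is only a crude proxy for what a real delta format would transfer.

```python
import difflib


def changed_line_stats(old_text: str, new_text: str) -> dict:
    """Count lines added and removed between two snapshots of a text file."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    added = removed = 0
    # autojunk=False disables the heuristic that ignores very frequent lines,
    # which could otherwise skew results on large, repetitive XML files.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return {"added": added, "removed": removed, "total_new": len(new_lines)}


# Hypothetical, heavily simplified stand-ins for two snapshots.
OLD_SNAPSHOT = "<package>a-1.0</package>\n<package>b-1.0</package>\n<package>c-1.0</package>\n"
NEW_SNAPSHOT = (
    "<package>a-1.0</package>\n<package>b-1.1</package>\n"
    "<package>c-1.0</package>\n<package>d-1.0</package>\n"
)

stats = changed_line_stats(OLD_SNAPSHOT, NEW_SNAPSHOT)
print(stats)
```

In practice the two inputs would be the contents of the same XML file at two commits, for example as printed by `git show <commit>:<path>`.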
The second purpose of this repo is to simulate a mirror that replays the metadata snapshots, which can be used, for example, to test a proof-of-concept implementation of the delta protocol (see below for instructions).
$ git config alias.md '!PYTHONPATH=$(git rev-parse --show-toplevel) bin/md'
$ git md -h
usage: md [-h] [--cachedir CACHEDIR] {pull,plot,log} ...

positional arguments:
  {pull,plot,log}
    pull               fetch and commit latest metadata
    plot               generate a visual PNG plot
    log                show a condensed git-like log

optional arguments:
  -h, --help           show this help message and exit
  --cachedir CACHEDIR  directory to use for caching expensive operations
                       (default: /var/tmp/mdhist)
You can also generate a DNF repository that replays the repodata content as it appeared in this git repo, starting from the very first snapshot:
import pygit2
from mdhist.api import DnfRepo
repo = DnfRepo(pygit2.Repository('.'), '/var/tmp/myrepo', '/var/tmp/mdhist')
This creates and populates a local DNF repo in the specified directory (which must not already exist):
$ tree /var/tmp/myrepo/
/var/tmp/myrepo/
└── repodata
    ├── 2bcd85f8762452ca46f48e2d46874cb115a196cfe7e68ab34aa454218908676c-group_gz.xml.gz
    ├── 7dd6bc7e1c842a98c5331399e055e33a06a00e1fdaecb93b61e16834211014df-other.xml.gz
    ├── a8c47c74337fc5c2262903eb78c5931d05ecc4205c5d833a1b1d76aadf3fb0f7-updateinfo.xml.gz
    ├── ae5ab188ead6a79649647f879b6d32964b775b235b6b170a08cd991f60361928-filelists.xml.gz
    ├── ee2b3ee93c37044ecedb368e8a8bcc0bd2b382c426b5c0472d1279c945888c1d-primary.xml.gz
    └── repomd.xml

1 directory, 6 files
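The `repomd.xml` file is the index that tells a client which of the other files to fetch. The sketch below shows one way to list its records with the Python standard library. It assumes the usual createrepo layout, where each `<data>` element carries a `<checksum>` and a `<location href=...>` in the `http://linux.duke.edu/metadata/repo` namespace. The helper name `list_repomd_records` and the hand-written, single-record sample are illustrative only; a real `repomd.xml` has more records and more fields.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def list_repomd_records(repomd_xml: str) -> list:
    """Return the type, checksum and location of every <data> record."""
    root = ET.fromstring(repomd_xml)
    records = []
    for data in root.findall("repo:data", REPO_NS):
        checksum = data.find("repo:checksum", REPO_NS)
        location = data.find("repo:location", REPO_NS)
        records.append({
            "type": data.get("type"),
            "checksum": checksum.text if checksum is not None else None,
            "href": location.get("href") if location is not None else None,
        })
    return records


# Hand-written minimal sample (not a real snapshot); the file name is taken
# from the directory listing above.
PRIMARY_SUM = "ee2b3ee93c37044ecedb368e8a8bcc0bd2b382c426b5c0472d1279c945888c1d"
SAMPLE_REPOMD = f"""\
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <checksum type="sha256">{PRIMARY_SUM}</checksum>
    <location href="repodata/{PRIMARY_SUM}-primary.xml.gz"/>
  </data>
</repomd>
"""

records = list_repomd_records(SAMPLE_REPOMD)
for record in records:
    print(record["type"], record["href"])
```

To inspect the replayed repo, the same function can be fed the text of `/var/tmp/myrepo/repodata/repomd.xml`.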
To move to the next snapshot, just call:
repo.update()
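To point a DNF client at the replayed repo, a `.repo` file with a `file://` base URL should be enough. The following sketch renders such a file; it is not part of this repo's scripts, and the helper name `render_repo_file` and the repo id `mdhist-replay` are made up for the example. `gpgcheck` is switched off on the assumption that the replayed repo is only used for metadata experiments.

```python
import configparser
import io
from pathlib import PurePosixPath


def render_repo_file(repo_id: str, repo_dir: str) -> str:
    """Render a minimal DNF .repo file for a local repo directory."""
    # interpolation=None keeps any '%' characters in values literal.
    config = configparser.ConfigParser(interpolation=None)
    config[repo_id] = {
        "name": f"Replayed metadata snapshots ({repo_id})",
        # repo_dir must be an absolute POSIX path, e.g. /var/tmp/myrepo.
        "baseurl": PurePosixPath(repo_dir).as_uri(),
        "enabled": "1",
        "gpgcheck": "0",
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


repo_file_text = render_repo_file("mdhist-replay", "/var/tmp/myrepo")
print(repo_file_text)
```

The output can be saved under `/etc/yum.repos.d/`. After each `repo.update()`, running DNF with `--refresh` makes it treat its cached metadata as expired and download the new snapshot.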