mdhist

Created 6 years ago
Maintained by mdomonko
Metadata history archive
Michal Domonkos committed 6 years ago

Fedora metadata history archive

Tracks fedora-updates repodata in plain-text (XML) form, downloaded and committed roughly every 1-2 days, along with some scripts for inspecting it easily.

View latest plot image

The main purpose of this repo is to help me see how much data actually changes when metadata is updated on the mirrors. Keeping regular snapshots in git makes it easy to view textual diffs and keeps the archive as small as possible (thanks to git's packing abilities).
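As a rough illustration of what "how much data changes" can mean in practice, here is a minimal sketch that counts added, removed and unchanged lines between two text snapshots using Python's standard difflib. The function name `changed_line_stats` is hypothetical and not part of this repo's scripts. For full-size repodata, git's own `git diff --numstat` is likely the more practical tool, since difflib can be slow on very large inputs.

```python
import difflib


def changed_line_stats(old_text, new_text):
    """Count added, removed and unchanged lines between two snapshots.

    Hypothetical helper for illustration; not part of mdhist.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # autojunk=False disables the "popular line" heuristic, which would
    # otherwise skew results for repetitive XML.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = removed = unchanged = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return {"added": added, "removed": removed, "unchanged": unchanged}
```

The ratio of added plus removed lines to unchanged lines gives a first estimate of how much a line-based delta could save compared with downloading the whole file.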

My goal is to use this information to design (or let others design) an optimal delta format for Fedora repodata that we could implement in DNF and Fedora infra. Right now, DNF downloads the whole metadata every time, which is inefficient.

Several people have proposed their ideas by now, the latest (and IMO most detailed) being this one by Jonathan Dieter.

The second purpose of this repo is to simulate a mirror that replays the metadata snapshots. This can be used, for example, to test a proof-of-concept implementation of the delta protocol (see below for the instructions).

Installation

$ git config alias.md '!PYTHONPATH=$(git rev-parse --show-toplevel) bin/md'

Usage

$ git md -h
usage: md [-h] [--cachedir CACHEDIR] {pull,plot,log} ...

positional arguments:
  {pull,plot,log}
    pull               fetch and commit latest metadata
    plot               generate a visual PNG plot
    log                show a condensed git-like log

optional arguments:
  -h, --help           show this help message and exit
  --cachedir CACHEDIR  directory to use for caching expensive operations
                       (default: /var/tmp/mdhist)

You can also generate a DNF repository that replays the repodata content as it appeared in this git repo, starting from the very first snapshot:

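import pygit2
from mdhist.api import DnfRepo
# '/var/tmp/myrepo' is where the DNF repo gets created (must not exist yet);
# '/var/tmp/mdhist' is the same path as the default --cachedir shown above.
repo = DnfRepo(pygit2.Repository('.'), '/var/tmp/myrepo', '/var/tmp/mdhist')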

This creates and populates a local DNF repo in the specified directory (which must not already exist):

$ tree /var/tmp/myrepo/
/var/tmp/myrepo/
└── repodata
    ├── 2bcd85f8762452ca46f48e2d46874cb115a196cfe7e68ab34aa454218908676c-group_gz.xml.gz
    ├── 7dd6bc7e1c842a98c5331399e055e33a06a00e1fdaecb93b61e16834211014df-other.xml.gz
    ├── a8c47c74337fc5c2262903eb78c5931d05ecc4205c5d833a1b1d76aadf3fb0f7-updateinfo.xml.gz
    ├── ae5ab188ead6a79649647f879b6d32964b775b235b6b170a08cd991f60361928-filelists.xml.gz
    ├── ee2b3ee93c37044ecedb368e8a8bcc0bd2b382c426b5c0472d1279c945888c1d-primary.xml.gz
    └── repomd.xml

1 directory, 6 files
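To sanity-check a generated snapshot, the files listed above can be compared against the checksums recorded in repomd.xml. The sketch below assumes the usual repomd.xml layout (a `data` element per file, each with `checksum` and `location` children in the `http://linux.duke.edu/metadata/repo` namespace) and that the checksum type is a name hashlib understands, such as sha256. The helper `verify_repomd` is hypothetical and not part of mdhist.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

# XML namespace assumed for repomd.xml elements
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def verify_repomd(repo_dir):
    """Return {data type: bool} telling whether each listed file matches
    the checksum recorded for it in repodata/repomd.xml."""
    tree = ET.parse(os.path.join(repo_dir, "repodata", "repomd.xml"))
    results = {}
    for data in tree.getroot().findall("repo:data", REPO_NS):
        checksum = data.find("repo:checksum", REPO_NS)
        location = data.find("repo:location", REPO_NS)
        if checksum is None or location is None:
            continue  # entry without enough information to verify
        path = os.path.join(repo_dir, location.get("href"))
        if not os.path.isfile(path):
            results[data.get("type")] = False
            continue
        # Assumption: the type attribute (e.g. "sha256") is a hashlib name
        digest = hashlib.new(checksum.get("type"))
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        results[data.get("type")] = digest.hexdigest() == checksum.text.strip()
    return results
```

Calling `verify_repomd('/var/tmp/myrepo')` should then return one entry per metadata type (primary, filelists, and so on), each `True` if the file on disk is intact.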

To move to the next snapshot, just call:

repo.update()
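To make the replayed repo behave like a mirror for a client under test, the generated directory can be served over HTTP. The following is a minimal sketch using only Python's standard library; `serve_directory` is a hypothetical helper, not part of mdhist, and a plain `file://` baseurl may be enough if no HTTP behaviour needs testing.

```python
import functools
import http.server
import threading


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP from a background thread.

    port=0 lets the OS pick a free port; read it back from
    server.server_address. Returns the running server object.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A test session could start the server with `server = serve_directory('/var/tmp/myrepo')`, point the client at `http://127.0.0.1:<port>/` using the port from `server.server_address`, call `repo.update()` between client runs, and finish with `server.shutdown()`.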