#16 bootstrap ideas?
Opened 3 years ago by mattdm. Modified 3 years ago

The pickle files used for caching the datagrepper queries add up to about 2.6 GB uncompressed, or 180 MB when compressed with xz. It'd be nice to distribute them to people so they don't have to go through the long process of recreating the cache locally to get started. I'm open to ideas.

Some problems:

  1. The pickle format isn't very robust and these will need to be recreated if we change something.
  2. It's also not secure -- loading the pickle opens up the possibility of arbitrary code execution.
  3. It's still a lot of data -- I don't think we want to check it in directly to this repo.
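
One way to address problems 1 and 2 would be to store the cache as xz-compressed JSON instead of pickle. This is only a minimal sketch, and it assumes the cached query results are plain JSON-compatible data (dicts, lists, strings, numbers). `save_cache` and `load_cache` are hypothetical names for illustration, not functions from this repo:

```python
import json
import lzma


def save_cache(path, data):
    """Write JSON-serializable data to an xz-compressed JSON file.

    Hypothetical helper; assumes `data` contains only JSON-compatible types.
    """
    with lzma.open(path, "wt", encoding="utf-8") as fh:
        json.dump(data, fh)


def load_cache(path):
    """Read the data back.

    Unlike pickle.load, json.load only builds dicts, lists, strings,
    numbers, booleans and None, so loading does not run code from the file.
    """
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

The trade-off is that anything that isn't JSON-compatible (datetime objects, custom classes) would need to be converted explicitly before saving.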

Some ideas:

  • a separate repo that could be added as a submodule and which we'll update periodically?
  • a tarball that I make available for download on fedorapeople.org?
  • some kind of fancier cloud service?
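
Whichever of these is used, people downloading the cache would probably want to check it before unpacking. A minimal sketch of that check, assuming a SHA-256 checksum is published somewhere trusted alongside the archive (`sha256_of` and `verify_archive` are hypothetical names):

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True if the file's digest matches the published checksum."""
    return sha256_of(path) == expected_hex.strip().lower()
```

This only detects a corrupted or swapped download. It does not make the pickle files themselves safe to load if the source isn't trusted.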

What do you think?


I guess Git LFS (Large File Storage) could be used in this case.

Possibly! Although as it's set up right now, it saves one cache file per query type per week, so there are thousands of small-ish (a few megabytes each) files rather than one big one.
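
If one big file turns out to be easier to distribute, the small files could be bundled first. A rough sketch, assuming all the cache files live under a single directory (`bundle_cache` is a hypothetical helper, not something in this repo):

```python
import tarfile
from pathlib import Path


def bundle_cache(cache_dir, archive_path):
    """Pack every file under cache_dir into one xz-compressed tarball.

    Returns the number of files added. The file list is collected before
    the archive is opened; archive_path should still be outside cache_dir
    so an older archive is not picked up on a later run.
    """
    cache_dir = Path(cache_dir)
    files = sorted(p for p in cache_dir.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:xz") as tar:
        for path in files:
            # Store paths relative to cache_dir so the archive unpacks cleanly.
            tar.add(path, arcname=path.relative_to(cache_dir).as_posix())
    return len(files)
```

Compressing everything as one stream should also do better than compressing each few-megabyte file on its own, since xz can reuse patterns across files.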
