Issue #16: bootstrap ideas? - fedora-contributor-trends

Opened 3 years ago by mattdm. Modified 3 years ago

The pickle files used for caching the datagrepper queries add up to about 2.6 GB uncompressed, or 180 MB xz-compressed. It'd be nice to distribute these to people so they don't have to go through the long process of recreating them locally to get started. I'm open to ideas.

Some problems:

- The pickle format isn't very robust, and these will need to be recreated if we change something.
- It's also not secure -- loading the pickle opens up the possibility of arbitrary code execution.
- It's still a lot of data -- I don't think we want to check it in directly to this repo.
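
One possible way to address the first two problems is to re-save each locally generated cache file in a data-only format before sharing it. The sketch below is an illustration, not what the project currently does: it assumes the cached objects are plain lists and dicts that `json` can serialize, and the function names `pickle_to_json_xz` and `load_json_xz` are hypothetical.

```python
import json
import lzma
import pickle
from pathlib import Path


def pickle_to_json_xz(pickle_path: Path, json_path: Path) -> None:
    """Re-save one locally generated pickle cache as xz-compressed JSON.

    Assumption: the cached object is JSON-serializable (lists, dicts,
    strings, numbers). json.dump raises TypeError if it is not.
    """
    # Unpickling is the unsafe step, so only do it on files you created.
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)
    with lzma.open(json_path, "wt", encoding="utf-8") as f:
        json.dump(data, f)


def load_json_xz(json_path: Path):
    """Load a shared cache file; parsing JSON cannot run embedded code."""
    with lzma.open(json_path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

People downloading the converted files would then only ever call `load_json_xz`, so the arbitrary-code-execution risk stays with whoever generated the cache.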

Some ideas:

- A separate repo that could be added as a submodule and which we'll update periodically?
- A tarball on fedorapeople.org that I make available for download?
- Some kind of fancier cloud service?

What do you think?

aawizard commented 3 years ago

I guess Git LFS (Large File Storage) could be used in this case.

mattdm commented 3 years ago

Possibly! Although as it's set up right now, it saves one cache file per query type per week, so there are thousands of small-ish (a few megabytes each) files rather than one big one.
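
Whichever hosting option is chosen, the many-small-files layout could be collapsed into a single archive before upload. Here is a minimal sketch using Python's standard `tarfile` module; the `*.pickle` pattern, the directory layout, and the function name `bundle_cache` are assumptions for illustration, not the repo's actual conventions.

```python
import tarfile
from pathlib import Path


def bundle_cache(cache_dir: Path, archive_path: Path, pattern: str = "*.pickle") -> int:
    """Pack every cache file under cache_dir into one xz-compressed tarball.

    Returns the number of files added. Paths inside the archive are stored
    relative to cache_dir so it unpacks cleanly on another machine.
    """
    files = sorted(cache_dir.rglob(pattern))
    with tarfile.open(archive_path, "w:xz") as tar:
        for path in files:
            tar.add(path, arcname=str(path.relative_to(cache_dir)))
    return len(files)
```

A single archive also tends to compress better than thousands of individually compressed files, since xz can exploit redundancy across weeks.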
