#7677 data collection for new DNF countme proposal
Opened 2 months ago by mattdm. Modified 21 days ago

Describe what you need us to do:

(I sent this as an email to Leigh and Jim but am now also filing a ticket so it doesn't
get lost.)

We have a new tool in DNF for getting more accurate Fedora usage counting. See https://fedoraproject.org/wiki/Changes/DNF_UUID for details.

The DNF team implemented a test version of the client side. However, the server
side is not in place. This means that 1) we aren't actually getting any benefit from
all of this work and 2) we can't even see if it needs testing or adjusting.

With the existing mirror statistics, the data is presented to me as a big multi-column CSV file. The columns in that file are somewhat ad hoc and have grown over time, and it's a bit confusing what each column aggregates.

I'd like to take the opportunity of the new data format to get this to be both more consistent and more useful. But doing that requires some design work, and then scripting implementation.


1) What data are we looking to represent?

Summarized data for the dnf count. Right now, we get
IP-address-per-day counts in columns broken out by Fedora or EPEL
release, architecture, and some other random stuff.

2) How do you want the data represented?

There are two formats which would be useful. I can make do with either one.

One would be a database (sqlite, CSV, or remote database access; I don't care) with one entry per unique system request per time period (day or week, whatever we decide), where each entry would have values for version, variant, arch, and count. Then I can expand it myself. These files would likely be rather large (but compress well), as there are millions of unique entries per time period.
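
As a rough illustration of option one, here is a minimal sketch using sqlite. The table name `countme`, the column layout, the week-string format, and the helper names are all hypothetical, and it assumes `count` is a per-entry number that can simply be summed; none of this is an agreed format.

```python
import sqlite3

# Hypothetical schema: one row per unique system request per time period.
SCHEMA = """
CREATE TABLE IF NOT EXISTS countme (
    period  TEXT NOT NULL,    -- assumed format, e.g. ISO week "2019-W14"
    version TEXT NOT NULL,    -- e.g. "30"
    variant TEXT NOT NULL,    -- e.g. "workstation"
    arch    TEXT NOT NULL,    -- e.g. "x86_64"
    count   INTEGER NOT NULL  -- assumed to be summable per entry
)
"""


def load_entries(conn, rows):
    """Create the table if needed and insert (period, version, variant, arch, count) rows."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO countme VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def totals_by_version_variant(conn, period):
    """Return [(version, variant, total)] for one time period, sorted by version then variant."""
    cur = conn.execute(
        "SELECT version, variant, SUM(count) FROM countme "
        "WHERE period = ? GROUP BY version, variant "
        "ORDER BY version, variant",
        (period,),
    )
    return cur.fetchall()
```

With the data in this shape, any other breakdown (by arch, by version only, and so on) is just a different `GROUP BY` on the consumer side.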

Two would be aggregate per-time-period files for at least version-variant (so, f30-server, f30-workstation, f30-cloud, ..., f31-server, f31-workstation, f31-cloud, ...) with per-time-period summaries. This is less versatile but the files would be smaller.
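
For comparison, the aggregate form of option two could be derived from per-system entries along these lines. This is only a sketch: the `(period, version, variant, arch)` tuple layout and the `summarize` helper are assumptions for illustration, and each entry is counted once.

```python
from collections import Counter


def summarize(entries):
    """Collapse per-system entries into per-period version-variant totals.

    Each entry is an assumed (period, version, variant, arch) tuple.
    Returns a Counter keyed by (period, "f<version>-<variant>").
    """
    totals = Counter()
    for period, version, variant, _arch in entries:
        # Architecture is dropped here, which is what makes this form
        # smaller but less versatile than the per-entry data.
        totals[(period, f"f{version}-{variant}")] += 1
    return totals
```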

Option one is really my preference.

3) Should this be automatically generated, or are you expecting ad-hoc runs?

Definitely automatically. This would replace the current stats-gathering script, although I would like that script (see issue #76) to keep running for the foreseeable future.

When do you need this? (YYYY/MM/DD)

As this is an F30 feature and we may need time during the beta to adjust the client side aspect, at beta release is ideal — so, 2019/04/02.

When is this no longer needed or useful? (YYYY/MM/DD)

When Fedora gets completely defunded because we can't plan or make strategic decisions because we lack any insight or metrics?

If we cannot complete your request, what is the impact?

  • Visibility into Fedora usage remains poor
  • Work of DNF team to make things better for Fedora wasted
  • Must continue to guess as to patterns of usage of different Fedora editions and spins, possibly generating huge amounts of misplaced effort

Metadata Update from @smooge:
- Issue assigned to smooge

2 months ago

There do not seem to be any hosts checking in with count since the beta was released. I am wondering if there was something that wasn't enabled or added to the boot?

OK I tested with the F30 beta and the client code was not shipped in it. There are really very few changes that need to be done on the server side... the data the client would send is already going to be captured and could be processed later.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Reporter (was: Needs Review)

21 days ago

OK, this has been delayed to F31, as dnf needed changes which did not land in F30. I have also created a sysadmin group, sysadmin-analysis, with mattdm and me in it. I still need to put in the rules for him to be able to run rbac jobs on data-analysis and to sudo, but I think he could log in and test his scripts against the combined logs and such to see what is needed.
