#2060 F30 System-Wide Change: DNF Better Counting
Closed: Accepted 2 months ago by churchyard. Opened 3 months ago by bcotton.

Right now, we estimate installed Fedora systems by counting unique IP addresses which show up in our updates mirror statistics. We need better data than that. There are some proposals for more complicated systems, but this is a quick thing we can do now to greatly improve what we have without a gigantic new infrastructure.
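
To illustrate why unique-IP counting is a weak estimate, here is a minimal sketch. The log tuple layout and the sample addresses are assumptions made up for illustration, not the format of the real mirror statistics.

```python
from collections import defaultdict

def unique_ips_per_day(log_entries):
    """Count distinct client IPs per (date, release).

    log_entries: iterable of (date, release, ip) tuples (hypothetical layout).
    """
    seen = defaultdict(set)
    for date, release, ip in log_entries:
        seen[(date, release)].add(ip)
    return {key: len(ips) for key, ips in seen.items()}

# Four systems, made-up documentation addresses:
entries = [
    ("2019-01-07", "29", "203.0.113.5"),    # system A behind an office NAT
    ("2019-01-07", "29", "203.0.113.5"),    # system B behind the same NAT
    ("2019-01-07", "29", "203.0.113.5"),    # system C behind the same NAT
    ("2019-01-07", "29", "198.51.100.17"),  # system D on one network
    ("2019-01-07", "29", "198.51.100.88"),  # system D again after changing networks
]
counts = unique_ips_per_day(entries)
print(counts)
```

In this toy input, three systems behind one NAT collapse into a single address, while one roaming system shows up as two, so the count does not match the number of systems in either direction.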

This is an update of a previous proposal to use a UUID to distinguish unique systems, as openSUSE does (see https://metrics.opensuse.org/). See also this previous Fedora Council discussion and this devel list thread.


Why does DNF still need to be involved? It could just be a separate service, AFAICS. Otherwise it would not work for ostree installations, would it?

Also, as a design consideration: when using a counter with a cap, the service could just become less precise over time, i.e. report < 1 week, > 1 week, > 2 weeks, > 4 weeks, > 8 weeks, > 16 weeks, > 24 weeks, depending on how the data needs to be aggregated later. On the other hand, the older a release is, the more unique a fresh installation of it might become.
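
A minimal sketch of such coarse bucketing, assuming the system knows its own age in days. The function name and the inclusive treatment of boundaries are illustrative choices, not part of the Change proposal.

```python
# Lower bounds in weeks, taken from the list of buckets above.
BUCKET_LOWER_BOUNDS_WEEKS = [1, 2, 4, 8, 16, 24]

def age_bucket(age_days):
    """Map a system age in days to a coarse label.

    Boundaries are treated as inclusive (exactly 7 days falls into "> 1 week"),
    which is an assumption made here for simplicity.
    """
    weeks = age_days // 7
    label = "< 1 week"
    for bound in BUCKET_LOWER_BOUNDS_WEEKS:
        if weeks >= bound:
            # Keep the highest bound the age has reached.
            label = f"> {bound} week" + ("s" if bound > 1 else "")
    return label

for days in (3, 10, 60, 200):
    print(days, age_bucket(days))
```

Because the buckets widen with age, an old system reports the same value for months at a time, which limits how much the counter can say about any single machine.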

Also, it would be great to ensure that an individual system does not always report back at the same time.
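
One way to do that is to draw a fresh random delay for every reporting window. The sketch below assumes a weekly window, which is an illustrative choice and not something the proposal specifies.

```python
import random

WEEK_SECONDS = 7 * 24 * 3600  # assumed reporting window: one week

def next_report_delay(rng=random):
    """Return a delay in seconds, drawn uniformly from the reporting window.

    A new value is drawn on every call, so a system does not settle on a
    recurring report time.
    """
    return rng.uniform(0, WEEK_SECONDS)

delay = next_report_delay()
print(f"next report in {delay / 3600:.1f} hours")
```

Drawing a new delay each time, rather than storing one fixed offset per system, is a deliberate choice in this sketch: a stable offset might itself help tell systems apart.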

Metadata Update from @churchyard:
- Issue tagged with: meeting

3 months ago

Why does DNF still need to be involved? It could just be a separate service, AFAICS. Otherwise it would not work for ostree installations, would it?

An ostree installation might contact the yum repository:
* For rpm-ostree overlayed packages (rpm-ostree uses libdnf for this)
* For containers - in particular fedora-toolbox

so you'd see some traffic, but the numbers might not be very clean. It would be better to add similar tracking to downloading the ostree summary file - we probably should approve adding counting to OS updates, not to dnf in particular.

The arguments I see for adding better counting to OS updates rather than a separate service:
* Update bandwidth is a service that people are donating to Fedora - it's good if people providing this have an idea about what userbase they are servicing. (The ratio of downloads to countme= downloads should be uniform across mirrors.)
* Having two separate hits to the Fedora servers is going to leak (slightly) more information than a single hit.
* No matter how innocuous a separate counting service is, I think people are going to be allergic to it being opt-out.
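
The ratio mentioned in the first point could be checked along these lines. This is only a sketch: the mirror names and numbers are made up, and the per-mirror (downloads, countme hits) layout is an assumption.

```python
def downloads_per_countme(mirror_stats):
    """Return downloads per countme hit for each mirror.

    mirror_stats: mirror name -> (total downloads, countme downloads).
    Mirrors with no countme hits are skipped to avoid dividing by zero.
    """
    ratios = {}
    for mirror, (downloads, countme_hits) in mirror_stats.items():
        if countme_hits:
            ratios[mirror] = downloads / countme_hits
    return ratios

# Made-up numbers for illustration only.
stats = {
    "mirror-a.example": (12000, 400),
    "mirror-b.example": (3000, 100),
    "mirror-c.example": (50, 0),
}
ratios = downloads_per_countme(stats)
print(ratios)
```

If the ratio really is roughly uniform, a mirror operator could divide their own download count by it to get a rough idea of how many systems they serve.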

If we're going to complicate things from countme=1, it would be nice to have a list of some specific questions we are trying to answer with it (one that comes to mind: how many users try out Fedora and abandon it quickly?) - not just "more pretty graphs".

I can see both advantages and disadvantages to doing it as part of DNF, but I agree with @otaylor that the balance comes out on the side of doing it in the package updates.

One more argument for this: with a separate service and a separate connection, there's more chance of the service being disabled accidentally (e.g. through some presets or other autoconfiguration), and there's more chance of the connection being blocked or filtered.

It would be better to add similar tracking to downloading the ostree summary file - we probably should approve adding counting to OS updates, not to dnf in particular.

Agreed.

it would be nice to have a list of some specific questions we are trying to answer with it (one that comes to mind: how many users try out Fedora and abandon it quickly?) - not just "more pretty graphs".

@mattdm?

I think I covered this at least in some way in the Change proposal, but I can put it in the form of questions.

  1. How many actual Fedora systems are deployed in the wild and how does that change over time? (Right now, this is masked by network topology.)
  2. How many of these systems are long-term installations as opposed to test/trial systems?
  3. What about very short lived instances used in CI, containers, and other automated infrastructure?
  4. Which Fedora variants are in use and how does that change over time?
  5. If we do something like feature a spin in Fedora Magazine or highlight it on the download page, what short-term impact does that have (and does that have a lasting effect)?

Using something other than a binary countme (see the proposal) would also allow us to answer:

  • Do people tend to upgrade or do new installs? When we see growth for a new release, is that people converting older systems or are they net new?
  • How quickly do these upgrades happen after a new release comes out?
  • How many systems are upgraded every release? How many are upgraded at N+2?
  • Is there a difference in these patterns between, say, Fedora Server and Fedora Workstation?

I can't answer any of these right now.

Here is the latest picture from the current method, which is counting daily occurrences of unique IP addresses per release. As you can see, Fedora 29 numbers are very high — like, already 40% higher than the F28 peak. Now, I would love to say "look at this awesome growth", but these numbers seem too good to be true. I would really like to have better insight into what's going on here.

fedora-os-all.png

It seems everybody likes the proposal, and we're only trying to figure out the details of the form of the long-term counter.

Proposal: Approve the Change proposal. Details of countme= syntax will be figured out during the implementation with an eye towards gathering useful information while minimizing the probability of an individual system being identifiable.

+1 to the latest proposal

+1 to the latest proposal.

@zbyszek's proposal is APPROVED (+7, 1, 0)

https://meetbot.fedoraproject.org/fedora-meeting-1/2019-02-04/fesco.2019-02-04-15.01.html

Additionally:

@till will work with change owners to make sure the actual implementation is sane

Metadata Update from @churchyard:
- Issue untagged with: meeting
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

2 months ago

My personal note: I think this should get Fedora Council approval as well.
