#5 [discuss] What should be the architecture of the tooling
Opened 2 years ago by bookwar. Modified 2 years ago

Should we build tooling so that anyone can create a report from their own machine on demand? Via scripts or maybe Jupyter notebooks?

Or should we set up a service which collects metrics and presents them via dashboards, for example like https://www.stackalytics.com/?metric=filed-bugs&release=victoria ?

Note that Stackalytics is a large established service with years of development. We can start with something much smaller and simpler.


Something like that would be amazing. I agree that we'd need to start much smaller and simpler, though!

Hello @mattdm! I have been researching the libraries so I can get more involved in the Fedora Infra ecosystem. I found that fedmsg will be replaced by fedora-messaging. Does datagrepper consume only messages from fedmsg, or has it been fully migrated to work with fedora-messaging?

I found these discussion threads on fedora-commops: #114, started by @jflory7, and #105. What do you think about getting started with the proposed architecture and with these contribution metrics?

@josseline It's my understanding that everything goes into datagrepper, both old and new. So I think nothing needs to be changed, but honestly I'm not 100% sure which is why it's on the list of things needing investigation. :)

As for the commops discussion and Grafana... I'm not opposed, but I'm also not sure how easy it is to do the kind of analysis my current thing does (particularly around bucketing into contribution levels and grouping by years-in-the-project).
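For reference, the "grouping by years-in-the-project" idea could be sketched roughly like this in plain Python. This is only an illustration under assumptions: the sample events, the `(username, date)` shape, and the function name `active_users_by_tenure` are made up here, and this is not the code behind the existing report.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample events: (username, date of one recorded action).
EVENTS = [
    ("alice", date(2018, 3, 1)),
    ("alice", date(2020, 6, 9)),
    ("bob", date(2020, 1, 15)),
    ("bob", date(2020, 2, 20)),
    ("carol", date(2019, 11, 2)),
    ("carol", date(2020, 5, 5)),
]


def active_users_by_tenure(events):
    """Return {year: {years_in_project: number_of_active_users}}."""
    # First year in which each user was seen at all.
    first_year = {}
    for user, day in events:
        first_year[user] = min(first_year.get(user, day.year), day.year)

    # Set of users active in each calendar year.
    active = defaultdict(set)
    for user, day in events:
        active[day.year].add(user)

    # For each year, count active users by how long they have been around.
    result = {}
    for year, users in sorted(active.items()):
        counts = defaultdict(int)
        for user in users:
            counts[year - first_year[user]] += 1
        result[year] = dict(sorted(counts.items()))
    return result


print(active_users_by_tenure(EVENTS))
```

Whatever backend ends up holding the data, it would need to support this kind of "first seen" lookup per user before the per-year grouping can be done.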

I've been researching the GrimoireLab stack some more. Their docs describe different scenarios, and I think that in the internship we would work specifically with the "Producing enriched indexes" one.

Storing the data from the data sources and producing these indexes with GrimoireELK (which works on top of Elasticsearch) would fix the current data-gathering issue.

I found this example, which works over these indexes and produces a DataFrame with the newcomers by year.

Maybe we can focus on producing these indexes, querying them, processing the results with Pandas, and adding a first version of the charts just like in the example. Later on, it could be improved by adding Kibiter integration. What do you think about it? Is it a feasible scope or is it too much? Please let me know @mattdm :blush:
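To make the "newcomers by year" step concrete, here is a rough sketch in plain Python. It assumes the query returns items with an author and an ISO timestamp; the field names `author` and `date`, the sample items, and the function name are placeholders for illustration, not the actual field names of a GrimoireELK enriched index. With Pandas the same result would come from a groupby on the author followed by a count per year.

```python
from collections import Counter
from datetime import datetime

# Hypothetical items, shaped like simplified query results.
ITEMS = [
    {"author": "alice", "date": "2018-03-01T10:00:00"},
    {"author": "alice", "date": "2020-06-09T08:30:00"},
    {"author": "bob", "date": "2020-01-15T12:00:00"},
    {"author": "carol", "date": "2019-11-02T17:45:00"},
    {"author": "dave", "date": "2020-07-21T09:15:00"},
]


def newcomers_by_year(items):
    """Count authors by the year of their first recorded contribution."""
    first_seen = {}
    for item in items:
        when = datetime.fromisoformat(item["date"])
        author = item["author"]
        # Keep only the earliest timestamp seen for each author.
        if author not in first_seen or when < first_seen[author]:
            first_seen[author] = when
    per_year = Counter(when.year for when in first_seen.values())
    return dict(sorted(per_year.items()))


print(newcomers_by_year(ITEMS))
```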

I am leaning into @josseline's thinking here!

The architecture is the biggest question for this tool. I see a powerful story for Fedora to join forces with the likes of the CHAOSS Project and the GrimoireLab stack. They have created tools that solve the kinds of problems we want to solve, and they have 16 years of Open Source experience informing the design decisions behind what they do. The opportunity here is more than just a tooling partnership with GrimoireLab; it is also a chance to connect with a wider community of Open Source data enthusiasts and people who work with these kinds of problems every day.

In the Summer Coding chat where this came up, I mentioned that Cauldron.io is an amazing platform that offers pre-configured GrimoireLab reports on the fly; you just provide a git repository. Currently there is support for GitHub, GitLab, GNOME GitLab, and KDE GitLab, but nothing for dist-git or Pagure. However, since Cauldron is Open Source, there is an opportunity to explore what adding fedora-messaging or Pagure support to a tool like Cauldron would look like.

So, I am just an outside observer here and not the one doing the heavy lifting on this work… but I think we have some great stories to tell, and using the GrimoireLab stack as a base lets us spend more time on how we turn data into stories, instead of building our own custom tooling to solve problems similar to the ones GrimoireLab already solves.

@jflory7 Do you have some examples of GrimoireLab reports showing something similar to what my graphs show? They seem well-suited to pretty display of simple "count" metrics like "how many commits per day", but I don't see anything that shows the breakdown of new, intermediate, and old-school users, or separation by users with high engagement (activity every week) vs. long-term infrequent contributors vs. short-term drive-bys.

The closest I'm finding in the Cauldron.io reports is "onboardings", but either I'm not understanding that or else it's just not presenting the kinds of things I'm interested in knowing.

@mattdm I think this project, which was built on GrimoireLab, shows metrics similar to the community structure ones.

Maybe the first approach could be to start with a custom dashboard, because that allows getting data from Pagure and integrating with fedora-messaging or any other custom data source through their Perceval module.

@mattdm I think this project, which was built over GrimoireLabs, shows metrics similar to the community structure.

Ooh, yeah, that does basically the same thing as this from my report:

Screenshot_from_2021-04-22_16-58-46.png

although split at 80/15/5 percent of actions rather than into 1/9/40/50 percentile cohorts of people as mine is. Either way can work, although I kind of like my visualization better for understanding at a glance. For future reference here's a screenshot from there:

Screenshot_2021-04-22_Overall_Community_Structure_-_Development_Bitergia_Analytics.png

In that project for Q1 2021, 21 people did 95% of the work and the remaining 30 did the last 5%, which is really similar to what we see in Fedora, where the dark blue at the top is the 50% of people who also do about 5% of the work.
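To show how the two ways of cutting the same data differ, here is a small sketch in plain Python. It is an illustration only: the function names and the synthetic counts are made up, and the exact rule a Bitergia dashboard uses to assign people to its 80/15/5 bands is an assumption here (a person is placed in a band based on the cumulative share of actions done by everyone ranked above them).

```python
def split_by_people_percentile(action_counts, bounds=(1, 10, 50, 100)):
    """Cohorts by rank: top 1%, next 9%, next 40%, last 50% of people.

    Returns one (people, actions) tuple per cohort.
    """
    ranked = sorted(action_counts, reverse=True)
    n = len(ranked)
    cohorts = [[0, 0] for _ in bounds]
    for i, count in enumerate(ranked):
        # Integer comparison avoids floating-point trouble at the boundaries.
        idx = next(k for k, b in enumerate(bounds) if i * 100 < b * n)
        cohorts[idx][0] += 1
        cohorts[idx][1] += count
    return [tuple(c) for c in cohorts]


def split_by_action_share(action_counts, bounds=(80, 95, 100)):
    """Bands by cumulative share of actions: first 80%, next 15%, last 5%.

    Returns one (people, actions) tuple per band.
    """
    ranked = sorted(action_counts, reverse=True)
    total = sum(ranked)
    bands = [[0, 0] for _ in bounds]
    done = 0  # actions by everyone ranked above the current person
    for count in ranked:
        idx = next(
            (k for k, b in enumerate(bounds) if done * 100 < b * total),
            len(bounds) - 1,  # people with zero actions fall in the last band
        )
        bands[idx][0] += 1
        bands[idx][1] += count
        done += count
    return [tuple(b) for b in bands]


# Synthetic data: 100 contributors with a very skewed activity distribution.
COUNTS = [1000] + [100] * 9 + [10] * 40 + [1] * 50

print(split_by_people_percentile(COUNTS))
print(split_by_action_share(COUNTS))
```

The first function fixes the size of each group of people and reports how much work each group did; the second fixes the share of work and reports how many people it took, which is why the two charts look different even on the same data.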

For using Pagure as a data source, it's worth noting that we have two different Pagure instances.

First is https://src.fedoraproject.org, which is "dist-git" -- that's where changes to package build scripts for packages that go in Fedora Linux and EPEL live.

We also have pagure.io -- this site, of course -- which has a lot of Fedora-related repositories used for a zillion different things in a zillion different ways. A Design Team ticket is different from a Fedora Council ticket, which is different from an SCM request ticket, and all of that is different from, say, the 389 directory server project, which happens to use pagure.io as a git forge instead of GitHub. So that gets messy really fast!

Hi @mattdm, I sent you a DM on IRC, could you please check it out? It is for my Outreachy proposal :blush:
