#5070 Request: run MetricsGrimoire mlstats against all mailing lists monthly; provide access to database
Closed: Upstream a year ago. Opened 3 years ago by mattdm.

I want to have ongoing metrics of several things across Fedora's mailing lists.
See https://lists.fedoraproject.org/archives/list/commops%40lists.fedoraproject.org/message/W2OMDI5MO3BN7SFHPIJ3DZD6VN5R63YU/ for details.

I learned of a nifty tool to assist with this: http://metricsgrimoire.github.io/MailingListStats/ (it's Python, but not yet packaged in Fedora). This tool takes mailing list archives in mbox format and puts them into a database with tables for running reports against.

It'd be awesome if we could do that import automatically (monthly would be adequate, more would not hurt) and provide the result for use — either as a postgres database or as downloadable sqlite files.
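To make the request concrete, here is a rough sketch of the kind of import being asked for, using only the Python standard library. This is not how mlstats works internally and the table layout is not its schema: the `messages` table, its columns, and the function names `import_mbox` and `posts_per_month` are all hypothetical, chosen only for illustration.

```python
import mailbox
import sqlite3
from email.utils import parseaddr, parsedate_to_datetime


def _header(msg, name):
    """Return a header as a plain string (empty if the header is missing)."""
    return str(msg.get(name, ""))


def import_mbox(mbox_path, db_path):
    """Load message headers from an mbox file into a small SQLite table.

    The schema is a hypothetical illustration, not the mlstats schema.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT PRIMARY KEY, sender TEXT, sent_month TEXT, subject TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            message_id = _header(msg, "Message-ID").strip()
            if not message_id:
                continue  # no Message-ID, so we cannot deduplicate on re-import
            try:
                sent_month = parsedate_to_datetime(_header(msg, "Date")).strftime("%Y-%m")
            except (TypeError, ValueError):
                sent_month = None  # missing or malformed Date header
            sender = parseaddr(_header(msg, "From"))[1].lower()
            # INSERT OR IGNORE makes a repeated monthly run safe to re-execute.
            conn.execute(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
                (message_id, sender, sent_month, _header(msg, "Subject")),
            )
        conn.commit()
    finally:
        box.close()
    return conn


def posts_per_month(conn):
    """Return (month, message count, distinct sender count) rows."""
    return conn.execute(
        "SELECT sent_month, COUNT(*), COUNT(DISTINCT sender) FROM messages "
        "WHERE sent_month IS NOT NULL GROUP BY sent_month ORDER BY sent_month"
    ).fetchall()
```

Run against one list's mbox file, `posts_per_month(import_mbox("devel.mbox", "devel.sqlite"))` would give monthly totals, and the resulting SQLite file is the sort of artifact that could be offered for download.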


I am not fully sure we have mbox files with mailman3 anymore. ;(

Moving this to mailing lists component so abompard sees it.

What might be very nice is if we could just improve the hyperkitty stats page for lists to just include all the information we want, then we would not need another tool at all. Not sure how difficult that would be however.

Replying to [comment:1 kevin]:

> I am not fully sure we have mbox files with mailman3 anymore. ;(

On the archives page, there is a "Download" button which gives an mbox archive. Possibly generated on the fly?

> Moving this to mailing lists component so abompard sees it.

nod -- although if done outside of hyperkitty it could perhaps be something even an infrastructure apprentice could do.

> What might be very nice is if we could just improve the hyperkitty stats page for lists to just include all the information we want, then we would not need another tool at all. Not sure how difficult that would be however.

See my message linked above — I want to look at the information from a lot of different angles. And I'd like it as raw data for further analysis. That seems like a tall order for an integrated feature. I mean, if someone wants to do it that way, I won't complain. :)

I suppose we could also make queries against the hyperkitty database directly, but the mlstats db is nice because there are existing tools for querying it.

Sorry for not seeing this earlier.

Actually I also store the incoming email. The reason is that HyperKitty does some filtering before storing in the database, and I think it's more cautious to keep the original data somewhere.

There's a Maildir directory for each mailing-list, and they are on mailman01 in the /var/lib/mailman3/archives/prototype/ top directory.

There you'll only have the mails processed by Mailman3 (not the imported data) but you can use mailman2's existing mbox files to get the historic data.
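Since mlstats expects mbox input and these archives are Maildir directories, a conversion step would be needed. A minimal sketch with the standard library `mailbox` module follows; the function name `maildir_to_mbox` and the example paths are hypothetical, and a real job on mailman01 should also take care of locking and file permissions.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message in a Maildir to an mbox file; return the count copied."""
    source = mailbox.Maildir(maildir_path, factory=None, create=False)
    target = mailbox.mbox(mbox_path)  # the mbox file is created if it does not exist
    copied = 0
    try:
        for message in source:
            target.add(message)  # converted to mbox format on the way in
            copied += 1
        target.flush()
    finally:
        target.close()
        source.close()
    return copied
```

For example, `maildir_to_mbox("/path/to/one-list-maildir", "/tmp/one-list.mbox")` would produce a file that can be concatenated after the historic mailman2 mbox for the same list.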

Let me know if I can help.

If you want to know how many people use HyperKitty (as opposed to just the mailing-lists), you'll need read access to the HK database. I'm not quite sure who should create that DB user though (I could).

@abompard So, looking at this again, if I try and get:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/export/devel@lists.fedoraproject.org.mbox.gz

I get a 500 error after some time. Can we look at fixing that? Then mlstats should in theory work here, and we would need to:

  • package it up
  • run it in infra
  • give @mattdm access to make queries against that db.
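Once the export works, the "run it in infra" step would need to unpack each downloaded `.mbox.gz` before handing it to mlstats. A rough sketch of that step, assuming the file has already been fetched to local disk (the function name `gunzip_mbox` is hypothetical, and nothing here touches the network):

```python
import gzip
import mailbox
import shutil


def gunzip_mbox(gz_path, mbox_path):
    """Decompress a .mbox.gz export and return the number of messages it holds."""
    with gzip.open(gz_path, "rb") as src, open(mbox_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large archives are fine
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

The message count is a cheap sanity check that a download was complete before the import runs.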

@skamath does the hyperkitty plugin for Grimoire have something to offer here now that it didn't 2 years ago?

Metadata Update from @smooge:
- Issue assigned to smooge

a year ago

I think commops is working on something with this? Not sure of status.

CommOps is working on this now. I suggest closing this ticket and following their work.

Here is a starting point ticket: https://pagure.io/fedora-commops/issue/134

The basic goal is to get mail lists up on Grimoire to expose data that can be used to research the metrics questions. There are some other unrelated tasks but this is a phase 1 desire.

Great. Let us know if we need to do anything on our side.

:mailbox:

Metadata Update from @kevin:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)

a year ago
