#26 Mailing lists analysis and graphs
Closed: Moved a year ago Opened 3 years ago by jflory7.

This ticket is sourced from a specific mailing list thread started by @mattdm as a request for some data he would like to see organized.

What to look at

Matt noted that there are specific mailing lists that would be good to cover cumulatively, to best represent the "face" of Fedora. The lists mentioned are:

  • fedora-council-discuss
  • devel
  • test
  • server
  • cloud
  • workstation
  • marketing
  • design
  • commops
  • docs
  • translation
  • Others…?

Also noted it's best to avoid lists for specific projects in Fedora, e.g. Anaconda mailing list. Some SIGs relevant to Fedora may be good to include (Fedora Join, Fedora Women?), but it might be good to separate data from these lists from the grouping mentioned above.

It would be good to filter out bots and automated posts to lists. Depending on the list, ticket traffic might be a good filter as well, but for some groups (e.g. FAmSCo), most of the discussion does happen in the tickets, so these might be valuable emails to include in some lists.

Users by month

Noted that a user would be considered an email address. Would be cool to group emails from same user with different addresses (is this possible in Hyperkitty?).

Matt identified three points about finding metrics for users by month. The full list of details can be found in his thread.

Threads

Matt identified four points for finding metrics about mailing list threads. The full details about threads can be found in his thread on the CommOps mailing list.

This is something we'd like to keep on our agenda and to-do list as we move forward. At the time of filing, it does not appear that this is a high urgency ticket, so it would be a good goal to have completed by the end of the F23 cycle. I have followed up with Matt in the interim to clarify if there is an urgency behind collecting these metrics.


Quoted from the mailing list thread, for reference purposes:

It's not urgent, but if someone is felling very ambitious, it'd be
amazing to be able to talk about this at DevConf/FOSDEM at the end of
this month. Or if they're feeling even more ambitious, Remy and I
have an internal presentation even before that which this would be
really useful for. Otherwise, the next time that is a big milestone is
the next Flock.

Remy pointed me towards https://github.com/MetricsGrimoire/MailingListStats, which turns out to be awesome.

It works out-of-the-box with the old Mailman 2 "pipermail" URLs, like https://lists.fedoraproject.org/pipermail/devel/.

Hyperkitty doesn't have pages like that, but it wouldn't be a big enhancement to the program to support them, I don't think, and in the meantime one can download the mbox files by hand.

That gives you a database (I used sqlite) with tables with all sorts of useful views onto the data https://github.com/MetricsGrimoire/MailingListStats/wiki/Database-Schema, from which it shouldn't be hard to construct queries to spit out the graphs I'm looking for.

I should add: the most useful output for me isn't actually a graph. It's a csv file which I can play around with and make graphs from. I might hack on some of this stuff over the next few weeks. If anyone is interested in helping, let me know.

I put in an infrastructure request for this tool to be run automatically so we have a common database to work against. https://fedorahosted.org/fedora-infrastructure/ticket/5070

SELECT first_date, count(*) number_new FROM
  (SELECT email_address, strftime("%Y-%m-01",MIN(arrival_date)) AS first_date
   FROM messages_people, messages
   WHERE messages_people.message_id = messages.message_id
   AND messages.arrival_date > '2003-01-01 00:00:00'
   AND messages_people.type_of_recipient = 'From'
   GROUP BY email_address ORDER BY first_date)
GROUP BY first_date;

My SQL is done by googling for stuff I think will work, but here's a first cut at the first line (new users by month). Doesn't do any filtering for single-post only users or bots.

And ''this'' mess does all of 1a, 1b, and 1c in my message. With 90% confidence that it's correct. I'm mostly posting this to demonstrate that I could use some help from someone who actually knows SQL. :)

.mode csv
SELECT first_date,
       COUNT(*) number_new,
       SUM(CASE WHEN post_months>1 THEN 1 ELSE 0 END) as number_onceagain,
       SUM(CASE WHEN post_months>5 THEN 1 ELSE 0 END) as number_sixmonths
FROM
(
SELECT T1.email_address, first_date, post_months FROM 
( SELECT email_address, strftime("%Y-%m-01",MIN(arrival_date)) AS first_date
FROM messages_people, messages
WHERE messages_people.message_id = messages.message_id
AND messages.arrival_date > '2003-01-01 00:00:00'
AND messages_people.type_of_recipient = 'From'
GROUP BY email_address ) AS T1
JOIN
( SELECT email_address, count(*) post_months FROM 
( SELECT DISTINCT email_address, strftime("%Y-%m-01",arrival_date) as post_month
FROM messages_people, messages
WHERE messages_people.message_id = messages.message_id
AND messages.arrival_date > '2003-01-01 00:00:00'
AND messages_people.type_of_recipient = 'From'
ORDER BY arrival_date)
GROUP BY email_address ) AS T2 
ON T1.email_address = T2.email_address)
GROUP BY first_date;

Sample Graph #1 for Devel List
devel-list.png

Discussed in 2016-01-19 meeting.

We agreed that this ticket can be claimed by anyone willing to take this task on (would be a great project for an Outreachy intern). The sooner it's completed, the better, but it is not absolute emergency to be finished. We also agreed on an ABSOLUTE deadline of Flock 2016.

Discussed in 2016-01-26 meeting.

We would like to add this as a project for the 2016 Outreachy potential projects list. I've added the tag "intern" to this ticket to help group tickets together by this keyword to make it easy to identify tickets that will be great projects for a potential intern / Outreachy / GSoC / etc. participant to tackle later on!

@mattdm Did you proceed any further on this ? Is the csv file of email metadata still the deliverable ?

It seems like too small of a part for GSoC or Outreachy since most of the work seems done by @mattdm. If this is not yet done, I would like to work on it in the upcoming month.

Metadata Update from @bee2502:
- Issue untagged with: intern-ready
- Issue tagged with: help wanted

2 years ago

Metadata Update from @jflory7:
- Issue priority set to: no deadline (was: minor (3-4 weeks))

2 years ago

Discussed in CommOps 2018 FAD.

Ticket closed; migrated to #114

We focused on visualized metrics during the 2018 CommOps FAD, when discussing a GrimoireLabs dashboard for Fedora. We are closing this ticket because it has not had significant activity for a long time, and we prefer to consolidate it into #114.

Metadata Update from @jflory7:
- Issue untagged with: help wanted
- Issue close_status updated to: Moved
- Issue set to the milestone: Fedora 28 (to May 2018) (was: Future releases)
- Issue status updated to: Closed (was: Open)

a year ago

Login to comment on this ticket.

Metadata