README.md

Get user activity statistics

There are 3 components:

To get the data

  • queryfedbus.py
  • queryfedbus2.py

They get the same data: a list of messages sent to the specified mailing list, in this format: sender email, mail subject, timestamp, datetime, datagrepper uuid

  • The first one gets the messages in the specified range of dates. It takes a lot of time, because it queries for all the messages sent to any mailing list, then checks each message to see whether it was addressed to the mailing list we want statistics for.
  • The latter is much faster, because it queries datagrepper using the "contains" parameter. However, it can only reach back 224 days.
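
As a rough illustration of the faster approach, the sketch below builds a single datagrepper request filtered with "contains". It only constructs the URL and sends nothing. The endpoint URL, the "mailman" category, the page size, and the function name build_query_url are assumptions made for this example, not details taken from queryfedbus2.py.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint of the public datagrepper instance; adjust if yours differs.
DATAGREPPER_URL = "https://apps.fedoraproject.org/datagrepper/raw"
LIST_DOMAIN = "lists.fedoraproject.org"


def build_query_url(start, end, list_name, page=1):
    """Build a datagrepper URL that asks only for messages mentioning the list.

    start and end are timezone-aware datetime objects.
    """
    params = {
        "category": "mailman",  # assumption: mailing list traffic uses this category
        "contains": f"{list_name}@{LIST_DOMAIN}",
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
        "rows_per_page": 100,  # assumption: a reasonable page size
        "page": page,
    }
    return f"{DATAGREPPER_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(
        build_query_url(
            datetime(2020, 1, 1, tzinfo=timezone.utc),
            datetime(2020, 2, 1, tzinfo=timezone.utc),
            "test",
        )
    )
```

Filtering on the server side like this avoids downloading every mailing list message and discarding most of them locally, which is what makes the first script slow.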

To parse the data

  • parsecsv.py

It parses the CSV file produced by one of the previous scripts.
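
A minimal sketch of reading that CSV, assuming comma-separated columns in the order listed earlier (sender email, mail subject, timestamp, datetime, datagrepper uuid). The Message field names and the read_messages helper are hypothetical and may not match what parsecsv.py does internally.

```python
import csv
import io
from collections import namedtuple

# Column order follows the format described in this README; names are hypothetical.
Message = namedtuple("Message", "sender subject timestamp when uuid")


def read_messages(handle):
    """Yield one Message per well-formed CSV row, skipping blank or short rows."""
    for row in csv.reader(handle):
        if len(row) != 5:
            continue
        sender, subject, timestamp, when, uuid = (field.strip() for field in row)
        yield Message(sender, subject, float(timestamp), when, uuid)


if __name__ == "__main__":
    # Made-up sample row, only to show the expected shape of the data.
    sample = io.StringIO(
        'alice@example.com,"Hello, QA",1577836800,2020-01-01 00:00:00,abc-123\n'
    )
    for message in read_messages(sample):
        print(message.sender, message.timestamp)
```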

For each line:

  • it checks that the sender is not in the exclude option (e.g. automated messages from a bot such as updates@fedoraproject.org)
  • it gets the mail address of the sender
  • it queries FAS using fas.people_query, with the mail address as the constraint, in order to get the FAS username
  • if the query fails (the address used to send the message to the mailing list is not the one registered in FAS):
      • it strips the @domain part of the address and, as a best effort, assumes that the remaining part is the username
  • using fas.person_by_username it gets this information:
      • last_seen
      • group_roles (excluding the ones specified in excludegroups, like cla_done, cla_fpca)
  • then it parses the CSV file containing the mailing list data again, in order to look for subsequent messages after the introductory one
  • it queries datagrepper in order to find activities in the time period following the introductory e-mail; for QA:
      • total number of Bodhi activities ('user': user, 'category': 'bodhi')
      • total number of Wiki edits ('user': user, 'category': 'wiki')
      • total number of Bugzilla activities ('user': user, 'category': 'bugzilla')
      • total number of kernel test activities ('user': user, 'category': 'kerneltest')
      • total number of mails to any other Fedora mailing list (excluding the one in question)

The result is a CSV file containing:

  • username: FAS username
  • first_intro: date of the introductory mail sent to the mailing list
  • last_seen: date of the last login to FAS
  • privacy: privacy setting status on FAS
  • followups: number of mails sent to the mailing list following the introductory one
  • count_additional_groups: total number of FAS groups the user is part of (in addition to the ones specified in the excludegroups config option)
  • qa_group_status: status of the sponsoring
  • bodhi_activity: number of activities in Bodhi
  • kerneltest_activity: number of performed kernel tests
  • bugzilla_count: number of Bugzilla activities
  • wiki_activity: number of wiki activities
  • mailman_count: number of mails sent to any mailing list
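
The offline part of this process (the exclude check, the best-effort username guess, and the follow-up count) can be sketched roughly as follows. It leaves out the FAS and datagrepper lookups entirely and assumes the sender field is already a bare e-mail address. The names guess_username and summarize_senders are hypothetical and are not taken from parsecsv.py.

```python
def guess_username(email):
    """Best-effort fallback: treat the part before '@' as the FAS username."""
    return email.split("@", 1)[0]


def summarize_senders(messages, exclude=()):
    """Map each non-excluded sender to their first mail and follow-up count.

    messages is an iterable of (sender_email, timestamp) pairs.
    """
    summary = {}
    # Process in chronological order so the first mail seen is the introduction.
    for sender, timestamp in sorted(messages, key=lambda pair: pair[1]):
        if sender in exclude:
            continue  # e.g. automated messages from a bot
        if sender not in summary:
            summary[sender] = {
                "username": guess_username(sender),
                "first_intro": timestamp,
                "followups": 0,
            }
        else:
            summary[sender]["followups"] += 1
    return summary


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the result.
    mails = [
        ("bob@example.com", 30),
        ("updates@fedoraproject.org", 5),
        ("bob@example.com", 10),
    ]
    print(summarize_senders(mails, exclude={"updates@fedoraproject.org"}))
```

In the real workflow the guessed username is only a fallback for when the FAS query by mail address fails, so a sketch like this would tend to mislabel users whose list address differs from their FAS account name.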

Usage

python3 ./queryfedbus.py <START_DATE> <END_DATE> <mailing_list_name>
python3 ./queryfedbus2.py <START_DATE> <END_DATE> <mailing_list_name>

START_DATE and END_DATE should have the format YYYY-MM-DD. mailing_list_name should be the part before @lists.fedoraproject.org.

python3 parsecsv.py

It will parse the CSV defined as filetoparse in the configuration file.