#81 Add documentation about data exploration
Merged a year ago by t0xic0der. Opened a year ago by t0xic0der.
fedora-infra/ t0xic0der/arc dataeplt  into  main

file modified
+1
@@ -99,5 +99,6 @@ 

      creation_gram

      creation_fail

      solution_datanote

+     solution_dataeplt

      solution_examples

      solution_techtool

@@ -0,0 +1,120 @@ 

+ .. _solution_dataeplt.rst:

+ 

+ Data Exploration and Significance

+ =================================

+ 

+ The following is the set of information that the service would look into

+ once it is deployed. Note that this list covers both the information that

+ would be available for consumption by the service users and the information

+ that would be available only to the service itself for computation and

+ analysis. The list is not exhaustive, and there can be more such information

+ apart from the items listed below.

+ 

+ 1. Activity entry from Datanommer (For computation only)

+ 2. Username of the "subject" i.e. owner of the contribution (For computation only)

+ 3. Username of the "object" i.e. member involved in the contribution (For computation only)

+ 4. Datetime data of a specific contribution activity (For computation only)

+ 5. Datetime data of a grouped contribution activity (For consumption only)

+ 6. Service where a specific contribution activity happened (For computation only)

+ 7. Service where a grouped contribution activity happened (For consumption only)

+ 8. Activity trends per username (For computation only)

+ 

+ 

+ Activity Entry from Datanommer

+ ------------------------------

+ 

+ This data forms the most basic functional entity of a "contribution record". An

+ occurrence of an activity means that a contribution was made by the "subject"

+ member on the "object" member and/or service with the "predicament" nature of

+ the contribution at the "time" it happened. A computed collection of these

+ entries can form wider statistics, for example the trend of contribution by a

+ certain "subject" member or the trend of contribution on a certain "service",

+ allowing us to answer questions like "which services are most active (and why)

+ and least active (and why)?" and "what period of time attracts most

+ contributions (and why)?". As this data is intricate, it serves its purpose

+ only when a computed group of entries forms statistics and not when an entry

+ is singled out, which is why it is used for computational purposes only.

+ 

+ Username of the "subject"

+ -------------------------

+ 

+ Alternatively, the owner of the contribution.

+ 

+ This data is a part of the previously-stated "activity entry from Datanommer"

+ data. To protect the privacy of the members involved, this information is

+ anonymized as a hash. Because it serves its purpose only when a computed

+ group of entries forms statistics and not when an entry is singled out, this

+ data is used for computational purposes only.

+ 

+ Username of the "object"

+ ------------------------

+ 

+ Alternatively, the member involved in the contribution.

+ 

+ This data is a part of the previously-stated "activity entry from Datanommer"

+ data. To protect the privacy of the members involved, this information is

+ anonymized as a hash. Because it serves its purpose only when a computed

+ group of entries forms statistics and not when an entry is singled out, this

+ data is used for computational purposes only.

+ 

+ Datetime data of a specific contribution activity

+ -------------------------------------------------

+ 

+ This data is a part of the previously-stated "activity entry from Datanommer"

+ data. Because it serves its purpose only when a computed group of entries

+ forms statistics and not when an entry is singled out, this data is used for

+ computational purposes only.

+ 

+ Datetime data of a grouped contribution activity

+ ------------------------------------------------

+ 

+ Being a derivative statistic obtained from a computed group of the previously

+ stated "datetime data of a specific contribution activity", this can be used

+ to understand the trend of contributions of a certain "nature" or on a

+ certain "service" over a period of "time". This understanding would help us

+ answer questions like which periods attract the most contributions and which

+ attract few, and gauge the success of activities such as events and workshops

+ by showing whether they brought in contributions right after they commenced.

+ As a result, this data is made available by the service for user consumption.

+ 

+ Service where a specific contribution activity happened

+ -------------------------------------------------------

+ 

+ This data is a part of the previously-stated "activity entry from Datanommer"

+ data. As this data is intricate, it serves its purpose only when a computed

+ group of entries forms statistics and not when an entry is singled out, which

+ is why it is used for computational purposes only.

+ 

+ Service where a grouped contribution activity happened

+ ------------------------------------------------------

+ 

+ Being a derivative statistic obtained from a computed group of the previously

+ stated "service where a specific contribution activity happened", this can be

+ used to understand the trend of contribution on a certain service and to

+ compare services against one another to see how they fare in contribution

+ activity. This understanding would help us answer questions like which

+ services are most active in terms of contributions and which are not, and

+ gauge the usability of those services by learning what makes them desirable

+ (i.e. inferred from favourable contribution statistics) or undesirable (i.e.

+ inferred from unfavourable contribution statistics), to direct which service

+ should be contributed to. As a result, this data is made available by the

+ service for user consumption.

+ 

+ Activity trends per username 

+ ----------------------------

+ 

+ Being a derivative statistic obtained from a computed group of the previously

+ stated "activity entry from Datanommer", this can be used to understand the

+ trend of contribution for a certain user. This understanding would help us

+ answer questions like which fields a certain member contributes to and, if

+ they are transitioning from one field to another, what reasons have led them

+ to do that. To protect the privacy of the members involved, this information

+ is anonymized as a hash. Because it serves its purpose only when a computed

+ group of entries forms statistics and not when an entry is singled out, this

+ data is used for computational purposes only.

+ 
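
The split between computation-only and consumption-ready data described in the added document can be pictured with a minimal Python sketch. This is an illustration only and not the service's implementation: the document does not say which hash is used or how entries are stored, so the salted SHA-256 digest, the `SALT` value, the `anonymize` helper and the sample entries are all assumptions made up for this example.

```python
import hashlib
from collections import Counter
from datetime import datetime

# Hypothetical salt (assumption); a real deployment would keep it out of the source.
SALT = b"example-salt"


def anonymize(username: str) -> str:
    """Return a salted SHA-256 hex digest that stands in for the username."""
    return hashlib.sha256(SALT + username.encode("utf-8")).hexdigest()


# Made-up activity entries: (subject username, service, time of the activity).
entries = [
    ("alice", "pagure", datetime(2023, 1, 5, 10, 30)),
    ("bob", "bodhi", datetime(2023, 1, 20, 8, 0)),
    ("alice", "pagure", datetime(2023, 2, 2, 16, 45)),
]

# Computation-only view: usernames are replaced by hashes before any analysis.
hashed = [(anonymize(user), service, when) for user, service, when in entries]

# Consumption-ready views: grouped counts that no longer single out an entry.
by_service = Counter(service for _, service, _ in hashed)
by_month = Counter(when.strftime("%Y-%m") for _, _, when in hashed)

# Computation-only trend: number of activities per anonymized subject.
per_subject = Counter(subject for subject, _, _ in hashed)
```

Here `by_service` and `by_month` correspond to the grouped "service" and "datetime" data that users would see, while `hashed` and `per_subject` stay internal to the service.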

Pull-Request has been merged by t0xic0der

a year ago