
fedora-localization-statistics

Global statistics on the translation levels of Fedora products

Requirements

dnf install translate-toolkit podman

Create the needed folders and the Python environment

mkdir -p ./src.rpms/f30/ ./results/f30/
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Run the scripts

Get package list

This step is manual for now. I took the list of DNF packages from Koji: https://koji.fedoraproject.org/koji/buildinfo?buildID=1252912
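
The Koji page lists built RPM file names rather than bare package names. A minimal sketch of reducing such a list to unique package names, assuming one file name (or name-version-release string) per line; `rpm_name` and `package_names` are hypothetical helpers, not part of this repository:

```python
from typing import Iterable


def rpm_name(entry: str) -> str:
    """Return the package name from an RPM file name or N-V-R string.

    Assumes the usual name-version-release[.arch].rpm layout, e.g.
    "python3-dnf-4.2.2-2.fc30.noarch.rpm" -> "python3-dnf".
    """
    base = entry.strip().rsplit("/", 1)[-1]
    if base.endswith(".rpm"):
        base = base[: -len(".rpm")]
    # RPM versions and releases cannot contain "-", so the last two dashes
    # split them off; the architecture stays in the release part and is dropped.
    name, _version, _release = base.rsplit("-", 2)
    return name


def package_names(lines: Iterable[str]) -> list[str]:
    """Deduplicate and sort package names, ignoring blank lines."""
    return sorted({rpm_name(line) for line in lines if line.strip()})
```

Binary package names are enough for the download step, because `dnf download --source` resolves a binary package to its source RPM.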

Get the RPM sources

./download-f30-srpm-in-container.sh

The files are downloaded inside a container, so we can produce the stats even when running Fedora 29. The download is about 7 GB for Fedora 30 and takes some time.
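
The container approach can be sketched from Python as below. This is an illustration, not the contents of `download-f30-srpm-in-container.sh`: the `registry.fedoraproject.org/fedora` image, the `/srpms` mount point and installing dnf-plugins-core to get `dnf download --source` are assumptions, and the function names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Sequence


def podman_download_command(packages: Sequence[str], dest: Path, release: int = 30) -> list[str]:
    """Build a podman command that downloads source RPMs into `dest`.

    Assumption: the Fedora image for `release` comes from registry.fedoraproject.org
    and `dnf download` is provided by dnf-plugins-core.
    """
    quoted = " ".join(shlex.quote(p) for p in packages)
    script = (
        "dnf install -y dnf-plugins-core && "
        f"dnf download --source --destdir /srpms {quoted}"
    )
    return [
        "podman", "run", "--rm",
        # :Z gives the mounted folder a private SELinux label the container can write to
        "-v", f"{dest.resolve()}:/srpms:Z",
        f"registry.fedoraproject.org/fedora:{release}",
        "bash", "-c", script,
    ]


def download_srpms(packages: Sequence[str], dest: Path) -> None:
    """Create `dest` if needed and run the download container."""
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(podman_download_command(packages, dest), check=True)
```

For example, `download_srpms(["dnf", "python3-dnf"], Path("src.rpms/f30"))` would fetch the sources into the folder created earlier. Because the repositories come from the Fedora 30 image, the host release does not matter.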

Compute data

./build.py

The results are written to multiple files inside the results folder.
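
`build.py` is not reproduced here. As an illustration of the per-file figures translation statistics are built from, here is a simplified, standard-library-only sketch that counts total, translated and fuzzy messages in a PO file. It assumes entries are separated by blank lines and ignores escape sequences; translate-toolkit, listed in the requirements, ships `pocount`, which handles the format properly. `po_stats` and `PoStats` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class PoStats:
    total: int = 0
    translated: int = 0
    fuzzy: int = 0

    @property
    def percent(self) -> float:
        """Share of fully translated messages (fuzzy ones count as untranslated)."""
        return 100.0 * self.translated / self.total if self.total else 0.0


def _unquote(value: str) -> str:
    # Strip the surrounding double quotes; escape sequences are left as-is.
    value = value.strip()
    return value[1:-1] if len(value) >= 2 and value[0] == value[-1] == '"' else value


def po_stats(text: str) -> PoStats:
    stats = PoStats()
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        if any(line.startswith("#~") for line in lines):
            continue  # obsolete entry
        fuzzy = any(line.startswith("#,") and "fuzzy" in line for line in lines)
        fields: dict[str, str] = {}
        key = None
        for line in lines:
            if not line or line.startswith("#"):
                continue
            if line.startswith('"'):
                if key is not None:
                    fields[key] += _unquote(line)  # continuation of the previous keyword
                continue
            key, _, value = line.partition(" ")  # msgid, msgctxt, msgstr, msgstr[0]...
            fields[key] = _unquote(value)
        if not fields.get("msgid"):
            continue  # PO header (empty msgid) or comment-only block
        stats.total += 1
        translations = [v for k, v in fields.items() if k.startswith("msgstr")]
        if fuzzy:
            stats.fuzzy += 1
        elif translations and all(translations):
            stats.translated += 1
    return stats
```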

Produce stats

./build_stats.py

This applies data cleanups and enhancements (such as CLDR names).
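
The exact cleanups are not listed here. One typical step when aggregating statistics from many packages is normalizing locale codes, since the same language can appear under different spellings such as `pt-br`, `pt_BR` or `de_DE.UTF-8`. A minimal sketch, assuming a `language[_Script][_TERRITORY][.encoding][@modifier]` shape; `normalize_locale` is hypothetical and not taken from `build_stats.py`:

```python
import re
from typing import Optional

# language[_Script][_TERRITORY][.encoding][@modifier]; "-" is accepted as a separator too
_LOCALE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:[_-](?P<script>[A-Za-z]{4}))?"
    r"(?:[_-](?P<territory>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:\.[^@]+)?"
    r"(?:@(?P<modifier>\w+))?$"
)


def normalize_locale(code: str) -> Optional[str]:
    """Return e.g. 'pt_BR' for 'pt-br', or None if the code is not recognized."""
    match = _LOCALE.match(code.strip())
    if match is None:
        return None
    parts = [match["lang"].lower()]
    if match["script"]:
        parts.append(match["script"].title())  # scripts are title case: Latn, Hant
    if match["territory"]:
        parts.append(match["territory"].upper())
    result = "_".join(parts)  # the encoding suffix, if any, is dropped
    if match["modifier"]:
        result += "@" + match["modifier"]  # keep modifiers such as sr@latin
    return result
```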

Information

Data in the CLDR-raw folder comes from https://github.com/unicode-org/cldr/blob/master/common/main/en.xml
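
In `en.xml`, the English language names sit under `localeDisplayNames/languages`, one element per code such as `<language type="fr">French</language>`, with alternate forms marked by an `alt` attribute. A minimal sketch of loading them with the standard library, assuming that layout; `cldr_language_names` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def cldr_language_names(xml_text: str) -> dict[str, str]:
    """Map language codes to display names from a CLDR locale file such as en.xml."""
    root = ET.fromstring(xml_text)
    names: dict[str, str] = {}
    for element in root.iterfind("./localeDisplayNames/languages/language"):
        if "alt" in element.attrib:
            continue  # skip alternates such as alt="short" or alt="variant"
        names[element.get("type")] = element.text
    return names
```

For a file on disk, `ET.parse(path).getroot()` yields the same root element, so the loop can be reused unchanged.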

Ideas

  1. CLDR supplementalData.xml: https://github.com/unicode-org/cldr/blob/master/common/supplemental/supplementalData.xml
    1. use territoryContainment to build geographic groups
    2. use languageData to detect the default script
    3. use languageData to get basic stats about territories
    4. use territoryInfo to get advanced stats about territories
  2. CLDR supplementalMetadata.xml: https://github.com/unicode-org/cldr/blob/master/common/supplemental/supplementalMetadata.xml
    1. use the replacement values to harmonize content
  3. CLDR likelySubtags.xml: https://github.com/unicode-org/cldr/blob/master/common/supplemental/likelySubtags.xml
    1. use the likely subtags for more advanced harmonization?
  4. CLDR languageInfo.xml: https://github.com/unicode-org/cldr/blob/master/common/supplemental/languageInfo.xml
    1. if a language is at least 90% close to another one, can we consider propagating translation statistics between them?
  5. CLDR languageGroup.xml: https://github.com/unicode-org/cldr/blob/master/common/supplemental/languageGroup.xml
    1. what is it?
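
Idea 1.1 could start from the `territoryContainment` section of supplementalData.xml, where each `group` element lists in its `contains` attribute the regions or territories inside a UN M.49 region (for example Europe, `150`, contains `039 151 154 155`). Groups carrying a `status` attribute, such as `grouping` for the EU or `deprecated`, overlap with the strict hierarchy, so this sketch skips them. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def territory_parents(xml_text: str) -> dict[str, str]:
    """Map each territory or region code to its containing region."""
    root = ET.fromstring(xml_text)
    parents: dict[str, str] = {}
    for group in root.iterfind(".//territoryContainment/group"):
        if group.get("status"):
            continue  # skip overlapping "grouping" and "deprecated" groups
        for child in group.get("contains", "").split():
            parents[child] = group.get("type")
    return parents


def containment_chain(territory: str, parents: dict[str, str]) -> list[str]:
    """Walk up from a territory to the top-level region, e.g. FR -> 155 -> 150 -> 001."""
    chain = [territory]
    # the "not in chain" check guards against cycles in malformed data
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return chain
```

Statistics per language could then be rolled up per region by following each territory's chain.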

See also

AppData and Zanata statistics: https://github.com/Jibec/fedora-translation-statistics

Transtats: https://transtats.fedoraproject.org/releases/