Global statistics on translation levels of Fedora products
dnf install podman
Each release needs its own image.
podman build . -f docker/Dockerfile.$release -t fedlocstats:$release
podman build . -f docker/Dockerfile.31 -t fedlocstats:31
podman build . -f docker/Dockerfile.32 -t fedlocstats:32
podman build . -f docker/Dockerfile.33 -t fedlocstats:33
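The per-release build commands above can also be generated from a list of releases. The following is a minimal sketch, assuming the Dockerfiles follow the docker/Dockerfile.$release naming shown above; the helper `build_command` and the `RELEASES` list are illustrative and not part of the repository.

```python
import shlex

# Releases to build (assumed list; adjust to the Dockerfiles present in docker/).
RELEASES = ["31", "32", "33"]


def build_command(release):
    """Return the podman argv that builds the image for one release."""
    return [
        "podman", "build", ".",
        "-f", f"docker/Dockerfile.{release}",
        "-t", f"fedlocstats:{release}",
    ]


if __name__ == "__main__":
    for release in RELEASES:
        # Only print the commands here; to execute one, pass the list to
        # subprocess.run(build_command(release), check=True).
        print(shlex.join(build_command(release)))
```

Returning an argument list rather than a shell string avoids quoting problems if a release name ever contains unusual characters.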
podman run -it --rm -v ./:/src:z -v ./srpms:/srpms:z --tmpfs /tmp:size=4G fedlocstats:$release $script
with $script being one of the following:
./build.py
Get the SRPM lists, apply discover and compute progression stats.
./build_language_list.py
For each package, produce progression stats.
./build_packages_stats.py
For each package, produce progression stats.
./build_global_stats.py
Applies data cleanups and enhancements (cldr name).
./build_map.py
Aggregate the data per language, then apply it to territories (it uses stats from CLDR with language per territory).
./build_tm.py
Detect the list of languages. Aggregate all files for a language and produce a compendium, a terminology and a translation memory.
0.error.language not in cldr.csv
contains unknown languages (lines are removed)
0.error.languages is numeric.csv
contains numeric languages (lines are removed)
0.error.lang with point.csv
contains languages such as ".cp936" and ".big5" (lines are removed)
0.error.len(language).csv
contains languages with more than three characters (lines are removed)
0.error.len(territory).csv
contains territories with more than two characters (lines are removed)
0.error.no population for this language-territory couple.csv
contains the list of language-territory pairs for which no language statistics exist (no impact on results)
1.debug.lang.csv
all lang (language + script + territory) values for debug (no impact on results)
1.debug.language.csv
all lang values for debug (no impact on results)
1.debug.script.csv
all script values for debug (no impact on results)
1.debug.territory.csv
all territory values for debug (no impact on results)
1.debug.total message = 0.csv
all lang values for debug (lines are removed)
3.result.csv
full results per package with source filename and standardized language code, script code and territory code
4.0.cldr.csv
language per territory as provided by CLDR
4.1.results_per_language.csv
message and words progress percentages per language
4.1.results_per_language_ISO3.csv
message and words progress percentages per language merged with "country code" database using ISO3166-1-Alpha-2 code
4.2.cldr_and_results_full.csv
language per territory as provided by CLDR merged with message and words progress percentages per language
4.3.cldr_and_results_grouped.csv
aggregation per territory of 4.2.cldr_and_results_full.csv, provides the territory, the number of languages, the population, the messages and words coverage.
4.4.world_stats.csv
merge results of 4.3.cldr_and_results_grouped.csv with country database and geojson data.

Data in CLDR-raw folder comes from https://github.com/unicode-org/cldr/blob/master/common/main/en.xml
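The cleanup rules behind the 0.error.* files can be read as a sequence of checks on each row. The sketch below illustrates those rules as described above; it is not the project's actual code. The function name `classify`, the order of the checks and the `known_languages` parameter (standing in for the CLDR language list) are assumptions.

```python
def classify(language, territory, known_languages):
    """Return the error file a row would be written to, or None if it is kept.

    known_languages is a set of language codes taken from CLDR (assumed input).
    """
    if "." in language:
        # Encoding suffixes such as ".cp936" or ".big5"
        return "0.error.lang with point.csv"
    if language.isdigit():
        return "0.error.languages is numeric.csv"
    if len(language) > 3:
        return "0.error.len(language).csv"
    if territory and len(territory) > 2:
        return "0.error.len(territory).csv"
    if language not in known_languages:
        return "0.error.language not in cldr.csv"
    return None


if __name__ == "__main__":
    known = {"fr", "de", "pt"}
    for lang, terr in [("fr", "FR"), (".big5", ""), ("123", ""), ("pt", "BRA")]:
        print(repr(lang), repr(terr), "->", classify(lang, terr, known))
```

Rows for which `classify` returns a file name correspond to the "lines are removed" entries; rows returning None continue to the result files.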
automatic calculation (group by territory + spoken percentage * spoken )
create stats: number of countries with official language > 50% and related population
create stats: number of languages impacting more than one official language
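The per-territory grouping mentioned above, and used for 4.3.cldr_and_results_grouped.csv, could be sketched as a speaker-weighted average. This is only an illustration under stated assumptions: the row keys (`territory`, `language`, `speakers`, `progress`) and the weighting formula are hypothetical, and the project may compute coverage differently.

```python
from collections import defaultdict


def group_by_territory(rows):
    """Aggregate per-language rows into one summary per territory.

    Each row is a dict with hypothetical keys: territory, language,
    speakers (number of people) and progress (percentage, 0-100).
    """
    totals = defaultdict(lambda: {"languages": 0, "speakers": 0, "weighted": 0.0})
    for row in rows:
        entry = totals[row["territory"]]
        entry["languages"] += 1
        entry["speakers"] += row["speakers"]
        entry["weighted"] += row["speakers"] * row["progress"]

    result = {}
    for territory, entry in totals.items():
        # Speaker-weighted average progress; 0.0 when no speakers are recorded.
        coverage = entry["weighted"] / entry["speakers"] if entry["speakers"] else 0.0
        result[territory] = {
            "languages": entry["languages"],
            "speakers": entry["speakers"],
            "coverage": coverage,
        }
    return result
```

A weighted average keeps the coverage within 0-100 even when several languages are spoken in the same territory, which a plain sum of percentages would not guarantee.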
AppData and Zanata statistics: https://github.com/Jibec/fedora-translation-statistics
Transtats: https://transtats.fedoraproject.org/releases/