build_tm.py is throwing errors on a few languages, resulting in 404 errors on the related tmx/compendium links on the website.
List of impacted languages:
2021-02-10 21:25:27,896 - buildTm.check_lang - WARNING - bg-compendium is missing
2021-02-10 21:25:27,896 - buildTm.check_lang - WARNING - bg-tmx is missing
2021-02-10 21:25:27,897 - buildTm.check_lang - WARNING - bg-terminology is missing
2021-02-10 21:25:27,910 - buildTm.check_lang - WARNING - cs-tmx is missing
2021-02-10 21:25:27,910 - buildTm.check_lang - WARNING - cs-terminology is missing
2021-02-10 21:25:27,913 - buildTm.check_lang - WARNING - de-compendium is missing
2021-02-10 21:25:27,913 - buildTm.check_lang - WARNING - de-tmx is missing
2021-02-10 21:25:27,913 - buildTm.check_lang - WARNING - de-terminology is missing
2021-02-10 21:25:27,923 - buildTm.check_lang - WARNING - es-tmx is missing
2021-02-10 21:25:27,933 - buildTm.check_lang - WARNING - eu-tmx is missing
2021-02-10 21:25:27,933 - buildTm.check_lang - WARNING - eu-terminology is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING - fr-compendium is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING - fr-tmx is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING - fr-terminology is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING - fr_BE-tmx is missing
2021-02-10 21:25:27,943 - buildTm.check_lang - WARNING - gl-compendium is missing
2021-02-10 21:25:27,943 - buildTm.check_lang - WARNING - gl-tmx is missing
2021-02-10 21:25:27,943 - buildTm.check_lang - WARNING - gl-terminology is missing
2021-02-10 21:25:27,992 - buildTm.check_lang - WARNING - nb_NO-compendium is missing
2021-02-10 21:25:27,992 - buildTm.check_lang - WARNING - nb_NO-tmx is missing
2021-02-10 21:25:27,992 - buildTm.check_lang - WARNING - nb_NO-terminology is missing
2021-02-10 21:25:28,003 - buildTm.check_lang - WARNING - pl-compendium is missing
2021-02-10 21:25:28,004 - buildTm.check_lang - WARNING - pl-tmx is missing
2021-02-10 21:25:28,004 - buildTm.check_lang - WARNING - pl-terminology is missing
2021-02-10 21:25:28,005 - buildTm.check_lang - WARNING - pt-tmx is missing
2021-02-10 21:25:28,005 - buildTm.check_lang - WARNING - pt-terminology is missing
2021-02-10 21:25:28,005 - buildTm.check_lang - WARNING - pt_BR-tmx is missing
2021-02-10 21:25:28,006 - buildTm.check_lang - WARNING - pt_BR-terminology is missing
2021-02-10 21:25:28,010 - buildTm.check_lang - WARNING - ru-compendium is missing
2021-02-10 21:25:28,010 - buildTm.check_lang - WARNING - ru-tmx is missing
2021-02-10 21:25:28,010 - buildTm.check_lang - WARNING - ru-terminology is missing
2021-02-10 21:25:28,017 - buildTm.check_lang - WARNING - sk-compendium is missing
2021-02-10 21:25:28,017 - buildTm.check_lang - WARNING - sk-tmx is missing
2021-02-10 21:25:28,017 - buildTm.check_lang - WARNING - sk-terminology is missing
2021-02-10 21:25:28,025 - buildTm.check_lang - WARNING - sv-tmx is missing
2021-02-10 21:25:28,025 - buildTm.check_lang - WARNING - sv-terminology is missing
2021-02-10 21:25:28,057 - buildTm.check_lang - WARNING - zh_Hant-compendium is missing
2021-02-10 21:25:28,058 - buildTm.check_lang - WARNING - zh_Hant-tmx is missing
2021-02-10 21:25:28,058 - buildTm.check_lang - WARNING - zh_Hant-terminology is missing
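For a quicker overview of which artifacts are missing per language, the warnings can be grouped with a few lines of Python. This is only a rough sketch: `missing_by_language` is a hypothetical helper, not part of the repository, and it assumes the warning format stays exactly as in the log above.

```python
import re
from collections import defaultdict

# Matches the "<lang>-<artifact> is missing" tail of a check_lang warning.
# Assumes the three artifact names seen in the log above.
MISSING_RE = re.compile(r"WARNING - (\S+)-(compendium|tmx|terminology) is missing")


def missing_by_language(log_text):
    """Return {language: sorted list of missing artifacts} from log text."""
    missing = defaultdict(set)
    for lang, artifact in MISSING_RE.findall(log_text):
        missing[lang].add(artifact)
    return {lang: sorted(kinds) for lang, kinds in missing.items()}


sample = (
    "2021-02-10 21:25:27,910 - buildTm.check_lang - WARNING - cs-tmx is missing\n"
    "2021-02-10 21:25:27,910 - buildTm.check_lang - WARNING - cs-terminology is missing\n"
    "2021-02-10 21:25:27,923 - buildTm.check_lang - WARNING - es-tmx is missing\n"
)

for lang, kinds in missing_by_language(sample).items():
    print(lang, ", ".join(kinds))
```

It also works when several warnings end up on one line, since the pattern is searched across the whole text rather than line by line.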
I'm running the following commands:
./build_language_list.py --results f33 --refresh && ./build_language_list.py --results f33 --analyzealllang && ./build_tm.py --results f33 --compress && ./build_stats.py --results f33
Full execution log: https://paste.centos.org/view/2d9685a3
Thanks, it should be fixed by: https://pagure.io/fedora-localization-statistics/c/effb206493e30bd85c29aea9d86cab7bf7dcc63f?branch=main
The way it finds the faulty file is really naive and slow :/
Metadata Update from @jibecfed: - Issue status updated to: Closed (was: Open)
Oh, it's not finished!
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING - es-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING - eu-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING - eu-terminology is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING - fr_BE-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING - gl-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING - gl-terminology is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING - pt-tmx is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING - pt_BR-tmx is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING - pt_BR-terminology is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING - sv-tmx is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING - sv-terminology is missing
es
2021-02-14 23:40:53,108 - buildTm - ERROR - TMX generation triggered an ValueError exception: Syntax error on line 18499
eu
2021-02-14 23:48:32,784 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xf1 in position 316838: invalid continuation byte
2021-02-14 23:48:33,221 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xf1 in position 11: invalid continuation byte
fr_BE
2021-02-15 00:27:14,136 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xe9 in position 261: invalid continuation byte
gl
2021-02-15 00:32:50,160 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xfa in position 141702: invalid start byte
2021-02-15 00:32:50,512 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xfa in position 5: invalid start byte
pt
2021-02-15 12:35:24,136 - buildTm - ERROR - TMX generation triggered an ValueError exception: Syntax error on line 1564
pt_BR
2021-02-15 12:40:26,059 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xe9 in position 179884: invalid continuation byte
2021-02-15 12:40:26,727 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xe9 in position 6: invalid continuation byte
sv
2021-02-15 13:42:37,548 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xd6 in position 510812: invalid continuation byte
2021-02-15 13:42:38,066 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xd6 in position 24: invalid continuation byte
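The UnicodeDecodeError entries suggest that some input files are not UTF-8 encoded. Bytes such as 0xe9, 0xf1, 0xfa and 0xd6 are é, ñ, ú and Ö in Latin-1, so that encoding is a plausible culprit, but this is a guess. One way to locate the offending files is to try a strict UTF-8 decode on each one. The sketch below is not what build_tm.py does; `find_non_utf8_files`, the `*.po` pattern and the `results/f33` path are assumptions made for illustration only.

```python
from pathlib import Path


def find_non_utf8_files(root, pattern="*.po"):
    """Yield (path, byte offset, reason) for files that are not valid UTF-8.

    Hypothetical helper: 'root' and 'pattern' depend on where the
    extracted translation files actually live.
    """
    for path in sorted(Path(root).rglob(pattern)):
        try:
            # Strict decoding raises on the first invalid byte sequence.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            yield path, err.start, err.reason


if __name__ == "__main__":
    # Assumed location of the f33 results; adjust to the real layout.
    for path, offset, reason in find_non_utf8_files("results/f33"):
        print(f"{path}: byte offset {offset}: {reason}")
```

Reading each file once as bytes avoids running the full TMX generation just to discover which file breaks it.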
Metadata Update from @jibecfed: - Issue status updated to: Open (was: Closed)
build_tm also took about 20 hours to process, which is not acceptable :/
https://pagure.io/fedora-localization-statistics/c/702c716cf4e50846d80e3f6a5d69e5b9ff405fbd?branch=main solves the processing time issue.
Other issues related to tmx will be investigated next.
Fixed with: https://pagure.io/fedora-localization-statistics/c/d07083a63b86263dc0225e95d82f144358e725ae?branch=main