#23 Translation memory generation not working in a few cases
Closed 3 years ago by jibecfed. Opened 3 years ago by darknao.

build_tm.py throws errors for a few languages, resulting in 404 errors on the related TMX/compendium links on the website.

List of impacted languages:

2021-02-10 21:25:27,896 - buildTm.check_lang - WARNING -  bg-compendium is missing
2021-02-10 21:25:27,896 - buildTm.check_lang - WARNING -  bg-tmx is missing
2021-02-10 21:25:27,897 - buildTm.check_lang - WARNING -  bg-terminology is missing
2021-02-10 21:25:27,910 - buildTm.check_lang - WARNING -  cs-tmx is missing
2021-02-10 21:25:27,910 - buildTm.check_lang - WARNING -  cs-terminology is missing
2021-02-10 21:25:27,913 - buildTm.check_lang - WARNING -  de-compendium is missing
2021-02-10 21:25:27,913 - buildTm.check_lang - WARNING -  de-tmx is missing
2021-02-10 21:25:27,913 - buildTm.check_lang - WARNING -  de-terminology is missing
2021-02-10 21:25:27,923 - buildTm.check_lang - WARNING -  es-tmx is missing
2021-02-10 21:25:27,933 - buildTm.check_lang - WARNING -  eu-tmx is missing
2021-02-10 21:25:27,933 - buildTm.check_lang - WARNING -  eu-terminology is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING -  fr-compendium is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING -  fr-tmx is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING -  fr-terminology is missing
2021-02-10 21:25:27,937 - buildTm.check_lang - WARNING -  fr_BE-tmx is missing
2021-02-10 21:25:27,943 - buildTm.check_lang - WARNING -  gl-compendium is missing
2021-02-10 21:25:27,943 - buildTm.check_lang - WARNING -  gl-tmx is missing
2021-02-10 21:25:27,943 - buildTm.check_lang - WARNING -  gl-terminology is missing
2021-02-10 21:25:27,992 - buildTm.check_lang - WARNING -  nb_NO-compendium is missing
2021-02-10 21:25:27,992 - buildTm.check_lang - WARNING -  nb_NO-tmx is missing
2021-02-10 21:25:27,992 - buildTm.check_lang - WARNING -  nb_NO-terminology is missing
2021-02-10 21:25:28,003 - buildTm.check_lang - WARNING -  pl-compendium is missing
2021-02-10 21:25:28,004 - buildTm.check_lang - WARNING -  pl-tmx is missing
2021-02-10 21:25:28,004 - buildTm.check_lang - WARNING -  pl-terminology is missing
2021-02-10 21:25:28,005 - buildTm.check_lang - WARNING -  pt-tmx is missing
2021-02-10 21:25:28,005 - buildTm.check_lang - WARNING -  pt-terminology is missing
2021-02-10 21:25:28,005 - buildTm.check_lang - WARNING -  pt_BR-tmx is missing
2021-02-10 21:25:28,006 - buildTm.check_lang - WARNING -  pt_BR-terminology is missing
2021-02-10 21:25:28,010 - buildTm.check_lang - WARNING -  ru-compendium is missing
2021-02-10 21:25:28,010 - buildTm.check_lang - WARNING -  ru-tmx is missing
2021-02-10 21:25:28,010 - buildTm.check_lang - WARNING -  ru-terminology is missing
2021-02-10 21:25:28,017 - buildTm.check_lang - WARNING -  sk-compendium is missing
2021-02-10 21:25:28,017 - buildTm.check_lang - WARNING -  sk-tmx is missing
2021-02-10 21:25:28,017 - buildTm.check_lang - WARNING -  sk-terminology is missing
2021-02-10 21:25:28,025 - buildTm.check_lang - WARNING -  sv-tmx is missing
2021-02-10 21:25:28,025 - buildTm.check_lang - WARNING -  sv-terminology is missing
2021-02-10 21:25:28,057 - buildTm.check_lang - WARNING -  zh_Hant-compendium is missing
2021-02-10 21:25:28,058 - buildTm.check_lang - WARNING -  zh_Hant-tmx is missing
2021-02-10 21:25:28,058 - buildTm.check_lang - WARNING -  zh_Hant-terminology is missing
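Each warning names one expected per-language artifact (compendium, TMX, terminology) that was not produced. A minimal sketch of that kind of post-run check is below. The file-name patterns in `ARTIFACTS` and the function name `missing_artifacts` are hypothetical; the real build_tm.py layout (including any compressed names produced by `--compress`) may differ.

```python
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("check_lang_sketch")

# Hypothetical mapping from artifact kind to file name; the real build_tm.py
# output layout (and any compressed names from --compress) may differ.
ARTIFACTS = {
    "compendium": "{lang}.po",
    "tmx": "{lang}.tmx",
    "terminology": "{lang}.terminology.po",
}


def missing_artifacts(tm_dir: Path, langs: list[str]) -> list[str]:
    """Return labels such as 'fr-tmx' for every expected file that is absent."""
    missing = []
    for lang in sorted(langs):
        for kind, pattern in ARTIFACTS.items():
            if not (tm_dir / pattern.format(lang=lang)).is_file():
                label = f"{lang}-{kind}"
                log.warning(" %s is missing", label)
                missing.append(label)
    return missing


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Simulate a run where fr produced two of three artifacts and sv none.
        (root / "fr.po").touch()
        (root / "fr.tmx").touch()
        print(missing_artifacts(root, ["sv", "fr"]))
```

Returning the labels, not just logging them, makes it easy to fail a CI job or skip publishing links that would otherwise 404.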

I'm running the following commands:

    ./build_language_list.py --results f33 --refresh \
      && ./build_language_list.py --results f33 --analyzealllang \
      && ./build_tm.py --results f33 --compress \
      && ./build_stats.py --results f33

Full execution log: https://paste.centos.org/view/2d9685a3


Thanks, it should be fixed by: https://pagure.io/fedora-localization-statistics/c/effb206493e30bd85c29aea9d86cab7bf7dcc63f?branch=main

The current way of finding the faulty file is really naive and slow :/
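One possible faster approach for encoding problems specifically, assuming each language's inputs are individual `.po` files under one directory (a hypothetical layout): try to decode every file on its own and record the byte offset where UTF-8 decoding fails. This is a single read per file and points straight at the culprit. `find_non_utf8_files` is an illustrative name, not part of the project.

```python
import tempfile
from pathlib import Path


def find_non_utf8_files(root: Path, pattern: str = "*.po") -> list[tuple[Path, int, str]]:
    """Return (path, byte offset, reason) for each matching file that is not valid UTF-8."""
    faulty = []
    for path in sorted(root.rglob(pattern)):
        data = path.read_bytes()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte.
            faulty.append((path, exc.start, exc.reason))
    return faulty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ok.po").write_bytes('msgstr "café"\n'.encode("utf-8"))
        (root / "bad.po").write_bytes('msgstr "café"\n'.encode("latin-1"))
        for path, offset, reason in find_non_utf8_files(root):
            print(f"{path.name}: byte {offset}: {reason}")
```

This only catches decoding failures; parser-level syntax errors would still need the file to be parsed individually.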

Metadata Update from @jibecfed:
- Issue status updated to: Closed (was: Open)

3 years ago

Oh, it's not finished!

2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING -  es-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING -  eu-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING -  eu-terminology is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING -  fr_BE-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING -  gl-tmx is missing
2021-02-15 14:46:33,269 - buildTm.check_lang - WARNING -  gl-terminology is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING -  pt-tmx is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING -  pt_BR-tmx is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING -  pt_BR-terminology is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING -  sv-tmx is missing
2021-02-15 14:46:33,271 - buildTm.check_lang - WARNING -  sv-terminology is missing

es
2021-02-14 23:40:53,108 - buildTm - ERROR - TMX generation triggered an ValueError exception: Syntax error on line 18499

eu
2021-02-14 23:48:32,784 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xf1 in position 316838: invalid continuation byte
2021-02-14 23:48:33,221 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xf1 in position 11: invalid continuation byte

fr_BE
2021-02-15 00:27:14,136 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xe9 in position 261: invalid continuation byte

gl
2021-02-15 00:32:50,160 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xfa in position 141702: invalid start byte
2021-02-15 00:32:50,512 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xfa in position 5: invalid start byte

pt
2021-02-15 12:35:24,136 - buildTm - ERROR - TMX generation triggered an ValueError exception: Syntax error on line 1564

pt_BR
2021-02-15 12:40:26,059 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xe9 in position 179884: invalid continuation byte
2021-02-15 12:40:26,727 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xe9 in position 6: invalid continuation byte

sv
2021-02-15 13:42:37,548 - buildTm - ERROR - TMX generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xd6 in position 510812: invalid continuation byte
2021-02-15 13:42:38,066 - buildTm - ERROR - Terminology generation triggered an UnicodeDecodeError exception: 'utf-8' codec can't decode byte 0xd6 in position 24: invalid continuation byte
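The offending bytes all correspond to accented letters in ISO-8859-1 (Latin-1): 0xf1 is `ñ`, 0xe9 is `é`, 0xfa is `ú` and 0xd6 is `Ö`. That is consistent with some source files being Latin-1 encoded while being read as UTF-8, though it does not prove it. A minimal sketch, assuming the files are PO files with a standard `Content-Type: ...; charset=...` header: try UTF-8 first, then fall back to the charset the header declares. `decode_po_bytes` is a hypothetical helper, not project code.

```python
import codecs
import re

# Matches the charset in a PO header entry such as
# "Content-Type: text/plain; charset=ISO-8859-1\n"
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def decode_po_bytes(data: bytes) -> tuple[str, str]:
    """Decode PO file bytes, returning (text, codec name used).

    Tries UTF-8 first, then the charset declared in the header. Re-raises the
    original UnicodeDecodeError when there is nothing better to try.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError as utf8_error:
        error = utf8_error  # keep it: the "as" name is cleared after the block

    declared = None
    match = CHARSET_RE.search(data)
    if match:
        try:
            declared = codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            declared = None  # unknown charset name in the header

    if declared is None or declared == "utf-8":
        # Header missing or claims UTF-8 while the content is not: the file
        # itself has to be fixed at the source.
        raise error
    return data.decode(declared), declared
```

When the header declares the real charset, GNU gettext's `msgconv --to-code=UTF-8` can also re-encode the file once, upstream of the build. When the header wrongly claims UTF-8, no automatic fallback is safe and the file should be reported instead.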

Metadata Update from @jibecfed:
- Issue status updated to: Open (was: Closed)

3 years ago

build_tm.py also took about 20 hours to run, which is not acceptable :/
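If each language can be processed independently (an assumption), one way to cut wall-clock time is to run languages in parallel and catch exceptions per language, so a single bad file is logged without stopping the others. This is a sketch only: `build_language_tm` is a hypothetical stand-in for the real per-language work, and parallelism helps by at most the number of cores, so it does not replace fixing any algorithmic slowness.

```python
import logging
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed

log = logging.getLogger("build_tm_sketch")


def build_language_tm(lang: str) -> str:
    """Hypothetical stand-in for generating one language's TMX/compendium/terminology."""
    if lang == "broken":
        raise ValueError("Syntax error on line 1")
    return lang


def build_all(langs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run build_language_tm for every language; return (succeeded, failures)."""
    done, failed = [], {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(build_language_tm, lang): lang for lang in langs}
        for future in as_completed(futures):
            lang = futures[future]
            try:
                done.append(future.result())
            except Exception as exc:  # isolate failures per language
                log.error("%s: %s: %s", lang, type(exc).__name__, exc)
                failed[lang] = f"{type(exc).__name__}: {exc}"
    return sorted(done), failed


if __name__ == "__main__":
    # ThreadPoolExecutor keeps the demo portable; processes suit CPU-bound parsing.
    print(build_all(["fr", "sv", "broken"], executor_cls=ThreadPoolExecutor))
```

A process pool is the default because parsing PO files is CPU-bound and threads would be limited by the GIL; the executor is a parameter so the logic can be exercised with threads.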

Metadata Update from @jibecfed:
- Issue status updated to: Closed (was: Open)

3 years ago
