#13 %linuxtext breaks UTF-8 text
Opened 3 years ago by tagoh. Modified 8 months ago

I was trying to fix rpmlint warnings like:

hanamin-fonts.noarch: W: wrong-file-end-of-line-encoding /usr/share/doc/hanamin-fonts/README.txt
hanamin-fonts.noarch: W: wrong-file-end-of-line-encoding /usr/share/doc/hanamin-fonts/THANKS.txt

which was pointed out at https://bugzilla.redhat.com/show_bug.cgi?id=1825183, but once I added %linuxtext README.txt THANKS.txt, I got:

hanamin-fonts.noarch: W: file-not-utf8 /usr/share/doc/hanamin-fonts/README.txt
hanamin-fonts.noarch: W: file-not-utf8 /usr/share/doc/hanamin-fonts/THANKS.txt

Originally it was:

$ file README.txt THANKS.txt  
README.txt: UTF-8 Unicode text, with CRLF line terminators
THANKS.txt: UTF-8 Unicode text, with CRLF line terminators

After converting with the macro:

$ file README.txt THANKS.txt 
README.txt: Non-ISO extended-ASCII text, with LF, NEL line terminators
THANKS.txt: Non-ISO extended-ASCII text, with LF, NEL line terminators

It looks like you need the -n argument to prevent character recoding. Try %linuxtext -n README.txt THANKS.txt.

No, it doesn't help. This is a character classification issue. The %linuxtext macro runs sed with LANG=C LC_ALL=C, but the character set of the C locale isn't UTF-8, so sed can't handle UTF-8 text properly.

To fix this, they should run sed with LANG=C LC_ALL=C.UTF-8.
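
For illustration, here is a minimal sketch of how the locale changes what sed treats as one character. It does not reproduce the expression %linuxtext actually runs; `count_units` is a hypothetical helper, and the second call assumes a C.UTF-8 locale is installed (otherwise sed falls back to byte-wise behaviour).

```shell
#!/bin/sh
# Hypothetical helper, not part of %linuxtext: replace every character
# that sed sees with "X" under the given locale.
#   $1 = locale name (e.g. C or C.UTF-8)
#   $2 = input text
count_units() {
    printf '%s\n' "$2" | LC_ALL="$1" sed 's/./X/g'
}

# U+00E9 (e with acute accent) encoded as two UTF-8 bytes: 0xC3 0xA9
sample=$(printf '\303\251')

# C locale: each byte is a separate character, so sed matches twice.
count_units C "$sample"

# C.UTF-8 locale (assumed to be available): the two bytes form one character.
count_units C.UTF-8 "$sample"
```

Any expression that relies on `.`, bracket expressions, or character classes can therefore split a multibyte sequence when it runs under the plain C locale.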
