
Re: UTF-8 in jessie



* Adam Borowski <kilobyte@angband.pl>, 2013-08-12, 02:51:
Detecting non-UTF files is easy:
* false positives are impossible
* false negatives are extremely unlikely: combinations of letters that would happen to match a valid utf character don't happen naturally, and even if they did, every single combination in the file tested would need to match valid utf.

Not quite. While 7-bit encodings different than ASCII are all endangered species, some of them can still be seen in the wild, and they excellently disguise themselves as UTF-8. (We had to add special code to detect ISO-2022 encodings to Lintian not that long ago.)
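
To illustrate the disguise, here is a minimal Python sketch. It is not Lintian's actual check; the function names and the list of escape sequences are assumptions chosen for the example. ISO-2022-JP output consists only of 7-bit bytes, so a strict UTF-8 decode accepts it, and a separate look for ISO-2022 designator escapes is needed to catch it.

```python
# Hypothetical helpers for illustration only; not taken from Lintian.

# A few well-known ISO-2022 designator sequences (assumed, incomplete list):
# JIS X 0208-1978, JIS X 0208-1983, JIS X 0201 Roman, KS X 1001, GB 2312.
ISO2022_ESCAPES = (b"\x1b$@", b"\x1b$B", b"\x1b(J", b"\x1b$)C", b"\x1b$)A")


def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def has_iso2022_escape(data: bytes) -> bool:
    """Crude heuristic: does the data contain a known ISO-2022 designator?"""
    return any(seq in data for seq in ISO2022_ESCAPES)


# 8-bit national encoding: rejected by the UTF-8 check, as the quoted text says.
latin2 = "zażółć".encode("iso-8859-2")

# 7-bit national encoding: passes the UTF-8 check despite not being UTF-8.
jis = "日本語".encode("iso2022_jp")

for name, sample in (("latin2", latin2), ("jis", jis)):
    print(name, looks_like_utf8(sample), has_iso2022_escape(sample))
```

A heuristic like this can itself misfire on files that contain terminal escape sequences, so a real checker would need to be more careful.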

Anyway, if you want to help UTF-8-ize the world, you could start by providing patches for these bugs:
http://lintian.debian.org/tags/debian-changelog-file-uses-obsolete-national-encoding.html
http://lintian.debian.org/tags/debian-copyright-file-uses-obsolete-national-encoding.html
http://lintian.debian.org/tags/doc-base-file-uses-obsolete-national-encoding.html

--
Jakub Wilk

