
Re: UTF-8 in jessie

* Adam Borowski <kilobyte@angband.pl>, 2013-08-12, 02:51:
> Detecting non-UTF files is easy:
> * false positives are impossible
> * false negatives are extremely unlikely: combinations of letters that would happen to match a valid utf character don't happen naturally, and even if they did, every single combination in the file tested would need to match valid utf.
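The check described in the quote comes down to strict UTF-8 validation. Here is a minimal Python sketch. It assumes the whole file fits in memory, and `is_valid_utf8` is a hypothetical helper name, not something taken from Lintian or any other Debian tool. A decoding failure proves the data is not UTF-8. A successful decode only proves the bytes are well-formed.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is a well-formed UTF-8 byte sequence."""
    try:
        # errors="strict" is the default, so any malformed sequence raises
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("café".encode("utf-8")))    # True
print(is_valid_utf8("café".encode("latin-1")))  # False: lone 0xE9 byte
```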

Not+IAo-quite. While 7-bit encodings other than ASCII are all endangered species, some of them can still be seen in the wild. Because they never use bytes above 0x7F, they disguise themselves perfectly as UTF-8. (Not that long ago we had to add special code to Lintian to detect ISO-2022 encodings.)
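The following sketch shows why a plain validity check is not enough. ISO-2022-JP and UTF-7 text consists only of 7-bit bytes, so it always passes as UTF-8. For example, "Not+IAo-quite." is plain ASCII, yet decoded as UTF-7 it reads "Not", U+200A HAIR SPACE, "quite.". The escape-sequence regex and `looks_like_iso2022` are hypothetical illustrations. They are not Lintian's actual implementation.

```python
import re

# Rough pattern for an ISO-2022 designation escape: ESC, one or two
# intermediate bytes, then a final byte, e.g. ESC $ B / ESC ( B
# (ISO-2022-JP) or ESC $ ) C (ISO-2022-KR). A heuristic, not a full parser.
ISO2022_ESCAPE = re.compile(rb"\x1b[$()*+][()*+]?[@-Z]")


def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 well-formedness check."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_7bit(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the data is plain 7-bit."""
    return all(b < 0x80 for b in data)


def looks_like_iso2022(data: bytes) -> bool:
    """Hypothetical heuristic: 7-bit data containing ISO-2022 escapes."""
    return is_7bit(data) and ISO2022_ESCAPE.search(data) is not None


if __name__ == "__main__":
    jis = "日本語".encode("iso2022_jp")
    utf7 = b"Not+IAo-quite."
    for name, data in [("ISO-2022-JP", jis), ("UTF-7", utf7)]:
        # both samples are 7-bit and therefore pass the UTF-8 check
        print(name, is_7bit(data), is_valid_utf8(data), looks_like_iso2022(data))
    print(repr(utf7.decode("utf-7")))
```

The UTF-7 sample slips past the ISO-2022 heuristic too. Each 7-bit encoding would need its own targeted check, which is why such files are hard to rule out in practice.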

Anyway, if you want to help UTF-8-ize the world, you could start by providing patches for these bugs:

Jakub Wilk
