Bug#796170: lintian: [new check] warn on non-UTF8 text files
Adam Borowski <kilobyte@angband.pl> writes:
> Here's an experimental tag, a step towards elimination of mojibake
> system-wide. It checks all text files in *bin/, /usr/share/doc/ and
> those that look like a script file. "Text" is defined as not having any
> bytes in the 0..31 range other than tabs, newlines (incl. Windows ones)
> or form feeds. In practice, this definition appears to work pretty
> well, although the list of files that should be skipped despite being
> text needs work.
> It's a part of the "UTF-8 everywhere" release goal that I intend to
> re-propose for Stretch.
> This is only a preliminary version; let me know what you think. If
> you're at DebConf, you can contact me in person.
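
For reference, the "text" definition quoted above can be sketched roughly
as follows. This is only a Python illustration of the heuristic as
described, not the code from the proposed check (Lintian itself is written
in Perl). The names looks_like_text, is_valid_utf8, and should_tag are
made up for this example.

```python
# Control bytes (0..31) still allowed in "text", per the quoted definition:
# tab, line feed, form feed, and carriage return (for Windows newlines).
ALLOWED_CONTROL = {0x09, 0x0A, 0x0C, 0x0D}


def looks_like_text(data: bytes) -> bool:
    """True if no byte is in 0..31 other than the allowed control bytes."""
    return all(b >= 0x20 or b in ALLOWED_CONTROL for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def should_tag(data: bytes) -> bool:
    """Hypothetical decision: flag files that look like text but are not UTF-8."""
    return looks_like_text(data) and not is_valid_utf8(data)
```

Under this sketch, a Shift-JIS README counts as text (none of its bytes
are below 32) and fails UTF-8 decoding, so it would be flagged.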
The last time I looked at this in a policy context, the distribution
included a few documentation files that were intentionally provided
upstream in multiple different encodings. In other words, there would be
a README.sjis and a README.utf8, etc., side-by-side. In those cases, it
feels bad to have Lintian tag the README.sjis file and have maintainers
possibly just not install it, when it might still be a convenience to some
users.
Maybe this check should exclude files whose extension indicates that they
were intentionally provided in some other encoding?
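
As a rough illustration of that kind of exclusion, here is a Python sketch
(not Lintian code). The suffix list is an assumption chosen for the
example, not an agreed-upon set, and declares_other_encoding is a
hypothetical helper name.

```python
from pathlib import PurePosixPath

# Hypothetical suffixes that suggest a deliberately non-UTF-8 file.
# A real list would need discussion; this one is for illustration only.
NON_UTF8_SUFFIXES = {".sjis", ".eucjp", ".euc-jp", ".latin1", ".koi8r", ".big5"}


def declares_other_encoding(path: str) -> bool:
    """True if the final extension names one of the listed encodings."""
    return PurePosixPath(path).suffix.lower() in NON_UTF8_SUFFIXES
```

With this, usr/share/doc/foo/README.sjis would be skipped while
README.utf8 would still be checked. Only the final extension is
considered, so a compressed README.sjis.gz would not match.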
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>