
Bug#796170: lintian: [new check] warn on non-UTF8 text files



Adam Borowski <kilobyte@angband.pl> writes:

> Here's an experimental tag, a step towards elimination of mojibake
> system-wide.  It checks all text files in *bin/, /usr/share/doc/ and
> those that look like a script file.  "Text" is defined as not having any
> bytes in the 0..31 range other than tabs, newlines (incl. Windows ones)
> or form feeds.  In practice, this definition appears to work pretty
> well, although the list of files that should be skipped despite being
> text needs work.
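
For concreteness, the "text" definition quoted above could be sketched
roughly as follows in Python.  The function names are made up for
illustration, and this is not the code from the proposed check; it only
restates the rule: no bytes in the 0..31 range other than tab, LF, CR, or
form feed, and then a test for whether the content decodes as UTF-8.

```python
# Control bytes (0..31) that are still considered "text":
# tab, line feed, form feed, carriage return (for Windows newlines).
ALLOWED_CONTROL = {0x09, 0x0A, 0x0C, 0x0D}


def looks_like_text(data: bytes) -> bool:
    """True if no byte in 0..31 appears other than the allowed ones."""
    return all(b >= 32 or b in ALLOWED_CONTROL for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def should_tag(data: bytes) -> bool:
    """Hypothetical decision: text by the heuristic, but not valid UTF-8."""
    return looks_like_text(data) and not is_valid_utf8(data)
```

Under this sketch a Latin-1 file such as b"caf\xe9\n" would be tagged, while
a file containing NUL bytes would be treated as binary and skipped.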

> It's a part of the "UTF-8 everywhere" release goal that I intend to
> re-propose for Stretch.

> This is only a preliminary version; let me know what you think.  If
> you're at DebConf, you can contact me in person.

The last time I looked at this in a policy context, the distribution
included a few documentation files that were intentionally provided
upstream in multiple different encodings.  In other words, there would be
a README.sjis and a README.utf8, etc., side-by-side.  In those cases, it
feels bad to have Lintian tag the README.sjis file and have maintainers
possibly just not install it, when it might still be a convenience to some
users.

Maybe this check should exclude files that have an extension that
indicates they were intentionally encoded in some other encoding?
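
As a rough sketch of what I mean, in Python rather than Lintian's own code:
the suffix list below is purely an assumption for illustration (only .sjis
comes from the example above), and the function name is hypothetical.

```python
import os

# Hypothetical list of suffixes that announce a non-UTF-8 encoding.
# Only ".sjis" is taken from the README.sjis example; the rest are guesses.
LEGACY_ENCODING_SUFFIXES = {".sjis", ".eucjp", ".latin1", ".koi8r"}


def declares_other_encoding(path: str) -> bool:
    """True if the file name's extension names a legacy encoding."""
    _, ext = os.path.splitext(os.path.basename(path))
    return ext.lower() in LEGACY_ENCODING_SUFFIXES
```

With this, /usr/share/doc/foo/README.sjis would be skipped by the check,
while README.utf8 and a plain README would still be examined.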

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

