Re: Examples in local encoding

To: debian-mentors@lists.debian.org
Subject: Re: Examples in local encoding
From: Paul Wise <pabs@debian.org>
Date: Mon, 21 Sep 2020 03:02:14 +0000
Message-id: <CAKTje6EHOKUwu8TmoV4V5HeuWRExqnTUomqWK9hn+LxOsQpYOw@mail.gmail.com>
In-reply-to: <CP2PR80MB4628AA47A3054F310DE2C2D1A83D0@CP2PR80MB4628.lamprd80.prod.outlook.com>
References: <CP2PR80MB4628AA47A3054F310DE2C2D1A83D0@CP2PR80MB4628.lamprd80.prod.outlook.com>

On Sun, Sep 20, 2020 at 10:50 PM Carlos Henrique Lima Melara wrote:

> I've been working in a package (QA) that show diff between documents. And a
> selling point is working with different encodings (see the long description
> bellow).

This feature is unfortunately quite hard to use. It seems that unless
you specify *both* the line ending and the encoding, docdiff relies on
Ruby features that (now?) require UTF-8 input while it auto-detects
them, which sort of defeats the point of automatic detection in the
first place. If the 2020 release of docdiff 0.6.0 (which fixed a bunch
of encoding issues) doesn't fix this, you might want to file an issue
about it on GitHub.

Also, it looks like another maintainer intends to upload the Debian
package too. You might want to coordinate with them if you aren't
already doing so.

I also note the latest release adds support for one more encoding, so
it does seem like upstream thinks this is an important feature.

https://github.com/hisashim/docdiff/releases
https://github.com/hisashim/docdiff/issues/25
https://github.com/hisashim/docdiff/issues/new

> The upstream provides some examples which are pairs of files with small changes
> between them. This leads to my question. One of these pairs use local japanese
> encoding which makes the lintian scream:
>
> W: docdiff: national-encoding usr/share/doc/docdiff/examples/01.ja.eucjp.lf
> W: docdiff: national-encoding usr/share/doc/docdiff/examples/02.ja.eucjp.lf

https://lintian.debian.org/tags/national-encoding.html
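
As a rough illustration of why these files trip the tag (my own
sketch, not lintian's actual check): EUC-JP bytes generally do not
decode as valid UTF-8. The sample string and the helper name
is_valid_utf8 are made up for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample text, not taken from the upstream example files.
sample = "日本語の例"
eucjp_bytes = sample.encode("euc_jp")  # legacy Japanese encoding
utf8_bytes = sample.encode("utf-8")

print("EUC-JP bytes valid as UTF-8:", is_valid_utf8(eucjp_bytes))
print("UTF-8 bytes valid as UTF-8:", is_valid_utf8(utf8_bytes))
```

The same text passes when encoded as UTF-8, which is why the UTF-8
example pairs do not produce the warning.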

> 2. Installing the files in spite of lintian warning.

Please note that when choosing item 2 you should also apply a lintian
override with a comment about why these files are present.
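
A minimal sketch of such an override, assuming the binary package is
named docdiff and the file lives at debian/docdiff.lintian-overrides
(both assumptions, as is the comment wording). The override lines
mirror the warnings quoted above; newer lintian versions may expect a
slightly different syntax, so check against your lintian's output.

```text
# Upstream example files that demonstrate docdiff's support for
# non-UTF-8 encodings (EUC-JP); they are intentionally not UTF-8.
docdiff: national-encoding usr/share/doc/docdiff/examples/01.ja.eucjp.lf
docdiff: national-encoding usr/share/doc/docdiff/examples/02.ja.eucjp.lf
```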

> 3. Convert them to UTF-8 even with UTF-8 files already provided.

I think this isn't useful so I would discard that option.

> My question is what is the best course of action in this situation?

For example files that can be used to demonstrate features of the
program, I see no reason to remove the files. OTOH I also see no
reason to include example files for a relatively obscure feature.
Then again, I see no reason to discriminate against users of obscure
encodings, and upstream probably has a reason for keeping those files.
There probably isn't really a best course of action, so I'd lean
towards the status quo for now, with a lintian override.

> Also I'd like to know what is the status of these local encodings in
> Debian, is there any place still using it?

It sounds like, as of 2017, non-UTF-8 Japanese encodings still see a
small amount of use, and kakaku.com still seems to use Shift-JIS.

https://en.wikipedia.org/wiki/Japanese_character_encoding

--
bye,
pabs

https://wiki.debian.org/PaulWise

References:
- Examples in local encoding
  - From: Carlos Henrique Lima Melara <charlesmelara@outlook.com>
