
Examples in local encoding



Hi,

I've been working on a package (for QA) that shows the differences between
documents. One selling point is that it works with different encodings (see
the long description below).

 DocDiff compares two files and shows the difference.  It can compare files
 word by word, char by char, or line by line.
 .
 It has several output formats such as HTML/XHTML, tty, Manued, or
 user-defined markup.  It supports several encodings and end-of-line
 characters, including ASCII, UTF-8, EUC-JP, Shift_JIS, CR, LF, and CRLF.

Upstream provides some examples, which are pairs of files with small changes
between them. This leads to my question. One of these pairs uses a national
Japanese encoding (EUC-JP), which makes lintian complain:

W: docdiff: national-encoding usr/share/doc/docdiff/examples/01.ja.eucjp.lf
W: docdiff: national-encoding usr/share/doc/docdiff/examples/02.ja.eucjp.lf

The recommendation from this warning is to convert the files to UTF-8, but there
are already UTF-8 examples in the directory. And upon closer inspection, some
files use other encodings too (see below).

01.en.ascii.cr:           ASCII text, with CR line terminators
01.en.ascii.crlf:         ASCII text, with CRLF line terminators
01.en.ascii.lf:           ASCII text
01.ja.eucjp.lf:           ISO-8859 text
01.ja.sjis.cr:            Non-ISO extended-ASCII text, with CR line terminators
01.ja.sjis.crlf:          Non-ISO extended-ASCII text, with CRLF line terminators
01.ja.utf8.crlf:          UTF-8 Unicode text, with CRLF line terminators
02.en.ascii.cr:           ASCII text, with CR line terminators
02.en.ascii.crlf:         ASCII text, with CRLF line terminators
02.en.ascii.lf:           ASCII text
02.ja.eucjp.lf:           ISO-8859 text
02.ja.sjis.cr:            Non-ISO extended-ASCII text, with CR line terminators
02.ja.sjis.crlf:          Non-ISO extended-ASCII text, with CRLF line terminators
02.ja.utf8.crlf:          UTF-8 Unicode text, with CRLF line terminators
humpty_dumpty01.ascii.lf: ASCII text
humpty_dumpty02.ascii.lf: ASCII text
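
The labels in the listing above are heuristic guesses, which is presumably why
the EUC-JP files are reported as "ISO-8859 text". A minimal sketch of a
cross-check by trial decoding follows. The function name guess_encoding and
the candidate list are my own assumptions, not part of DocDiff or lintian,
and trial decoding is itself only a heuristic (some byte sequences are valid
in more than one encoding).

```python
from typing import Optional, Sequence

# Hypothetical candidate list, tried from strictest to loosest.
# ASCII text is also valid UTF-8, EUC-JP and Shift_JIS, so it must come first.
CANDIDATES = ("ascii", "utf-8", "euc_jp", "shift_jis")


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate codec that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None  # none of the candidates matched


if __name__ == "__main__":
    import sys

    # Usage: python3 guess.py examples/*
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(f"{path}: {guess_encoding(handle.read())}")
```

Reading the files in binary mode keeps the CR, LF and CRLF terminators out of
the picture, so only the character encoding is being tested.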

My question is: what is the best course of action in this situation? I see three
possible ways of dealing with this.

1. Not installing the files flagged by lintian (but what about the others
that are not flagged, like *ja.sjis.cr?).
2. Installing the files in spite of the lintian warning.
3. Converting them to UTF-8, even though UTF-8 files are already provided.
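
For reference, option 3 would amount to something like the sketch below. The
helper names to_utf8 and convert_file are hypothetical, and the source
encoding has to be supplied by hand (euc_jp for the two flagged files); this
is an illustration of the conversion, not something upstream or lintian
provides.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data from source_encoding to UTF-8.

    Working on bytes means the line terminators (CR, LF, CRLF) are
    carried over unchanged; only the character encoding is converted.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str) -> None:
    """Read src in binary mode, convert it, and write the result to dst."""
    with open(src, "rb") as handle:
        converted = to_utf8(handle.read(), source_encoding)
    with open(dst, "wb") as handle:
        handle.write(converted)


if __name__ == "__main__":
    # Assumed invocation: python3 convert.py 01.ja.eucjp.lf out.lf euc_jp
    import sys

    convert_file(sys.argv[1], sys.argv[2], sys.argv[3])
```

The converted files would no longer be EUC-JP examples, so they would
duplicate the purpose of the UTF-8 files that are already shipped.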

I lean toward the first option, but I'm not sure, so I would really like to hear
your opinion. I'd also like to know the status of these national encodings in
Debian: are they still in use anywhere?

Sorry for the long email and thanks,
Charles

