
Re: UTF-8 in jessie (debhelper and BOM)



Hi,

UTF-8 is indeed a good goal in principle.

(I agree, but I am struggling to update package documentation, since
Japanese is known to be tough: ISO-2022-JP (JIS), EUC-JP, Shift_JIS, ...
are all in use, and files mixing EUC-JP and Shift_JIS are easily mistaken
for Latin-1.)
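A minimal sketch of why such misdetection happens, assuming a hypothetical naive guesser that tries strict UTF-8 first and otherwise falls back to Latin-1 (ISO-8859-1). `naive_guess` is illustrative only, not any real detection tool:

```python
def naive_guess(data: bytes) -> str:
    """Hypothetical naive guesser: try strict UTF-8, else assume Latin-1."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to a character, so this
        # fallback can never fail, whatever the real encoding was.
        data.decode("latin-1")
        return "latin-1"

# A file mixing EUC-JP and Shift_JIS text is not valid UTF-8 ...
mixed = "日本語".encode("euc_jp") + "日本語".encode("shift_jis")
# ... so it silently ends up labelled as Latin-1.
print(naive_guess(mixed))
```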

But I do not understand goal #5.  Why "MUST"?  Do you have a rationale?

On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote:
> On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote:
> > I propose the following sub-goals:
...
> > 4. all text files should be encoded in UTF-8

Yes.  But it would be nice to have some support from dh_installdocs :-)
                                                     ^^^^^^^^^^^^^^

> 5. All programs consuming UTF8 Text must understand a BOM.
                                      ^^^^

I could agree with "SHOULD", but should we really state "MUST"?

After all, a BOM has no value in UTF-8 except to upset some programs.
See the Wikipedia page: http://en.wikipedia.org/wiki/Byte_order_mark

 | The Unicode Standard permits the BOM in UTF-8, but does not require
 | or recommend its use. Byte order has no meaning in UTF-8 ...
    (A pointer to the Unicode document is listed there.)

If it only concerns the very start of the file (the three bytes
EF BB BF), it is relatively easy.  But there is text data with bogus BOMs
in the middle of the content.  Should programs handle those too, to be
safe?
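Handling a BOM only at the start of the data is indeed simple. A minimal sketch (the helper `strip_leading_bom` is hypothetical) shows that this leaves interior BOMs untouched, and that Python's standard `utf-8-sig` codec behaves the same way on decoding:

```python
import codecs

def strip_leading_bom(data: bytes) -> bytes:
    """Drop a UTF-8 BOM (EF BB BF) only if it is at the very start."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# One BOM at the start and one bogus BOM in the middle of the content.
sample = (codecs.BOM_UTF8 + 'msgid "a"\n'.encode("utf-8")
          + codecs.BOM_UTF8 + 'msgstr "b"\n'.encode("utf-8"))

cleaned = strip_leading_bom(sample)
print(cleaned.count(codecs.BOM_UTF8))  # the interior BOM is still there

# "utf-8-sig" also removes only a leading BOM; the interior one
# decodes to U+FEFF (ZERO WIDTH NO-BREAK SPACE).
text = sample.decode("utf-8-sig")
print(text.count("\ufeff"))
```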

FYI: I recently had a problem with PO files containing lots of BOMs inside
the text, which broke XeTeX runs.  Please note that the TeX family of
programs has more elaborate character support than Unicode-only UTF-8
(I would rather have XeTeX ...).  To me, a program to filter out such BOMs
would be nice.  But we should not condemn a good UTF-8 program because of
stupid BOM-containing UTF-8 data.
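The kind of BOM filter wished for above could look like the following minimal sketch. `remove_boms` and `filter_stream` are hypothetical names, and dropping every U+FEFF (not only a leading one) is an assumption about what such a filter should do. In valid UTF-8 the bytes EF BB BF can only encode U+FEFF, so no other text is affected:

```python
import io
from typing import BinaryIO

def remove_boms(data: bytes) -> bytes:
    # Decode strictly first, so invalid UTF-8 raises UnicodeDecodeError
    # instead of passing through unnoticed.
    text = data.decode("utf-8")
    # Drop every U+FEFF wherever it appears (assumption: none is wanted).
    return text.replace("\ufeff", "").encode("utf-8")

def filter_stream(src: BinaryIO, dst: BinaryIO) -> None:
    # Read all raw bytes from src and write the cleaned bytes to dst.
    dst.write(remove_boms(src.read()))

# A PO fragment with a leading BOM and a bogus BOM inside the content.
po = b'\xef\xbb\xbfmsgid "a"\n\xef\xbb\xbfmsgstr "b"\n'
out = io.BytesIO()
filter_stream(io.BytesIO(po), out)
print(out.getvalue())
```

To use it as a pipe filter, e.g. on a PO file before running XeTeX, a small script could call `filter_stream(sys.stdin.buffer, sys.stdout.buffer)`.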

Osamu


