Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature



Russ Allbery <rra@debian.org> writes:

> I don't believe it's correct to expect UTF-8 files to include this.
> I've heard of BOM marks used this way from the very early days of Unicode,
> but so far as I understand it, the world has largely given up on this
> approach and UTF-8 generators do not produce them.

I did a bit more research, and apparently this approach has become more
blessed again.  I'm glad I looked it up!  As of Unicode 5.0, the standard
explicitly recommended against doing this, but the latest version of the
standard is moderately positive about it (although it doesn't require it):

    In UTF-8, the BOM corresponds to the byte sequence <EF₁₆ BB₁₆
    BF₁₆>. Although there are never any questions of byte order with UTF-8
    text, this sequence can serve as signature for UTF-8 encoded text
    where the character set is unmarked.

(although it does strongly discourage it if there's any other signaling
method available).

I'm still a bit dubious about this, since I don't believe editors and
generators normally add it, but given how we generate the text versions of
the documents, it's relatively easy to add a leading BOM and seems
harmless.  I'll take a look.
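
For illustration only, here is a minimal Python sketch of what adding a
leading BOM to a generated text file could look like.  It is not the
actual debian-policy build step; the function names add_utf8_signature
and has_utf8_signature are hypothetical.  It relies only on the
standard library's codecs.BOM_UTF8 constant, which is the three bytes
EF BB BF.

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_signature(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM, adding one only if it is absent.

    Hypothetical helper: the check keeps the operation idempotent, so
    running it twice on the same content does not stack two BOMs.
    """
    if has_utf8_signature(data):
        return data
    return codecs.BOM_UTF8 + data


# U+FEFF encoded as UTF-8 is exactly the signature discussed above.
signature = "\ufeff".encode("utf-8")
signed = add_utf8_signature("Debian Policy Manual\n".encode("utf-8"))
```

A reader that knows about the signature can decode such a file with
Python's "utf-8-sig" codec, which drops a leading BOM if one is present
and otherwise behaves like plain UTF-8.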

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

