Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature
Russ Allbery <rra@debian.org> writes:
> I don't believe it's correct to expect UTF-8 files to include this.
> I've heard of BOM marks used this way from the very early days of Unicode,
> but so far as I understand it, the world has largely given up on this
> approach and UTF-8 generators do not produce them.
I did a bit more research, and apparently this approach has become more
blessed again. I'm glad I looked it up! As of Unicode 5.0, the standard
explicitly recommended against doing this, but the latest version of the
standard is moderately positive about it (although it doesn't require it):

    In UTF-8, the BOM corresponds to the byte sequence <0xEF 0xBB
    0xBF>. Although there are never any questions of byte order with UTF-8
    text, this sequence can serve as signature for UTF-8 encoded text
    where the character set is unmarked.

(although it does strongly discourage it if there's any other signaling
method available).
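
For concreteness, here is a minimal sketch of what checking for that
signature looks like, using Python's standard codecs module. The helper
name has_utf8_signature is made up for illustration and is not part of
any Debian tooling.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
def has_utf8_signature(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)

# A sample payload with the signature prepended.
sample = codecs.BOM_UTF8 + "Debian Policy".encode("utf-8")

# The "utf-8-sig" codec strips a leading signature when decoding,
# whereas plain "utf-8" would keep it as U+FEFF at the start of the text.
text = sample.decode("utf-8-sig")
```

A consumer that decodes with plain "utf-8" will see the signature as a
leading U+FEFF character, which is the usual source of trouble with tools
that don't expect it.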
I'm still a bit dubious about this, since I don't believe editors and
generators normally add it, but given how we generate the text versions of
the documents, it's relatively easy to add a leading BOM and seems
harmless. I'll take a look.
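
As a rough sketch of the kind of post-processing step I have in mind
(this is an assumption about one way to do it, not what the debian-policy
build actually does, and add_utf8_signature is a hypothetical name), the
generated text file could be rewritten with a leading signature like this:

```python
import codecs
from pathlib import Path

def add_utf8_signature(path) -> bool:
    """Prepend the UTF-8 signature to a file unless it is already there.

    Returns True if the file was modified, False if it already started
    with the signature. Works on raw bytes, so the rest of the file is
    left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already signed; avoid adding a second BOM
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Checking for an existing signature first keeps the step idempotent, so
running the build twice doesn't stack up multiple BOMs.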
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>