Bug#261413: ITP: utf8script -- binfmt_misc plugin for UTF-8 scripts
Matthew Garrett wrote:
Eh? A BOM (Byte Order Marker) is only needed where there's confusion
about what the byte order is. It's needed for UTF-16 (which is a fairly
decent demonstration of why UTF-16 is a Bad Thing), but not UTF-8. It
may be desirable to mark a file as being in UTF-8, but calling it a BOM
is just wrong.
Well, character U+FEFF has the name "ZERO WIDTH NO-BREAK SPACE" and the
aliases "BYTE ORDER MARK (BOM)" and "ZWNBSP", see
http://www.unicode.org/charts/PDF/UFE70.pdf
Encoding this character in UTF-8 gives the byte sequence \xef\xbb\xbf,
which is the byte sequence that this package looks for. I usually call
this byte sequence the "UTF-8 signature", but the character that it
represents is a proper Unicode character, and it happens to be
called BOM. See also the UTF and BOM FAQ published by the Unicode
Consortium, at
http://www.unicode.org/unicode/faq/utf_bom.html#29
It says
"Yes, UTF-8 can contain a BOM."
It ends with
"the use of a BOM will interfere with [...] the use of "#!" at the
beginning of Unix shell scripts."
which is the issue that this package addresses.
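
To make the byte sequence concrete, here is a small Python sketch. It is
an illustration only, not the package's code: the helper names
has_signed_shebang and strip_signature are made up for this example. It
shows that encoding U+FEFF in UTF-8 gives \xef\xbb\xbf, and how a script
that starts with that signature followed by "#!" could be recognized.

```python
import codecs

# Encoding U+FEFF in UTF-8 yields the three-byte "UTF-8 signature".
UTF8_SIGNATURE = "\ufeff".encode("utf-8")
assert UTF8_SIGNATURE == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def has_signed_shebang(head: bytes) -> bool:
    """Return True if head starts with the UTF-8 signature followed by "#!".

    Hypothetical helper for illustration; not taken from utf8script.
    """
    return head.startswith(UTF8_SIGNATURE + b"#!")


def strip_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if data.startswith(UTF8_SIGNATURE):
        return data[len(UTF8_SIGNATURE):]
    return data
```

A plain script starting with "#!" fails the has_signed_shebang check,
because its first two bytes are not the signature. That is the same
reason a script with a signature in front of "#!" is not recognized as
a "#!" script.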
Regards,
Martin