
Bug#261413: ITP: utf8script -- binfmt_misc plugin for UTF-8 scripts



Matthew Garrett wrote:
Eh? A BOM (Byte Order Marker) is only needed where there's confusion
about what the byte order is. It's needed for UTF-16 (which is a fairly
decent demonstration of why UTF-16 is a Bad Thing), but not UTF-8. It
may be desirable to mark a file as being in UTF-8, but calling it a BOM
is just wrong.

Well, character U+FEFF has the name "ZERO WIDTH NO-BREAK SPACE" and the
aliases "BYTE ORDER MARK (BOM)" and "ZWNBSP", see

http://www.unicode.org/charts/PDF/UFE70.pdf

Encoding this character in UTF-8 gives the byte sequence \xef\xbb\xbf,
which is the byte sequence that this package looks for. I usually call
this byte sequence the "UTF-8 signature", but the character that it
represents is a proper Unicode character, and it happens to be
called BOM. See also the UTF and BOM FAQ published by the Unicode
Consortium, at

http://www.unicode.org/unicode/faq/utf_bom.html#29

It says

"Yes, UTF-8 can contain a BOM."

It ends with

"the use of a BOM will interfere with [...] the use of "#!" at the
beginning of Unix shell scripts."

which is the issue that this package addresses.
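
To make the byte sequence concrete, here is a minimal Python sketch. It
is only an illustration, not the code of this package: the names
UTF8_SIGNATURE, split_signature and is_signed_script are hypothetical.
It shows that encoding U+FEFF in UTF-8 yields \xef\xbb\xbf, and how a
script that starts with the signature followed by "#!" could be
recognised.

```python
import codecs

# U+FEFF encoded in UTF-8; this is the "UTF-8 signature".
UTF8_SIGNATURE = "\ufeff".encode("utf-8")

# The standard library exposes the same three bytes as codecs.BOM_UTF8.
assert UTF8_SIGNATURE == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def split_signature(data: bytes) -> tuple[bool, bytes]:
    """Return (had_signature, remaining_bytes) for the given file content."""
    if data.startswith(UTF8_SIGNATURE):
        return True, data[len(UTF8_SIGNATURE):]
    return False, data


def is_signed_script(data: bytes) -> bool:
    """True if the data is a UTF-8 signature immediately followed by '#!'."""
    had_signature, rest = split_signature(data)
    return had_signature and rest.startswith(b"#!")


if __name__ == "__main__":
    script = UTF8_SIGNATURE + b"#!/bin/sh\necho hello\n"
    # The first two bytes are \xef\xbb rather than "#!", so a plain
    # "#!" check on the raw file does not match.
    print(script[:2] == b"#!")
    print(is_signed_script(script))
```

With the signature in place the file no longer begins with "#!", which
is why such a script needs separate handling before the interpreter
line can be honoured.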

Regards,
Martin


