Bug#261413: ITP: utf8script -- binfmt_misc plugin for UTF-8 scripts

To: Matthew Garrett <mgarrett@chiark.greenend.org.uk>, 261413@bugs.debian.org
Subject: Bug#261413: ITP: utf8script -- binfmt_misc plugin for UTF-8 scripts
From: "Martin v. Löwis" <martin@v.loewis.de>
Date: Mon, 26 Jul 2004 07:38:21 +0200
Message-id: <410498CD.2050902@v.loewis.de>
Reply-to: "Martin v. Löwis" <martin@v.loewis.de>, 261413@bugs.debian.org
In-reply-to: <E1BorQ5-00048S-00@chiark.greenend.org.uk>
References: <E1BoqkH-0000Xu-JK@localhost> <E1BorQ5-00048S-00@chiark.greenend.org.uk>

Matthew Garrett wrote:

> Eh? A BOM (Byte Order Marker) is only needed where there's confusion
> about what the byte order is. It's needed for UTF-16 (which is a fairly
> decent demonstration of why UTF-16 is a Bad Thing), but not UTF-8. It
> may be desirable to mark a file as being in UTF-8, but calling it a BOM
> is just wrong.


Well, character U+FEFF has the name "ZERO WIDTH NO-BREAK SPACE" and the
aliases "BYTE ORDER MARK (BOM)" and "ZWNBSP", see

http://www.unicode.org/charts/PDF/UFE70.pdf

Encoding this character in UTF-8 gives the byte sequence \xef\xbb\xbf,
which is the byte sequence that this package looks for. I usually call
this byte sequence "UTF-8 signature", but the character that this byte
sequence represents is a proper Unicode character, and it happens to be
called BOM. See also the UTF and BOM FAQ published by the Unicode
consortium, at

http://www.unicode.org/unicode/faq/utf_bom.html#29

It says

"Yes, UTF-8 can contain a BOM."

It ends with

"the use of a BOM will interfere with [...] the use of "#!" at the
beginning of Unix shell scripts."

which is the issue that this package addresses.
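
As a rough illustration of the two points above (a Python sketch, not
code from this package; the helper name classify_script_start is made
up for the example), one can check that encoding U+FEFF in UTF-8 gives
exactly those three bytes, and see why a file that starts with them no
longer begins with "#!":

```python
import codecs

# Encoding U+FEFF in UTF-8 yields the three-byte signature.
SIGNATURE = "\ufeff".encode("utf-8")
assert SIGNATURE == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def classify_script_start(head: bytes) -> str:
    """Classify the leading bytes of a file (hypothetical helper)."""
    if head.startswith(b"#!"):
        # The file really begins with "#!".
        return "shebang"
    if head.startswith(SIGNATURE + b"#!"):
        # The first two bytes are 0xEF 0xBB, not "#!", so a check of
        # the leading bytes for "#!" does not match.
        return "signature+shebang"
    return "other"


print(classify_script_start(b"#!/bin/sh\n"))
print(classify_script_start(SIGNATURE + b"#!/bin/sh\n"))
```

The second call shows the case this discussion is about: the "#!" is
still there, but only after the three signature bytes.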

Regards,
Martin
