
Re: UTF-8 in jessie

On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote:
> 5. All programs consuming UTF8 Text must understand a BOM.

I'm afraid I don't agree here: BOMs are nasty stuff that serve no purpose
once you standardize on UTF-8.  They might help with exchange with a minority
of Windows programs, at a cost on our side.  Windows hardly does plain text:
most of that is MSVC/etc. sources, but then, the C/C++ standards explicitly
forbid junk in places other than comments.  Most other languages expect a
hashbang on Unix, which has to be the very first two bytes of the file, so
a BOM in front of it is impossible.

Other reasons:
* concatenating files adds a misplaced BOM
* taking stuff from the middle loses them
* tools like grep, patch, etc pick and insert lots of individual lines
* tools that don't care about encodings would need to learn about them
* files that appear the same will have a different hash due to presence or
  absence of an invisible character that can appear/disappear with no
  explicit request on the user's part
* with UTF-8, we're 95% there.  For BOMs, there's almost no support.
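
To illustrate the concatenation and hash points above, here is a minimal
Python sketch.  The file contents ("first", "second") are made up for the
example, and the byte-wise join merely stands in for what `cat a b` does.

```python
import codecs
import hashlib

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# Two hypothetical files, each saved with a leading BOM.
a = BOM + "first\n".encode("utf-8")
b = BOM + "second\n".encode("utf-8")

# Byte-wise concatenation, as `cat a b` would do it.
joined = a + b
text = joined.decode("utf-8")

# The BOM of the second file is now a stray U+FEFF in the middle of the text.
stray_index = text.index("\ufeff", 1)

# The same visible content without a BOM hashes differently.
plain = "first\n".encode("utf-8")
same_hash = hashlib.sha256(a).hexdigest() == hashlib.sha256(plain).hexdigest()
```

The stray U+FEFF is invisible in most terminals and editors, which is
exactly why it is hard to notice.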

So I'm strongly against producing BOMs.  As for accepting them, there's
little that can break so it would be mostly ok... but certainly not as
a "must" clause.
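
As a rough sketch of "accept but never produce", assuming a program that
handles its text in Python: the standard "utf-8-sig" codec drops a leading
BOM on decode if there is one, while the plain "utf-8" codec never writes
one.  The helper names read_text and write_text are hypothetical.

```python
def read_text(data: bytes) -> str:
    """Decode UTF-8 input, tolerating (and dropping) a leading BOM."""
    return data.decode("utf-8-sig")


def write_text(text: str) -> bytes:
    """Encode as plain UTF-8; this codec does not emit a BOM."""
    return text.encode("utf-8")


with_bom = b"\xef\xbb\xbfhello\n"
without_bom = b"hello\n"

# Both inputs decode to the same string, and the output carries no BOM.
normalized = write_text(read_text(with_bom))
```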

