
Bug#99933: second attempt at more comprehensive unicode policy



On Sat, 2003-01-04 at 21:17, Marco d'Itri wrote:
> On Jan 04, Colin Walters <walters@debian.org> wrote:
> 
>  >> We may want a BOM, at the start, though.
>  >
>  >We don't need one for UTF-8.  That's another one of the great things
>  >about it.
> What do you know about international environments? Maybe you do not need
> a BOM because your native language needs just ASCII and you do not have
> any text file encoded with latin-1, but in the rest of the world the
> situation is quite different.

If you can make an argument that starting every text file with a BOM
would be a good idea on a Unix-like system such as Debian, please do. 
Everything I have read argues otherwise.  Unix has always treated files
as just streams of bytes, and allowed you to concatenate streams with
pipes.  Having the BOM show up randomly, and expecting programs like
'cat' to remove it, or add it when it is missing, is too much to ask. 
'cat' can't know whether its input is random binary data or UTF-8.
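
To make the concatenation point concrete, here is a minimal Python
sketch. The cat() helper and the sample strings are made up for
illustration; the helper only mimics what 'cat' does to bytes and is not
taken from any real tool.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def cat(*streams: bytes) -> bytes:
    """Join byte streams as 'cat' would: no inspection, no rewriting."""
    return b"".join(streams)


# Two hypothetical text files, each written with a leading BOM.
first = BOM + "héllo\n".encode("utf-8")
second = BOM + "wörld\n".encode("utf-8")

joined = cat(first, second)

# The "utf-8-sig" codec strips a BOM only at the very start of the data,
# so the BOM from the second file survives in the middle of the text.
text = joined.decode("utf-8-sig")
stray_boms = text.count("\ufeff")

# The same files without a BOM concatenate cleanly.
plain = cat("héllo\n".encode("utf-8"), "wörld\n".encode("utf-8"))
clean_text = plain.decode("utf-8")
```

Under these assumptions, the joined text still contains a U+FEFF
character in front of the second file's content, while the BOM-less
version decodes to exactly the two lines.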

But you don't have to take my word for it; here are some arguments
against it from Markus Kuhn, which I turned up in a quick Google search:

http://www.rosat.mpe-garching.mpg.de/mailing-lists/perl-unicode/1999-11/msg00004.html

In any case, whether or not to start every file with a BOM is basically
orthogonal to my proposal, so we can discuss the BOM after the core
proposal has been accepted.



