
Re: LSB status of sarge?



On Wed, 2004-08-25 at 15:06, Michael Stone wrote:
> On Wed, Aug 25, 2004 at 02:14:28PM -0500, you wrote:
> >Which brings me to the question: what problems do you have with the
> >current patches?
> 
> I don't really want to rehash all that in this forum. 

Cool.  Is there a link to a mailing list thread or something I could
reference?  I've done some googling, but haven't found much that's
helpful yet.

> The major objection was that the code basically did 
>    if(multibyte)
>       one version of utility
>    else
>       other version of utility
> 
> which is an unmaintainable mess. 

As I understood it, this was the major objection.  However, I've also
heard that the patch works this way because of some performance
considerations when handling unibyte with the multibyte code.

I'm sensitive to this on one level, but on another, multibyte handling
will always carry more overhead than unibyte.  So I'm hoping to make the
multibyte code perform as well as possible, and to assume that people
will accept a small performance hit for the sake of our multibyte users.
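
To illustrate what a single code path might look like, here is a minimal
sketch of a character counter that treats unibyte (ASCII) input as a
special case of the same loop, with no "if(multibyte)" branch.  The
function name is hypothetical and nothing here is taken from the patch
under discussion.  It assumes the input is valid UTF-8; a real utility
would have to respect the locale's encoding (for example via mbrtowc)
and cope with invalid sequences.

```c
#include <stddef.h>

/* Hypothetical helper, not from coreutils or the patch.
   Counts characters in a buffer that is assumed to hold valid UTF-8.
   Pure ASCII input takes exactly the same path as multibyte input. */
size_t count_chars_utf8(const unsigned char *s, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        /* Every byte that is not a continuation byte (10xxxxxx)
           starts a new character. */
        if ((s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For ASCII-only input the test is true on every byte, so the result is
just the byte count and the extra cost is one mask-and-compare per byte.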

> My other objection was that the patches
> didn't address every code path which might see multibyte user input or
> output, only certain paths. My impression is that once you start down
> the multibyte road you basically need to look carefully at anything in
> the code that uses a string, and the patches weren't doing that--they
> were only fixing a couple of things that got dinged in the LSB test
> suite. E.g., what happens if there's a multibyte string in a utmp file?
> Should who(1) output still line up? Do we properly handle
> multibyte input strings which contain single byte escape characters in
> things like printf, echo, and date? 

It's my understanding that a proper multibyte implementation still uses
fixed-width characters internally, just wider ones.  Specifically, most
people told me that it's futile to manipulate UTF-8 directly; instead,
UTF-8 input should be converted to UCS-2 for internal use and then
manipulated as wide characters.
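
The conversion step described above can be sketched roughly as follows.
This is an illustration only: the function name is hypothetical, and it
decodes into 32-bit code points rather than UCS-2, because 16-bit UCS-2
cannot represent characters outside the Basic Multilingual Plane.  In a
real utility the C library's mbrtowc, driven by the current locale,
would normally do this job.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of coreutils or the patch.
   Decodes the UTF-8 bytes s[0..len) into fixed-width code points in
   out[0..cap).  Returns the number of code points written, or
   (size_t)-1 if the input is malformed or out is too small. */
size_t utf8_to_fixed(const unsigned char *s, size_t len,
                     uint32_t *out, size_t cap)
{
    /* Smallest code point allowed for each sequence length;
       anything lower is an overlong encoding. */
    static const uint32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = s[i++];
        uint32_t cp;
        size_t extra;               /* continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;     /* stray continuation or bad lead byte */

        if (extra > len - i)
            return (size_t)-1;      /* sequence truncated */

        for (size_t k = 0; k < extra; k++) {
            unsigned char c = s[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n == cap)
            return (size_t)-1;      /* output buffer full */
        out[n++] = cp;
    }
    return n;
}
```

Once the text is in this form, indexing and counting work per character
rather than per byte, though column alignment still depends on each
character's display width.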

Obviously, the question is: what to do with UTF-8 in external files? 
You may be right in that the patch doesn't handle that well, but I
haven't gotten to that point in my own evaluation.

> It might be that these are
> non-issues (I don't know, I only do English :). But I haven't seen a
> justification of why something presented as "the coreutils multibyte
> patch" is really "the" coreutils multibyte patch. 

It may not be "the" patch, but it is "a" patch, and the lack of any
other makes it "the" patch by default.  Certainly the other
distributions have been taking that approach.

> The larger meta-issue is that I'm not particularly interested in
> maintaining a huge patch that is wildly divergent from upstream. I'd
> prefer to see the openwhatevernumber people convince the coreutils
> upstream that they've got something decent. If they even had something
> that upstream would bless as being on the right track I'd consider doing
> some kind of testing--but at this point AFAIK the patch is a
> non-starter.

We are in complete agreement here.  My goal is not (just) to create
another patch, or to fix up this one, but to eliminate the need for a
patch altogether.


