
[OT] Re: why perl 5.8 won't be in testing for a while



[Off-Topic, only of interest if you love perl and struggle with encoding
issues]

On Sat, Oct 05, 2002 at 08:57:21AM -0500, Ardo van Rangelrooij wrote:
> Bart Schuller (schuller@lunatech.com) wrote:
> > The biggest problem with perl modules and character encodings right now
> > is getting people to stop using "backward compatible" workarounds. They
> > stand in the way of clean, working interfaces for current perl versions.
> 
> Define "current".  There are still a lot of people out there that for good
> reasons do not and/or cannot use the latest and greatest, but e.g. 5.004
> (or something similar).  And so far I've not seen an indication on CPAN to
> move "forward".

Current == 5.8

In effect it is a bugfix release. The bug was the incomplete, not fully
thought-out Unicode subsystem in 5.6, which had every module author and
programmer first cursing and then working around the warts.

In 5.6, a string could be stored in one of two forms: the way it was
read in, or UTF-8. Perl would sometimes switch between the two forms
when manipulating the data, and when the time came to print out some
results, you just got whatever the internal representation happened to
be.

Most people would make sure never to introduce any UTF-8 data, so they
could be sure they wouldn't get any in their output either. That works
fine until you decide to input some data with XML::Parser, which started
force-feeding you UTF-8-marked strings that would contaminate your other
data. A mess.
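
To make the two storage forms concrete, here is a minimal sketch for a
5.8-or-later perl. The utf8::upgrade() and utf8::is_utf8() helpers and
the form() sub are my choice for the illustration, not something the
modules discussed above use; the point is only that the same string can
sit in either form.

```perl
use strict;
use warnings;

# "caf\xe9" is "cafe" with an e-acute, given as one Latin-1 code point.
my $native = "caf\xe9";

# Make a copy and force its internal storage to UTF-8.
my $upgraded = $native;
utf8::upgrade($upgraded);

# Hypothetical helper: report which internal form a string is stored in.
sub form { utf8::is_utf8($_[0]) ? "utf8" : "bytes" }

printf "native:   %s, length %d\n", form($native),   length $native;
printf "upgraded: %s, length %d\n", form($upgraded), length $upgraded;
print  "same string: ", ($native eq $upgraded ? "yes" : "no"), "\n";
```

Both copies have the same length and compare equal; only the storage
differs, which is exactly the detail you should not have to care about.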

CPAN is full of modules that can convert strings from any encoding into
any other. That is nice, but they are completely separate from the
built-in utf8 support.

In what way does 5.8 fix this?

For the first time it's now possible to explicitly tell perl about the
encoding of all data entering your program. And what's more important,
you can tell it to output data in any encoding you wish.

What this means is that you no longer have to *care* which internal
encoding perl uses, because you can get it out in the form *you* want,
*when* you want it.
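
As a rough sketch of what that looks like, assuming perl 5.8 or later
with the Encode module (the sample string and variable names are made
up for the illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string; the e-acute is code point U+00E9, however perl
# happens to store it internally.
my $text = "caf\x{e9}";

# Ask for the bytes in the encoding we want, at the moment we want them.
my $utf8_bytes   = encode("UTF-8",      $text);
my $latin1_bytes = encode("iso-8859-1", $text);

printf "UTF-8:   %d bytes\n", length $utf8_bytes;    # e-acute is two bytes
printf "Latin-1: %d bytes\n", length $latin1_bytes;  # e-acute is one byte
```

The same $text yields different byte sequences depending only on the
encoding you name, not on how perl stored the string.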

The new Encode module and the PerlIO subsystem make this possible. You
can specify an encoding with every call to open(), but you can also
explicitly decode some bytes you got from who knows where into perl's
internal format. And of course you can set default encodings for input
and output.
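
A minimal sketch of those pieces, assuming perl 5.8 or later. The
scratch file from File::Temp and the sample strings are only there to
make the example self-contained; they are not part of any module
discussed in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical scratch file, just for the demonstration.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write character data out as Latin-1 through a PerlIO encoding layer.
open(my $out, ">:encoding(iso-8859-1)", $path) or die "open: $!";
print $out "caf\x{e9}\n";
close $out or die "close: $!";

# Read it back, telling perl that the file holds Latin-1.
open(my $in, "<:encoding(iso-8859-1)", $path) or die "open: $!";
my $line = <$in>;
close $in;
chomp $line;

# Explicitly decode bytes that arrived some other way (here: UTF-8).
my $decoded = decode("UTF-8", "caf\xc3\xa9");

# Choose the encoding of an already open handle, here standard output.
binmode(STDOUT, ":encoding(UTF-8)");
print "from file: $line\n";
print "decoded:   $decoded\n";
print "same string\n" if $line eq $decoded;
```

Lexically scoped defaults are the job of the "open" pragma; it is left
out here to keep the sketch short.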

What does this have to do with the original problem (failing tests in
some module packages)?

As long as these modules try to work around bugs in 5.6 and missing
functionality in 5.005 or even earlier, you can't expect a good and
meaningful integration with 5.8.

No, I don't see a "movement forward" in CPAN either, and it's a bloody
shame.

There's a tiny speck of bright light though: Debian unstable currently
ships with 5.8, which means modules destined for unstable won't *need*
any workarounds for older perls.

-- 
Bart.


