UTF8 (was: After potato,...)
Moin Sean 'Shaleh' Perry,
> is our utf8 handling that bad?
#------------------------------------------------------------------------------#
use utf8;
use locale;
while (<>) { tr/\0-\xff//CU; print }
#------------------------------------------------------------------------------#
This script should convert your native 8-bit character set to UTF-8,
depending on LC_ALL! Well, there is still a lot of work to do here ;-(
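
For comparison, here is a rough sketch of what such a conversion does at
the byte level. It does not rely on the tr modifiers and it ignores
LC_ALL entirely: it simply assumes the native 8-bit character set is
ISO-8859-1. The name latin1_to_utf8 is made up for this illustration and
does not come from any module.

```perl
#------------------------------------------------------------------------------#
use strict;

# Hypothetical helper: encode a string of ISO-8859-1 bytes as UTF-8.
# Assumption: every input byte is one Latin-1 character.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    # 0x00-0x7F stay as they are; 0x80-0xFF become the two bytes
    # 110000xx 10xxxxxx, carrying the top 2 and the low 6 bits.
    $bytes =~ s/([\x80-\xff])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/ge;
    return $bytes;
}

# e-acute is 0xE9 in Latin-1 and the pair 0xC3 0xA9 in UTF-8.
printf "%vX\n", latin1_to_utf8("\xe9");    # prints C3.A9
#------------------------------------------------------------------------------#
```

Used as a filter it would be something like
while (<STDIN>) { print latin1_to_utf8($_) }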
This would also settle the biweekly question of what to do with the
strings from XML::Parser when the output is for a web page:
tr/\0-\x{ff}//UC; would be the answer. ;-)
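
The opposite direction can be sketched by hand as well. This is only an
illustration under two assumptions: the string holds raw UTF-8 octets,
and only characters up to U+00FF need to end up in the 8-bit output.
Anything above that is left untouched. The name utf8_to_latin1 is
invented for this example.

```perl
#------------------------------------------------------------------------------#
use strict;

# Hypothetical helper: turn UTF-8 octets back into ISO-8859-1 bytes.
# Assumption: only U+0000..U+00FF is converted, longer sequences pass through.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    # U+0080..U+00FF are always encoded as lead byte 0xC2 or 0xC3
    # followed by one continuation byte 0x80..0xBF.
    $bytes =~ s/([\xC2\xC3])([\x80-\xBF])/chr(((ord($1) & 0x03) << 6) | (ord($2) & 0x3F))/ge;
    return $bytes;
}

# The UTF-8 pair 0xC3 0xA9 is e-acute, which is 0xE9 in Latin-1.
printf "%vX\n", utf8_to_latin1("\xc3\xa9");    # prints E9
#------------------------------------------------------------------------------#
```

For a web page, characters that do not fit into the 8-bit set would still
need some other treatment, which this sketch does not attempt.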
Moin Chip Salzenberg,
> I was assuming he was correct. It's certainly true that lots of work
> is going into Unicode support in the 5.6-to-be.
The problem is that there are only islands where parts of utf8 work
fine. Those "unstable" perls will cause many other problems in other
cases, so it is better to wait for the next stable perl before
including it.
Ken, Enno and I discussed this and agreed not to require "use 5.0055;",
even if it is hard that our toys do not work 100% smoothly out of the box.
On the other hand, Enno and I are using a current perl to be prepared
for 'use utf8;' and to avoid unnecessary coding of something that can
be handled by 2 pragmas in one or two months. The same thing as the tr
above can also be done with XML::RegExp, but about 100 times slower when
compared using Perl5.005-60 and Devel::DProf.
I'll now take bets on whether XML::XSLT or Perl5.6 is faster. The
former will need a speedy utf8, hopefully provided by the latter ;-)
Bye Michael
--
mailto:kraehe@copyleft.de UNA:+.? 'CED+2+:::Linux:1.2:13'UNZ+1'
http://www.xml-edifact.org/ CETERUM CENSEO MSDOS ESSE DELENDAM