UTF8 (was: After potato,...)



Hi Sean 'Shaleh' Perry,

> is our utf8 handling that bad?

#------------------------------------------------------------------------------#
  use utf8;
  use locale;

  while (<>) { tr/\0-\xff//CU; print }
#------------------------------------------------------------------------------#

  This script should convert your native 8-bit character set to UTF-8,
  depending on LC_ALL. Well, there is still a lot of work to do here ;-(
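
  The tr///CU form above belongs to the development perls of that time.
  Below is a minimal sketch of the same locale-dependent conversion using
  the Encode and I18N::Langinfo modules. It assumes a later Perl where
  both modules are available. The helper names locale_charset and
  to_utf8, and the ISO-8859-1 fallback, are hypothetical choices made only
  for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: name of the 8-bit charset selected by the locale
# (LC_ALL / LC_CTYPE / LANG). Falls back to ISO-8859-1 (an assumption)
# when Encode does not recognise the name the C library reports.
sub locale_charset {
    setlocale(LC_CTYPE, "");             # pick up the locale from the environment
    my $codeset = langinfo(CODESET);     # e.g. "ISO-8859-1" or "UTF-8"
    return find_encoding($codeset) ? $codeset : 'iso-8859-1';
}

# Hypothetical helper: bytes in $charset -> UTF-8 encoded bytes.
sub to_utf8 {
    my ($bytes, $charset) = @_;
    my $chars = decode($charset, $bytes);   # bytes -> Perl character string
    return encode('UTF-8', $chars);         # character string -> UTF-8 bytes
}
```

  To use it as a filter, in the spirit of the one-liner above:
  my $cs = locale_charset(); while (my $line = <STDIN>) { print to_utf8($line, $cs) }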

  This would also settle the biweekly question of what to do with the
  strings from XML::Parser when the output is for a web page:
  tr/\0-\x{ff}//UC; would be the answer ;-)
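
  With Encode, the reverse direction can be sketched as follows. It
  assumes the handler receives Perl character strings and that the page
  is served as ISO-8859-1. Encode's FB_HTMLCREF fallback turns characters
  above U+00FF into decimal character references instead of replacing
  them with '?'. The helper name for_latin1_page is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string (such as the text an
# XML::Parser handler receives) into ISO-8859-1 bytes for a Latin-1 web
# page. Characters above U+00FF become references such as &#8364;.
sub for_latin1_page {
    my ($text) = @_;
    return encode('iso-8859-1', $text, Encode::FB_HTMLCREF());
}

# "Grüße, 5 €": the umlaut and sharp s stay Latin-1 bytes,
# the euro sign becomes &#8364;
print for_latin1_page("Gr\x{fc}\x{df}e, 5 \x{20ac}\n");
```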

Hi Chip Salzenberg,

> I was assuming he was correct.  It's certainly true that lots of work
> is going into Unicode support in the 5.6-to-be.

  The problem is that there are only isolated islands where parts of utf8
  work properly. Those "unstable" perls will cause many other problems
  elsewhere, so it is better to wait for the next stable perl before
  including it.

  Ken, Enno and I discussed this and agreed not to require "use 5.0055;",
  even though it is a pity that our toys do not work 100% smoothly out of
  the box.

  On the other hand, Enno and I are using the current perl to be prepared
  for 'use utf8;' and to avoid needlessly coding something that two
  pragmas will handle in a month or two. The same conversion as the tr
  above can also be done with XML::RegExp, but about 100 times more slowly,
  as measured with Perl 5.005_60 and Devel::DProf.
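
  A comparison of this kind can be repeated with the core Benchmark
  module. The sketch below is not XML::RegExp's code. latin1_to_utf8_regex
  is a hypothetical pure-regex conversion of the sort written before utf8
  support, and it is timed against Encode. Actual ratios depend on the
  perl and the machine, so no figures are claimed:

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Benchmark qw(cmpthese);

# Hypothetical pure-regex Latin-1 -> UTF-8 conversion: each byte
# 0x80-0xFF becomes the two-byte sequence 110000xx 10xxxxxx.
sub latin1_to_utf8_regex {
    my ($bytes) = @_;
    $bytes =~ s{([\x80-\xff])}{
        chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))
    }ge;
    return $bytes;
}

# The same conversion through Encode's built-in tables.
sub latin1_to_utf8_encode {
    my ($bytes) = @_;
    return encode('UTF-8', decode('iso-8859-1', $bytes));
}

# Sample input: every byte value, repeated to get a measurable size.
my $sample = join('', map { chr($_) } 0 .. 255) x 100;

# Both routines must agree before they are timed.
die "mismatch\n"
    unless latin1_to_utf8_regex($sample) eq latin1_to_utf8_encode($sample);

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { latin1_to_utf8_regex($sample) },
    encode => sub { latin1_to_utf8_encode($sample) },
});
```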

  I'll now take bets on which will be finished first, XML::XSLT or
  Perl 5.6. The former will need a speedy utf8, hopefully provided by the
  latter ;-)

Bye Michael
-- 
  mailto:kraehe@copyleft.de     	UNA:+.? 'CED+2+:::Linux:1.2:13'UNZ+1'
  http://www.xml-edifact.org/		CETERUM CENSEO MSDOS ESSE DELENDAM