
libxml-sax-writer-perl does not utf-8 encode!



Hi,

after installing a few new packages on my sarge system (sorry, I
cannot figure out exactly which dependencies were resolved and
installed by apt), I ran into problems with UTF-8 encoding in
Perl/XML. It looks like a bug to me, but since I'm rather new
to Perl/XML programming I won't send a report to the BTS.

The problem can be boiled down to the following simple script:

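#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX::Writer;
use XML::LibXML::SAX::Parser;

# The writer acts as the SAX handler and serializes all events to test_out.xml
my $writer = XML::SAX::Writer->new(Output => "test_out.xml");
my $parser = XML::LibXML::SAX::Parser->new();
# Parse test_in.xml and pass the resulting SAX events to the writer
my %parser_args = (Source  => {SystemId => "test_in.xml"},
                   Handler => $writer);
$parser->parse(%parser_args);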

where test_in.xml is something like

<?xml version="1.0" encoding="UTF-8"?>
<Only_A_Test>
   utf-8 encoded non ascii chars like: Universität
</Only_A_Test>

In test_out.xml the non-ASCII characters are ISO-8859-1 (Latin-1)
encoded instead of UTF-8.
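
For anyone trying to reproduce this, one way to check which encoding
ended up in the output file is to test whether its raw bytes decode as
strict UTF-8. The following is only a rough sketch: the helper names
is_valid_utf8 and slurp_raw and the script name check_utf8.pl are made
up for illustration, and it relies on nothing beyond the core Encode
module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of the original script): returns 1 if
# the given byte string is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: reads a file as raw bytes, without any decoding.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close($fh);
    return $bytes;
}

# Only runs when a file name is given, e.g.:
#   perl check_utf8.pl test_out.xml
if (@ARGV) {
    my $file = $ARGV[0];
    my $verdict = is_valid_utf8(slurp_raw($file)) ? "valid" : "NOT valid";
    print "$file is $verdict UTF-8\n";
}
```

In UTF-8 the "ä" is the two bytes C3 A4, while in Latin-1 it is the
single byte E4, which is not a well-formed UTF-8 sequence on its own.
So a Latin-1 encoded test_out.xml should be reported as not valid,
whereas test_in.xml should pass.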

Can anyone reproduce this? Any hint?

Thanks, Thomas



