
Re: Bug#277563: shadow: [INTL:nb] Fix Norwegian encoding



Christian Perrier <bubulle@debian.org>:

> > debian/po/nb.po is a mixture of UTF-8 and ISO-8859-1 encodings,
> > and as a result accented letters are wrongly displayed.
> > Here is a patch.
> 
> How did you check that?
> 
> My usual check "msgfmt -o /dev/null -c --statistics <file>" did not
> show anything...
> 
> I would be very interested in adding such a check to the various PO
> files I handle here and there.

Probably somebody already has something better, but here's something
that might work. Run it like this:

LC_ALL=C ./check_po_enc iso_3166-0.41.eo.po

No doubt some appropriate one-line addition to the code would make it
work in any locale, but I am in a state of perpetual confusion about
how Perl handles encodings.



#!/usr/bin/perl

unless (@ARGV == 1) {
    print STDERR <<END;
Usage: check_po_enc PO_FILE
Detects some obvious encoding errors, such as a mixture of
iso-8859-X and UTF-8. YOU MUST RUN THIS IN A C LOCALE!
END
    exit 1;
}

# Slurp the whole file into one string.
open(D, "<", $ARGV[0]) || die "Cannot open $ARGV[0]: $!\n";
$t = join("", <D>);
close(D);

# Discover the claimed encoding.
$t =~ /charset=([a-zA-Z0-9-]+)/ || die "No charset= declaration found\n";
$enc = $1;

# Check that we can convert from the claimed encoding.
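open(P, "|iconv -f $enc -t utf-8 > /dev/null") || die "Cannot run iconv: $!\n";
# "or" rather than "||": "print P $t || die" parses as "print P ($t || die)".
print P $t or die "Cannot write to iconv: $!\n";
# close() on a pipe returns false if iconv exited with an error.
close(P) || die "iconv could not convert from $enc\n";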

# In the case of iso-8859-X, look for dodgy high control chars.
if ($enc =~ /^iso-8859-/i) {
    die "Found control characters 0x80-0x9f in $enc text\n" if $t =~ /[\x80-\x9f]/;
}
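
For comparison, here is a rough sketch of the same three checks in
Python rather than Perl. It is only an illustration, not a tested
replacement: it reads the file as raw bytes, so the locale should not
matter, and it uses Python's own codecs instead of iconv, so the set of
accepted charset names may differ. Note in particular that Python's
iso-8859-1 codec accepts every byte, so for that charset only the
control-character check can report anything. The function name
check_po_enc and the message texts are made up for this example.

```python
import re


def check_po_enc(data):
    """Return a list of problems found in the raw bytes of a PO file."""
    # Discover the claimed encoding (same pattern as the Perl script).
    m = re.search(rb"charset=([a-zA-Z0-9-]+)", data)
    if m is None:
        return ["no charset= declaration found"]
    enc = m.group(1).decode("ascii")

    problems = []

    # Check that the bytes can be decoded from the claimed encoding.
    try:
        data.decode(enc)
    except LookupError:
        return ["unknown charset: " + enc]
    except UnicodeDecodeError as e:
        problems.append("not valid %s at byte offset %d" % (enc, e.start))

    # In the case of iso-8859-X, look for dodgy high control chars.
    if enc.lower().startswith("iso-8859-"):
        c1 = re.search(rb"[\x80-\x9f]", data)
        if c1 is not None:
            problems.append(
                "control byte 0x%02x at byte offset %d"
                % (data[c1.start()], c1.start())
            )

    return problems
```

To use it on a file, open the file in binary mode, for example
check_po_enc(open("nb.po", "rb").read()), and treat a non-empty list as
a failure. Like the Perl version, it only catches some mixtures: a
UTF-8 sequence whose second byte is 0xa0 or above, inside a file
declared as iso-8859-1, goes unnoticed.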
