
Re: Bug#277563: shadow: [INTL:nb] Fix Norwegian encoding



On Thu, Oct 21, 2004 at 09:22:10AM +0100, Edmund GRIMLEY EVANS wrote:
> Christian Perrier <bubulle@debian.org>:
> 
> > > debian/po/nb.po is a mixture of UTF-8 and ISO-8859-1 encodings,
> > > and as a result accented letters are wrongly displayed.
> > > Here is a patch.
> > 
> > How did you check that?
> > 
> > My usual check "msgfmt -o /dev/null -c --statistics <file>" did not
> > show anything...
> > 
> > I would be very interested in adding such check to the various PO file
> > I handle here and there.
> 
> Probably somebody already has something better, but here's something
> that might work. Run it like this:
[...]
> # Check that we can convert from the claimed encoding.
> open(P, "|iconv -f $enc -t utf-8 > /dev/null") || die;
> print P $t or die;
> close(P) || die;
> 
> # In the case of iso-8859-X, look for dodgy high control chars.
> if ($enc =~ /^iso-8859-/i) {
>     die if $t =~ /[\x80-\x9f]/;
> }

My first try looked something like that, but I wanted to submit patches,
so I needed to know which strings are wrong in order to fix the PO files.
I use msgexec for that purpose, since it runs a command on each
translated string.  People interested can have a look at
  http://people.debian.org/~barbier/check-po/checkfiles
  http://people.debian.org/~barbier/check-po/checkstring
and the other files found in that directory (log.UTF-8.txt is a detailed
report; bylang.txt and bypkg.txt are summaries sorted by language and by
package).
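
To illustrate what a per-string check can look like, here is a minimal
sketch in Python.  It is not the checkstring script linked above; the
function name check_string and the "valid UTF-8" heuristic are my own
assumptions.  It takes the raw bytes of one translation plus the charset
claimed in the PO header and reports what looks wrong.

```python
def check_string(raw: bytes, charset: str) -> list:
    """Return a list of problems found in one translated string.

    Hypothetical helper for illustration only.
    """
    problems = []

    # 1. The bytes must be decodable in the charset the PO header claims.
    try:
        raw.decode(charset)
    except LookupError:
        return ["unknown charset %r" % charset]
    except UnicodeDecodeError as exc:
        return ["not valid %s: %s" % (charset, exc.reason)]

    # 2. In ISO-8859-X, bytes 0x80-0x9f are C1 control characters and
    #    should not appear in translated text.
    if charset.lower().startswith("iso-8859-"):
        if any(0x80 <= b <= 0x9F for b in raw):
            problems.append("C1 control byte (0x80-0x9f) in ISO-8859 text")

    # 3. Heuristic (assumption): non-ASCII bytes that also form valid
    #    UTF-8 in a file that claims a legacy charset suggest a mixture
    #    of encodings.
    has_high_bytes = any(b >= 0x80 for b in raw)
    if has_high_bytes and charset.lower() not in ("utf-8", "utf8"):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
        else:
            problems.append("bytes are valid UTF-8; possibly mixed encodings")

    return problems


if __name__ == "__main__":
    samples = [
        ("bl\u00e5".encode("utf-8"), "ISO-8859-1"),       # UTF-8 in a Latin-1 file
        ("bl\u00e5".encode("iso-8859-1"), "ISO-8859-1"),  # correctly encoded
        (b"caf\xe9", "UTF-8"),                            # Latin-1 in a UTF-8 file
    ]
    for raw, charset in samples:
        print(raw, charset, check_string(raw, charset))
```

Step 3 matters for a case like nb.po: UTF-8 encoded Norwegian letters
decode without error as ISO-8859-1 and contain no bytes in 0x80-0x9f, so
the first two checks alone would not notice them.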

I do not know what to do with the errors listed at
  http://people.debian.org/~barbier/check-po/log.UTF-8.txt
It would be really great if some translators could take care of their
language, especially when the charset is neither ISO-8859-1 nor
ISO-8859-15.  On the other hand, some coordination would be nice, so
that each package gets a single bug report with a patch fixing all
languages.  The best option is certainly to send a message here if you
are willing to fix bugs.

Please note that there are many false positives: translators may use
UTF-8 characters which have no equivalent in their legacy encoding, or
the reported message may never be displayed (because it is fuzzy or
obsolete), so these errors have to be handled carefully.
Generating this report takes about an hour, so the script will not be
run periodically.
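
For those who want to weed out the fuzzy and obsolete false positives
before looking at a file, here is a rough Python sketch.  It assumes
that entries are separated by blank lines and does no real PO parsing;
the function name displayed_entries is hypothetical and this is not
part of the scripts above.

```python
import re


def displayed_entries(po_text: str) -> list:
    """Return the PO entries (as text blocks) that are neither fuzzy nor
    obsolete, i.e. the ones a program could actually display.

    Assumption: entries are separated by blank lines.
    """
    kept = []
    for block in re.split(r"\n\s*\n", po_text):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue

        # Obsolete entries have msgid/msgstr commented out with "#~",
        # so they contain no line that starts with a bare "msgid".
        has_live_msgid = any(ln.startswith("msgid") for ln in lines)

        # Fuzzy entries carry a flag comment such as "#, fuzzy" or
        # "#, fuzzy, c-format".
        is_fuzzy = any(
            ln.startswith("#,")
            and "fuzzy" in [flag.strip() for flag in ln[2:].split(",")]
            for ln in lines
        )

        if has_live_msgid and not is_fuzzy:
            kept.append("\n".join(lines))
    return kept


SAMPLE = '''msgid "Password"
msgstr "Passord"

#, fuzzy
msgid "Login"
msgstr "Logg inn"

#~ msgid "Old"
#~ msgstr "Gammel"
'''

if __name__ == "__main__":
    for entry in displayed_entries(SAMPLE):
        print(entry)
        print()
```

On real files, "msgattrib --no-fuzzy --no-obsolete" from gettext is
probably a more robust way to get the same filtering.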

As usual, check the BTS before working on a package to see if a bug has
already been reported.  I filed bugs yesterday against the bsdmainutils,
console-data, fonty, foomatic-filters and shadow packages.

Special thanks to Graham Wilson, who fixed bsdmainutils within a couple
of hours; I was really impressed.

Denis


