Re: [D-I] Automatic spellcheck of po files

On Wednesday 19 January 2005 22:20, Petter Reinholdtsen wrote:
> Is there some charset problem?  I looked at the
> unknown words for nb, and "går" and "når" are definitely not unknown
> words in the dictionary.

I see the same kind of problem with Dutch.

The unknown wordlist shows 'BraziliÃ«', which is 'Brazilië' in UTF-8 
(Dutch for Brazil) when the bytes are displayed as ISO-8859-1.
I've just checked the aspell Dutch wordlist and Brazilië _is_ included.

$ aspell dump master /usr/lib/aspell/dutch | grep "Brazil"

It looks like the dump prints an ISO-8859-1 encoded list.
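
To illustrate the suspected mismatch, here is a small sketch that uses only
printf, iconv and od (no aspell needed). The variable names are mine, and I
am assuming an iconv that accepts the names UTF-8 and ISO-8859-1. It shows
that the two UTF-8 bytes of "ë" turn into two separate characters when they
are read as ISO-8859-1, while a proper conversion yields the single byte
0xEB that an ISO-8859-1 wordlist would contain.

```shell
# UTF-8 bytes for "Brazilië" ("ë" is 0xC3 0xAB), written as octal escapes
# so the example does not depend on the terminal's encoding.
utf8_word=$(printf 'Brazili\303\253')

# Misreading: treat the UTF-8 bytes as ISO-8859-1 and re-encode as UTF-8.
# Each of the two bytes of "ë" becomes a character of its own.
misread=$(printf '%s' "$utf8_word" | iconv -f ISO-8859-1 -t UTF-8)

# Correct direction: UTF-8 to ISO-8859-1, shown as hex; "ë" becomes 0xEB.
latin1_hex=$(printf '%s' "$utf8_word" \
    | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')

printf '%s\n' "$misread"
printf '%s\n' "$latin1_hex"
```

If the spellchecker compares the unconverted UTF-8 bytes against an
ISO-8859-1 wordlist, the word cannot match, which would explain the
"unknown" words.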

I think the manpage for aspell gives the answer:
       The  encoding  the input text is in. Valid values are ``utf-8'',
       ``iso8859-*'',  ``koi8-r'',  ``viscii'',  ``cp1252'',  ``machine
!!     unsigned  16'',  ``machine  unsigned  32''.  However, the Aspell
!!     utility will currently only function correctly with 8-bit encod-
!!     ings. utf-8 support is planned for the future. The two ``machine
       unsigned'' encodings are intended to be used by  other  programs
       using  the  Aspell library and it is unlikely the Aspell utility
       will ever support these encodings.

So it looks as if you may have to iconv the files before you test them 
(or, even better, patch aspell so it supports utf-8 ;-)
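
A rough sketch of the iconv approach, assuming the po files are UTF-8 and
that ISO-8859-1 covers every character in them (iconv stops with an error on
characters it cannot convert). The function name po_to_latin1 and the file
name nl.po are placeholders of mine, not part of the existing scripts.

```shell
# Hypothetical helper: write an ISO-8859-1 copy of a UTF-8 po file to stdout.
po_to_latin1() {
    iconv -f UTF-8 -t ISO-8859-1 "$1"
}

# Possible use (nl.po is a placeholder; needs aspell and a Dutch dictionary):
#   po_to_latin1 nl.po | aspell --lang=nl --encoding=iso8859-1 list
```

The converted text is only fed to the spellchecker; the po files themselves
stay in UTF-8.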

