
Re: accented chars. shown as question marks in non-browser tools, sarge



On Mar 08 2007, Arlie Stephens wrote:
> 
> Hi,
> 
> Thanks for the detailed suggestions and explanation. I can't test this
> immediately, but from what I can check, I think it's going to work.

Well, I'm now at home, trying this, and it didn't quite work. The
bottom line is that not all the messages I receive are encoded in
UTF-8.

> On Mar 08 2007, H.S. wrote:
> > Arlie Stephens wrote:

> > 1. Use gnome-terminal or konsole (at least try it out)

> I'll try konsole; I don't use gnome. (If I wanted a Windows XP look
> and feel, I could install Windows XP ;-( though kde isn't much better
> on that score.)

I'm using a 'konsole' window for this experiment. 

> > 3. Make sure the relevant language fonts are installed.

I'm not sure what fonts I have, but it doesn't seem to be the
problem. 

> > Give the "set" command on a command prompt and see what you get for 
> > these variables (I have the following):
> > $> set | grep LANG
> > LANG=en_CA.UTF-8
> > LANGUAGE=en_CA:en_US:en_GB:en
> 
> $ set | grep LANG
> LANG=C

$ set | grep LANG
LANG=en_US.utf8
LANGUAGE=en_US.utf8:en_US.iso88591

> > If your LANG is not set to UTF-8, you need to set it right. Make sure 
> > you have the various locales installed:
> > $> locale -a
> > C
> > en_CA
> > en_CA.iso88591
> > en_CA.utf8
> > en_GB
> > en_GB.iso88591
> > en_GB.iso885915
> > en_GB.utf8
> 
> $ locale -a
> C
> POSIX

$ locale -a
C
de_DE
de_DE@euro
de_DE.iso88591
de_DE.iso885915@euro
de_DE.utf8
de_DE.utf8@euro
deutsch
en_CA
en_CA.iso88591
en_CA.utf8
en_GB
en_GB.iso88591
en_GB.iso885915
en_GB.utf8
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
français
fr_CA
fr_CA.iso88591
fr_CA.utf8
french
fr_FR
fr_FR.iso88591
fr_FR.utf8@euro
german
icelandic
is_IS
is_IS.iso88591
is_IS.utf8
POSIX

Yes, I went a wee bit overboard. 

> > If you do not get a list similar to the one above, you need to generate the 
> > locale choices:
> > <become root>
> > #> dpkg-reconfigure locale
> > 
> > (or is it locales? try both). It will give you an ncurses based list of 
> > locale choices to generate. Select the one you want and press OK. Next 
> > screen will ask you to set a default locale. Set a UTF-8 locale. After 
> > this configuration, logout and log in again. Use the "set" command again 
> > to make sure you have a UTF-8 locale now.
> 
> I'll try this at lunch. I'm logged in remotely, and much prefer to be
> physically at the console when doing anything as root. 

What an *awful* interface. It does not remember past choices, so if
you miss one, you have to re-enter all of them. Moreover, there are
hundreds of choices, so you wind up scrolling through several screens,
noting down numbers to be entered, before you can enter any of them. 

The alternative appears to be to know the names of all those you might
want, and edit /etc/locale.gen manually. That has a steeper learning
curve, but might be a lot less maddening in the long run. 
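
For what it's worth, the manual route might look roughly like the
sketch below. It assumes the usual Debian layout, where each line of
/etc/locale.gen is a locale name followed by its charset, and where
running locale-gen as root rebuilds whatever is listed. The add_locale
helper and the LOCALE_GEN scratch-file default are made up for
illustration, and the four entries are only examples.

```shell
#!/bin/sh
# Hypothetical helper: append a "<locale> <charset>" line to a
# locale.gen-style file unless an identical line is already there.
add_locale() {
    file=$1 entry=$2
    grep -qxF "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}

# Work on a scratch file by default, so nothing here needs root.
LOCALE_GEN=${LOCALE_GEN:-$(mktemp)}

add_locale "$LOCALE_GEN" "en_US.UTF-8 UTF-8"
add_locale "$LOCALE_GEN" "en_US.ISO-8859-15 ISO-8859-15"
add_locale "$LOCALE_GEN" "is_IS.UTF-8 UTF-8"
add_locale "$LOCALE_GEN" "is_IS ISO-8859-1"

# Show the resulting list.
cat "$LOCALE_GEN"

# On the real system (as root, assuming the Debian tools) the same
# entries would go into /etc/locale.gen, followed by:
#   locale-gen
```

Because the helper skips lines that already exist, it can be re-run
after adding one more locale without re-entering the rest.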

> > Next, make sure you have relevant fonts installed. I guess it is good to 
> > have the ttf-* fonts installed, at least the ones you think you
> > might need.

I think I installed anything remotely resembling a font associated
with any European language, but I'm pretty much totally ignorant in
this area. 

> > Finally, open gnome-terminal or konsole and fire up mutt. You should see 
> > various language characters in all their glory. BTW, xterm does not 
> > support UTF-8 properly yet.

Well, what I got was a different kind of gobbledygook. I presume it
won't cut and paste ;-) but looking at a message that should be
quoting Icelandic, I'm seeing improbable characters like a capital A
with a horizontal bar across it, embedded in the middle of
words. (This is not a valid Icelandic character.) Even better was the
cute little fraction symbol (3/4) embedded in the middle of the
word. My best guess is that this probably isn't UTF-8 text, but I
suppose it could be a font problem. 
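
One way to get a pair of symbols like that, as far as I can tell, is
for UTF-8 bytes to be displayed as if they were ISO-8859-1. A minimal
sketch, assuming iconv is installed; the choice of the Icelandic thorn
as the test character is an assumption for illustration, not
something taken from the message itself:

```shell
#!/bin/sh
# The UTF-8 encoding of Icelandic thorn (U+00FE) is the two bytes
# 0xC3 0xBE, written as octal escapes so the script does not depend
# on the encoding it is saved in.
thorn_utf8='\303\276'

# Treat those two bytes as ISO-8859-1 and convert the result to UTF-8,
# which is what a viewer using the wrong charset effectively does.
misread=$(printf "$thorn_utf8" | iconv -f ISO-8859-1 -t UTF-8)

printf 'bytes read as UTF-8:      %s\n' "$(printf "$thorn_utf8")"
printf 'bytes read as ISO-8859-1: %s\n' "$misread"
```

On a UTF-8 terminal the second line shows a capital A with a tilde
followed by the three-quarters sign, since 0xC3 and 0xBE are those two
characters in ISO-8859-1.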

Aha - bingo - 

    export LANG=en_US.iso885915

Now the bizarre symbols are replaced with eths and other Icelandic
characters, and the quoted text becomes comprehensible. (Well, close
to comprehensible; my understanding of Icelandic is even more limited
than my systems administration savvy.) 

And let me guess - there's no way to automatically determine what
encoding a given message may contain, and the mutt mail client
probably cannot switch encodings on the fly. 
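
MIME mail normally declares its charset in the Content-Type header, so
guessing should only be needed when that header is missing or wrong.
For those cases, a rough heuristic is to check whether the bytes are
valid UTF-8 and fall back to a single-byte charset otherwise. A
minimal sketch, assuming iconv is installed; guess_charset is a
made-up helper, and the ISO-8859-15 fallback is purely an assumption:

```shell
#!/bin/sh
# Hypothetical helper: print "UTF-8" if the file's bytes form valid
# UTF-8, otherwise print a fallback guess.
guess_charset() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo UTF-8
    else
        echo ISO-8859-15
    fi
}

# Two scratch files holding the same short Icelandic word, once per
# encoding (thorn, a, eth).
utf8_file=$(mktemp)
latin_file=$(mktemp)
printf '\303\276a\303\260\n' > "$utf8_file"   # thorn = C3 BE, eth = C3 B0
printf '\376a\360\n' > "$latin_file"          # thorn = FE, eth = F0

guess_charset "$utf8_file"
guess_charset "$latin_file"
rm -f "$utf8_file" "$latin_file"
```

Pure ASCII text also passes the UTF-8 check, which is harmless, but
the fallback cannot tell one single-byte charset from another.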

Oh what fun!

Still, it's way better than it's been in ages. 

Thank you very much.

-- 
Arlie

(Arlie Stephens	                              arlie@worldash.org)


