
utf8 Problems



Hi debian-user!

I converted to utf8 in the hope that my non ASCII character problems
would disappear. They are now ... different.

I used utf8migrationtool and locale now says:

bernhard@b:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I am in Austria, where we speak German, but I chose en because the
German translations are often so ridiculous (in mc's config,
'verbose operation' becomes 'redselige Vorgaenge', and bash says
'getoetet' instead of 'killed' when a process gets killed).

I chose US because I thought that was most used and thus most stable.

Now the problems:

I wanted to print a German text containing umlauts from a web page.
I marked it in iceweasel and pasted it into a 'konsole' running bash
running 'cat >x'. 'lpr x' printed only a page with the character 'K'.

'hexdump -C x' says:

00000010  20 20 20 20 20 20 4b fc  6e 64 69 67 75 6e 67 73  |      K.ndigungs|
00000020  62 65 73 63 68 72 e4 6e  6b 75 6e 67 65 6e 0a 0a  |beschr.nkungen..|

so ü is 0xfc, ä is 0xe4, and both characters are shown as
periods '.'.
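
To check my reading of the dump, here is a minimal Python sketch. It
assumes the file is ISO-8859-1 (the single bytes 0xfc and 0xe4 suggest
that, but do not prove it) and shows that those bytes are not valid
UTF-8, and what the same word looks like once re-encoded. The helper
name is made up for illustration.

```python
# Bytes copied from the hexdump: the word "Kündigungs" as stored in x.
raw = bytes.fromhex("4b fc 6e 64 69 67 75 6e 67 73")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the bytes are ISO-8859-1 (Latin-1), where 0xfc is u-umlaut.
text = raw.decode("iso-8859-1")
as_utf8 = text.encode("utf-8")

print(is_valid_utf8(raw))      # the file content as it is now
print(is_valid_utf8(as_utf8))  # the same word after re-encoding
print(as_utf8.hex(" "))        # u-umlaut becomes the two bytes c3 bc
```

If that assumption holds, the pasted text never reached the file as
UTF-8 in the first place, which would fit a UTF-8 locale choking on it.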

mc's viewer says:

00000010 20 20 20 20  20 20 4B FC  6E 64 69 67  75 6E 67 73  Kündigungs
00000020 62 65 73 63  68 72 E4 6E  6B 75 6E 67  65 6E 0A 0A  beschränkungen..

Here ü is still only the single byte 0xFC, but it is displayed as
'A' with a tilde followed by a '1/4' character. ä is again 0xE4 but
is displayed as 'A' with a tilde followed by a circle with 4 short
lines extending diagonally from it.
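
The symbols I describe are what you get when the UTF-8 encoding of an
umlaut is displayed as if it were ISO-8859-1. The following Python
sketch reproduces that; which program in the chain does the
conversion is not clear to me, so treat this only as an illustration
of the effect. The function name is hypothetical.

```python
def misread_utf8_as_latin1(ch: str) -> str:
    """Encode ch as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return ch.encode("utf-8").decode("iso-8859-1")


# "\u00fc" is u-umlaut, "\u00e4" is a-umlaut.
for ch in ("\u00fc", "\u00e4"):
    shown = misread_utf8_as_latin1(ch)
    # Print the UTF-8 bytes and the code points that end up on screen.
    print(ch.encode("utf-8").hex(" "), [f"U+{ord(c):04X}" for c in shown])
```

U+00C3 is 'A' with a tilde, U+00BC is the '1/4' fraction, and U+00A4
is the currency sign, which looks like a circle with four short
diagonal lines.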

Opening x in openoffice writer shows rhombuses with question marks
for each umlaut.
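
The rhombus with a question mark looks like U+FFFD, the Unicode
replacement character. Assuming openoffice tries to read the file as
UTF-8 and substitutes that character for bytes it cannot decode (I
have not verified this), the effect can be sketched in Python:

```python
# Bytes copied from the second hexdump line: "beschränkungen" as stored in x.
raw = bytes.fromhex("62 65 73 63 68 72 e4 6e 6b 75 6e 67 65 6e")

# errors="replace" substitutes U+FFFD for every undecodable byte sequence.
decoded = raw.decode("utf-8", errors="replace")

# ascii() keeps the output printable on any terminal.
print(ascii(decoded))
```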

When I opened x.html in openoffice writer, I was unable to remove all
the table markup etc., and so could not reformat the text to fit on
one page. Hmm, it might work if I copied the text from there into a
new document. But here I want to solve the locale problems, or
whatever the problem should be called.

mc (midnight commander, a norton commander clone) of course goes
crazy again, but I was not surprised and accepted that it prints 'a'
with '^' instead of line art, etc. More serious was that when I
'ssh'ed to a different computer (not sure which) it got confused
about which line it was on and I messed up editing /etc/fstab.

man gets quote characters wrong, printing 'a' with '^' instead, and
so does gcc.

I also have problems with kvirc. IIRC I can get it to display
iso8859-1 correctly, but not utf8, and the smart utf8/iso8859-1 mode
does not work. I chat with users who use iso8859-1 and utf8.

Is there a package which is responsible for all these problems so I
can file a bug report against it? Or are these bugs in konsole, gcc,
man, bash, mc, iceweasel, openoffice and kvirc? Or ... is the bug
sitting in front of the computer again :)?

I wonder if it's easier to set up debian from scratch.

I'm basically running debian testing (and have been for a long time),
but because I sometimes want packages from stable or unstable I have
those in sources.list, too (well, stable is currently commented out).
So that I don't upgrade to unstable, I have this in
/etc/apt/preferences:

----------------------------
Package: *
Pin: release a=stable
Pin-Priority: 650

Package: *
Pin: release a=testing
Pin-Priority: 700

Package: *
Pin: release a=unstable
Pin-Priority: 600
----------------------------
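
Stanzas like these are easy to mistype, so here is a minimal Python
sketch that flags lines whose field name is not one of the four listed
in apt_preferences(5): Explanation, Package, Pin and Pin-Priority. The
function name and the sample text are made up for illustration, the
sample deliberately contains a misspelled field, and the values are
not checked at all.

```python
from typing import List, Tuple

# Field names documented in apt_preferences(5).
KNOWN_FIELDS = {"Explanation", "Package", "Pin", "Pin-Priority"}


def unknown_fields(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every line with an unknown field name."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines (stanza separators) and lines starting with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, _value = stripped.partition(":")
        if not sep or name.strip() not in KNOWN_FIELDS:
            problems.append((lineno, stripped))
    return problems


# Hypothetical sample with a space where the hyphen should be.
sample = """\
Package: *
Pin: release a=unstable
Pin Priority: 600
"""

print(unknown_fields(sample))
```

Running 'apt-cache policy' afterwards shows which priorities apt
actually applies.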

Thanks, Bernhard

-- 
Please encrypt all emails
GPG Key (ID F732FBF3 E4219D48) available on public key servers
Fingerprint E18F BF4D 0EE2 6522 E950  A06A F732 FBF3 E421 9D48


