utf8 Problems

To: debian-user@lists.debian.org
Subject: utf8 Problems
From: Bernhard Kuemel <bernhard@bksys.at>
Date: Sat, 28 Jul 2007 18:31:19 +0200
Message-id: <46AB6F57.6040004@bksys.at>

Hi debian-user!

I converted to utf8 in the hope that my non-ASCII character problems
would disappear. They are now ... different.

I used utf8migrationtool and locale now says:

bernhard@b:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I am in Austria, where we speak German, but I chose en because the
German translations are often so ridiculous (in mc's config:
'verbose operation' gets 'redselige Vorgaenge', bash says 'getoetet'
instead of 'killed' when a process gets killed).

I chose US because I thought that was most used and thus most stable.

Now the problems:

I wanted to print a German text containing umlauts from a web page.
I marked it in iceweasel and pasted it into a 'konsole' running bash
running 'cat >x'. 'lpr x' printed only a page with the character 'K'.

'hexdump -C x' says:

00000010  20 20 20 20 20 20 4b fc  6e 64 69 67 75 6e 67 73  |      K.ndigungs|
00000020  62 65 73 63 68 72 e4 6e  6b 75 6e 67 65 6e 0a 0a  |beschr.nkungen..|

so &uuml; is 0xfc, &auml; is 0xe4, and hexdump prints both characters
as periods '.'.
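
Those single-byte values match ISO-8859-1 (Latin-1) rather than UTF-8,
where each umlaut would take two bytes. A minimal Python sketch of that
check, assuming x holds exactly the bytes shown in the hexdump (the
names raw, text and is_valid_utf8 are only for illustration):

```python
# Bytes copied from the hexdump above: "Kündigungs" + "beschränkungen".
raw = bytes.fromhex(
    "4b fc 6e 64 69 67 75 6e 67 73 "
    "62 65 73 63 68 72 e4 6e 6b 75 6e 67 65 6e"
)


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The pasted bytes are not valid UTF-8 ...
print(is_valid_utf8(raw))

# ... but they decode cleanly as ISO-8859-1.
text = raw.decode("iso-8859-1")
print(text)

# Re-encoding as UTF-8 turns each umlaut into two bytes.
utf8 = text.encode("utf-8")
print(utf8.hex(" "))
```

If this holds, the paste into 'cat >x' produced a Latin-1 file even
though the locale is en_US.UTF-8.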

mc's viewer says:

00000010 20 20 20 20  20 20 4B FC  6E 64 69 67  75 6E 67 73  KÃ¼ndigungs
00000020 62 65 73 63  68 72 E4 6E  6B 75 6E 67  65 6E 0A 0A  beschrÃ¤nkungen..

Here &uuml; is still only the single byte 0xFC, but it gets printed
as 'A' with a tilde and a '1/4' character. &auml; is again 0xE4 but
printed as 'A' with a tilde and a circle with 4 short lines
extending diagonally from the circle.
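
One plausible explanation (an assumption, not verified against mc or
konsole) is that the character is converted to UTF-8 somewhere and the
resulting two bytes are then displayed as if they were ISO-8859-1. A
minimal Python sketch of that double interpretation; misread_as_latin1
is a hypothetical helper name:

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# U+00FC (u-umlaut) is C3 BC in UTF-8; read as Latin-1 those bytes are
# 'A with tilde' (U+00C3) and 'one quarter' (U+00BC).
print(misread_as_latin1("\u00fc"))

# U+00E4 (a-umlaut) is C3 A4 in UTF-8; A4 is the currency sign (U+00A4),
# the circle with four short diagonal lines.
print(misread_as_latin1("\u00e4"))
```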

Opening x in openoffice writer shows rhombuses with question marks
for each umlaut.
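
That symbol looks like the Unicode replacement character U+FFFD, which
is what a decoder typically substitutes for bytes that are not valid
UTF-8. A small Python sketch, assuming the file is being read as UTF-8
(how openoffice actually picks the encoding is not known here;
decode_lossy is a hypothetical helper name):

```python
def decode_lossy(data):
    """Decode as UTF-8, replacing every invalid byte with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# The Latin-1 bytes 0xFC and 0xE4 are not valid UTF-8 in this context,
# so each one becomes a single replacement character.
for chunk in (b"K\xfcndigungs", b"beschr\xe4nkungen"):
    decoded = decode_lossy(chunk)
    print(decoded, decoded.count("\ufffd"))
```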

Opening x.html in openoffice writer I was unable to remove all the
table etc. stuff and so was unable to reformat the text so it would
fit on one page. Hmm, it might work, if I copied the text from there
into a new document. But here I want to solve the locale problems,
or what should I call the problem?

mc (midnight commander, a norton commander clone) of course goes
crazy again, but I was not surprised and accepted that it prints 'a'
with '^' instead of line art, etc. More serious was that when I
'ssh'ed to a different computer (not sure which) it got confused
about which line it was on and I messed up editing /etc/fstab.

man gets quote characters wrong, printing 'a' with '^' instead and
so does gcc.

I also have problems with kvirc. IIRC I can get it to display
iso8859-1 correctly, but not utf8, and the smart utf8/iso8859-1 mode
does not work. I chat with users who use iso8859-1 and utf8.

Is there a package which is responsible for all these problems so I
can file a bug report against it? Or are these bugs in konsole, gcc,
man, bash, mc, iceweasel, openoffice and kvirc? Or ... is the bug
sitting in front of the computer again :)?

I wonder if it's easier to set up debian from scratch.

I'm basically running debian testing (and have been for a long time),
but because I sometimes want packages from stable or unstable I have
those in sources.list, too (well, stable is commented out currently).
So that I don't upgrade to unstable, I have this in /etc/apt/preferences:

----------------------------
Package: *
Pin: release a=stable
Pin-Priority: 650

Package: *
Pin: release a=testing
Pin-Priority: 700

Package: *
Pin: release a=unstable
Pin-Priority: 600
----------------------------
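
Each stanza in that file needs a Package, a Pin and a Pin-Priority
field. A rough Python sketch for checking that, assuming stanzas are
separated by blank lines; the exact-case field match is a
simplification and missing_fields is a hypothetical helper, not part of
apt:

```python
REQUIRED = ("Package", "Pin", "Pin-Priority")


def missing_fields(preferences_text):
    """Return (stanza_number, missing_field_names) for incomplete stanzas."""
    problems = []
    stanzas = [s for s in preferences_text.split("\n\n") if s.strip()]
    for number, stanza in enumerate(stanzas, start=1):
        names = set()
        for line in stanza.splitlines():
            # Field lines look like "Name: value"; skip comment lines.
            if ":" in line and not line.startswith("#"):
                names.add(line.split(":", 1)[0].strip())
        missing = [field for field in REQUIRED if field not in names]
        if missing:
            problems.append((number, missing))
    return problems


sample = "Package: *\nPin: release a=testing\nPin-Priority: 700\n"
print(missing_fields(sample))
```

To check the real file, pass it the contents of /etc/apt/preferences
instead of the sample string.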

Thanks, Bernhard

-- 
Please encrypt all emails
GPG Key (ID F732FBF3 E4219D48) available on public key servers
Fingerprint E18F BF4D 0EE2 6522 E950  A06A F732 FBF3 E421 9D48
