
Re: utf8 Problems



On 7/28/07, Bernhard Kuemel <bernhard@bksys.at> wrote:
> Hi debian-user!
>
> I converted to utf8 in the hope that my non ASCII character problems
> would disappear. They are now ... different.
>
> I used utf8migrationtool and locale now says:
>
> bernhard@b:~$ locale
> LANG=en_US.UTF-8

<snip>

> I wanted to print a German text containing umlauts from a web page.
> I marked it in iceweasel and pasted it into a 'konsole' running bash
> running 'cat >x'. 'lpr x' printed only a page with the character 'K'.
>
> 'hexdump -C x' says:
>
> 00000010  20 20 20 20 20 20 4b fc  6e 64 69 67 75 6e 67 73  |      K.ndigungs|
> 00000020  62 65 73 63 68 72 e4 6e  6b 75 6e 67 65 6e 0a 0a  |beschr.nkungen..|
>
> so &uuml; is 0xfc, &auml; is 0xe4, and the characters are printed as
> periods '.'.
>
> mc's viewer says:
>
> 00000010 20 20 20 20  20 20 4B FC  6E 64 69 67  75 6E 67 73  Kündigungs
> 00000020 62 65 73 63  68 72 E4 6E  6B 75 6E 67  65 6E 0A 0A  beschränkungen..
>
> Here &uuml; is still only the single byte 0xFC, but it gets printed
> as 'A' with a tilde and a '1/4' character. &auml; is again 0xE4 but
> printed as 'A' with a tilde and a circle with 4 short lines
> extending from the circle diagonally.
>
> Opening x in openoffice writer shows rhombuses with question marks
> for each umlaut.
>
> Opening x.html in openoffice writer I was unable to remove all the
> table etc. stuff and so was unable to reformat the text so it would
> fit on one page. Hmm, it might work, if I copied the text from there
> into a new document. But here I want to solve the locale problems,
> or what should I call the problem?

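I think this has to do with the use of HTML entities (&auml;)
instead of actual UTF-8 characters. An additional possible issue
is that the web page may not be UTF-8: the single bytes 0xfc and
0xe4 in your hexdump are the ISO-8859-1 codes for those umlauts,
whereas UTF-8 would encode each of them as two bytes.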
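
To check the "not UTF-8" possibility, here is a minimal Python sketch.
It takes the 32 bytes from the quoted hexdump and assumes they are
ISO-8859-1, which is a guess based on the 0xfc and 0xe4 values; the
names raw, is_valid_utf8, text and utf8 are only for illustration.

```python
# Bytes at offsets 0x10-0x2f, copied from the hexdump quoted above.
raw = bytes.fromhex(
    "20 20 20 20 20 20 4b fc 6e 64 69 67 75 6e 67 73"
    "62 65 73 63 68 72 e4 6e 6b 75 6e 67 65 6e 0a 0a"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xfc or 0xe4 byte is not a valid UTF-8 sequence.
print(is_valid_utf8(raw))

# Assumption: the bytes are ISO-8859-1 (Latin-1).
text = raw.decode("iso-8859-1")
utf8 = text.encode("utf-8")
print(utf8.hex(" "))

# The reverse mistake: UTF-8 bytes read as Latin-1 turn each umlaut
# into two characters, starting with an 'A' with a tilde.
print("\u00fc".encode("utf-8").decode("iso-8859-1"))
```

If the file really is ISO-8859-1, then converting it before printing,
for example with "iconv -f ISO-8859-1 -t UTF-8 x > x.utf8", should
give the two-byte forms (c3 bc and c3 a4) that a UTF-8 locale expects.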

When I want to fix up an HTML page before printing it, I use a
WYSIWYG HTML editor (I use vim when writing my own HTML).
SeaMonkey Composer / Nvu / KompoZer (which are basically
all the same program in different forms) have worked well for
me.

> mc (midnight commander, a norton commander clone) of course goes
> crazy again, but I was not surprised and accepted that it prints 'a'
> with '^' instead of line art, etc. More serious was that when I
> 'ssh'ed to a different computer (not sure which) it got confused
> about which line it was on and I messed up editing /etc/fstab.
>
> man gets quote characters wrong, printing 'a' with '^' instead and
> so does gcc.
>
> I also have problems with kvirc. IIRC I can get it to display
> iso8859-1 correctly, but not utf8, and the smart utf8/iso8859-1 mode
> does not work. I chat with users who use iso8859-1 and utf8.

This sounds more like a real locale problem. Have you tried running
"dpkg-reconfigure locales"? That can fix some locale problems.

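Before reconfiguring, it may help to check which locale setting is
actually in effect, both locally and on the machine you ssh into. The
sketch below is Python and only an illustration; the helper names
effective_ctype, charset_of and is_utf8_locale are made up here. It
relies on the usual POSIX precedence: LC_ALL overrides LC_CTYPE, which
overrides LANG.

```python
import os


def effective_ctype(env):
    """Return the locale value that governs character handling."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # an empty value counts as unset
            return value
    return "C"


def charset_of(locale_name):
    """Extract the charset part, e.g. 'en_US.UTF-8' gives 'UTF-8'."""
    base = locale_name.split("@", 1)[0]  # drop a modifier such as @euro
    if "." not in base:
        return None  # e.g. "C" or "de_AT" name no charset explicitly
    return base.split(".", 1)[1]


def is_utf8_locale(env):
    """Return True if the effective locale names a UTF-8 charset."""
    charset = charset_of(effective_ctype(env))
    return charset is not None and charset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    print(effective_ctype(os.environ), is_utf8_locale(os.environ))
```

If this reports a UTF-8 locale on one machine and not on the other,
that mismatch could explain the confused line positions over ssh.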

Cheers,
Kelly
