
Re: accented chars. shown as question marks in black diamonds in mozilla



On Thu, Mar 08, 2007 at 10:47:10 -0500, H.S. wrote:
> Florian Kulzer wrote:
> >On Thu, Mar 08, 2007 at 09:59:07 -0500, H.S. wrote:
> >>Hello,
> >>
> >>In Debian Etch Mozilla browser (Iceape), I notice that sometimes 
> >>accented characters are not displayed properly. They are shown as 
> >>question marks in black diamonds. For example, on this web page (CNN):
> >>http://www.time.com/time/nation/article/0,8599,1597226,00.html?cnn=yes
> >>
> >>I see this "or his prot�g�s". I assume the last word is protege with 
> >>accents on the e's. How do I find out what I am missing to have these 
> >>characters shown properly? Maybe a font? My default locale is 
> >>en_CA.UTF-8 and many of the international languages are shown properly. 
> >
> >Try to change to "View > Character Encoding > Western (ISO-8859-1)".
> 
> Yes, that worked.
> 
> >Your en_CA.UTF-8 would be able to display this page correctly if
> >time.com would bother to tell your browser that it uses ISO-8859-1.
> 
> I am not sure I understand this comment. I am not very familiar with 
> encoding. I was assuming the web pages which have international 
> characters are better off by using UTF-8 encoding.

What I meant was this: Your utf-8 setup (combined with using the proper
fonts) is able to encode and display umlauts, accented characters,
characters for Slavic languages, Scandinavian, Russian, Greek, (some)
Asian characters, etc. This is in contrast to, say, someone using an
iso-8859-1 locale who cannot display many of these "foreign" characters.
(Unless s/he uses an application which can work around the limitations
of the system's encoding, for example LaTeX.)

The problem is that a webpage has to tell your browser which encoding it
uses to transmit the characters. If the browser has to guess, things can
go wrong. In your case iceape guessed that the page was encoded in utf-8,
which goes wrong for characters outside the standard us-ascii set.
Once you told your browser that the page was in iso-8859-1 it could
decode the bytes properly. The root of the problem is that the character
"é" (the accented e) exists in both utf-8 and iso-8859-1 but is
represented by different bytes: the single byte 0xE9 in iso-8859-1 and
the two bytes 0xC3 0xA9 in utf-8. A 0xE9 byte followed by an ordinary
letter is not valid utf-8, so the browser shows the replacement character
(the question mark in a black diamond) instead.
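
If you want to see this for yourself, here is a minimal Python sketch. It
assumes the page sent the word as iso-8859-1 bytes, which is what the
behaviour you describe suggests; the variable names are only for
illustration.

```python
# "protégés", written with escapes so the example does not depend on
# the encoding of this file.
text = "prot\u00e9g\u00e9s"

# The same character, encoded two ways.
e_latin1 = "\u00e9".encode("iso-8859-1")  # one byte: 0xE9
e_utf8 = "\u00e9".encode("utf-8")         # two bytes: 0xC3 0xA9

# What the server (presumably) sent: the word as iso-8859-1 bytes.
latin1_bytes = text.encode("iso-8859-1")

# Wrong guess: treating those bytes as utf-8. Each stray 0xE9 is invalid
# and is replaced by U+FFFD, the "question mark in a black diamond".
wrong_guess = latin1_bytes.decode("utf-8", errors="replace")

# Right guess: decoding with the encoding that was actually used.
right_guess = latin1_bytes.decode("iso-8859-1")

print(e_latin1, e_utf8)
print(wrong_guess)
print(right_guess)
```

Switching "View > Character Encoding" to Western (ISO-8859-1) amounts to
going from the wrong guess to the right one.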

> I was assuming they should have used UTF-8 along with the language tags 
> around that word. I might be mistaken though.

That would work only if the page itself were encoded in utf-8; language
tags do not change the character encoding. Since they decided to use
iso-8859-1 throughout the document they could simply have included

<meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">

in the <head> section of the HTML.
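
To illustrate why such a declaration helps, here is a rough Python sketch
of how a client could pick up the charset from the meta tag before
decoding the page. The function name sniff_meta_charset and the sample
page are made up for this example, and real browsers are more elaborate
(a charset given in the HTTP Content-Type header takes precedence, for
instance), so treat this as a simplification.

```python
import re


def sniff_meta_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    # The declaration itself is plain ascii, so it can be searched for
    # before the real encoding is known.
    head = raw[:1024].decode("ascii", errors="replace")
    match = re.search(r"charset\s*=\s*[\"']?([\w-]+)", head, re.IGNORECASE)
    return match.group(1).lower() if match else default


# A made-up page whose body is iso-8859-1 and which says so.
page = (
    b'<html><head><meta http-equiv="CONTENT-TYPE" '
    b'content="text/html; charset=iso-8859-1"></head>'
    b"<body>or his prot\xe9g\xe9s</body></html>"
)

charset = sniff_meta_charset(page)
decoded = page.decode(charset)
print(charset)
print(decoded)
```

Without the meta line the function falls back to its default, which is
the kind of guess that went wrong in your case.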

-- 
Regards,
          Florian


