
Re: accented chars. shown as question marks in black diamonds in mozilla



On Mar 08 2007, Florian Kulzer wrote:
> On Thu, Mar 08, 2007 at 10:47:10 -0500, H.S. wrote:

> > I am not sure I understand this comment. I am not very familiar with 
> > encoding. I was assuming the web pages which have international 
> > characters are better off by using UTF-8 encoding.
> 
> What I meant was this: Your utf-8 setup (combined with using the proper
> fonts) is able to encode and display umlauts, accented characters,
> characters for Slavic languages, Scandinavian, Russian, Greek, (some)
> Asian characters, etc. This is in contrast to, say, someone using an
> iso-8859-1 locale who cannot display many of these "foreign" characters.
> (Unless s/he uses an application which can work around the limitations
> of the system's encoding, for example LaTeX.)
> 
> The problem is that a webpage has to tell your browser which encoding it
> uses to transmit the characters. If the browser has to guess, things can
> go wrong. In your case iceape guessed the page was encoded in utf-8
> which goes wrong for many characters outside the standard us-ascii set.
> Once you told your browser that the page was in iso-8859-1 it could
> transcode properly. The root of the problem is that the character "é"
> (the accented e) exists in both utf-8 and iso-8859-1 but it has a
> different code in the two encodings.

Ok, dumb question time. I have hell's own mess with emails, basically
amounting to inability to read non-US characters in text emails, but I
was under the impression that there was a simple solution for web
pages. Html includes its _own_ encoding for accented, umlauted and
otherwise non-US characters, and conformant web pages are supposed to
use it - not rely on the lucky reader switching their browser
preferences from UTF-8 to ISO-988956-whatever to some-other-bloody-encoding
depending on the whim of the web page author. 

People reading this mail in html may have difficulty if I try to give
examples, but I type them into web page source all the time, to get
the non-US characters I want - and they work. Perhaps things are
different if the web page creator uses GUI-based "authoring" tools,
and can't tell that the tool is making stupid decisions under the hood ;-)

Anyway, example time - &aacute; gives you an a with an acute accent.
(That's an ampersand symbol followed by the letters "aacute" followed
by a semi-colon.)
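A rough Python 3 sketch of the same idea applied mechanically. It rewrites every non-ASCII character as an HTML character reference so the markup survives whatever ASCII-compatible charset the page ends up being read in. The function name to_html_entities is hypothetical. It uses the named references from Python's html.entities.codepoint2name (the HTML 4 entity set) where one exists, and a numeric reference otherwise.

```python
import html
from html.entities import codepoint2name

def to_html_entities(text: str) -> str:
    """Return an ASCII-only HTML version of text (hypothetical helper).

    Markup characters (&, <, >) are escaped first; then each non-ASCII
    character becomes a named reference such as &aacute; when HTML 4
    defines one, otherwise a numeric reference such as &#337;.
    """
    out = []
    for ch in html.escape(text, quote=False):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # named reference
        else:
            out.append(f"&#{cp};")                  # numeric reference
    return "".join(out)

print(to_html_entities("Café, naïve, Kő"))  # Caf&eacute;, na&iuml;ve, K&#337;
print(html.unescape("&aacute;"))            # á
```

Numeric references like &#337; work for any Unicode character, so they cover letters (such as the Hungarian ő) that have no HTML 4 name.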

> > I was assuming they should have used UTF-8 along with the language tags 
> > around that word. I might be mistaken though.
> 
> This might work if they encoded that word in utf-8. Since
> they decided to use iso-8859-1 throughout the document they could simply
> have included
> 
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
> 
> in the HTML header.
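A rough sketch of how a program might honour such a declaration: pre-scan the raw bytes for a <meta> charset, then decode with it, falling back to UTF-8 when none is found. CharsetFinder and decode_page are hypothetical names. Real browsers follow a more elaborate sniffing procedure that also checks byte-order marks and the HTTP Content-Type header before any <meta> tag.

```python
from html.parser import HTMLParser
from typing import Optional

class CharsetFinder(HTMLParser):
    """Remembers the first charset declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        # HTMLParser lower-cases tag and attribute names, not values.
        a = {name: (value or "") for name, value in attrs}
        if "charset" in a:
            # HTML5 shorthand: <meta charset="...">
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Older form: content="text/html; charset=..."
            for part in a.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip().strip("'\"").lower()

def decode_page(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw page bytes using the declared charset, else default."""
    finder = CharsetFinder()
    # Latin-1 maps every byte to some character, so this pre-scan cannot
    # fail, and the ASCII markup of the <meta> tag survives unchanged.
    finder.feed(raw.decode("latin-1"))
    finder.close()
    return raw.decode(finder.charset or default, errors="replace")

page = (b'<html><head><meta http-equiv="CONTENT-TYPE" '
        b'content="text/html; charset=iso-8859-1"></head>'
        b'<body>caf\xe9</body></html>')
print("café" in decode_page(page))  # True
```

Without the charset declaration the same bytes fall back to UTF-8 and the é turns into U+FFFD. An unrecognised charset name would make bytes.decode raise LookupError, which a sturdier version would catch.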

I see. Since I'm lazy - and unsure precisely what query to feed to a
search engine - could you possibly point me at a list of these tags?

-- 
Arlie

(Arlie Stephens	                              arlie@worldash.org)


