
Re: accented chars. shown as question marks in black diamonds in mozilla



2007/3/8, H.S. <hs.samix@gmail.com>:
Florian Kulzer wrote:
> On Thu, Mar 08, 2007 at 09:59:07 -0500, H.S. wrote:
>> ...For example, on this web page (CNN):
>> http://www.time.com/time/nation/article/0,8599,1597226,00.html?cnn=yes
>> I see this "or his prot�g�s". I assume the last word is protege with
>> ...
>
> Try to change to "View > Character Encoding > Western (ISO-8859-1)".

Yes, that worked.


<disclaimer>ROUGH EXPLANATIONS</>

When you write text in a text editor, the editor must store it on
disk as a series of numbers (for example, ABC becomes 65,66,67).
  This is called encoding the text.
When your browser renders that text on the screen, it must convert the
series of numbers back to glyphs of letters (for example, 65,66,67 is
presented as ABC).
  This is called decoding.

For this to work, the two programs (text editor and browser) must
agree on the same conversion rules (for example A<->65, B<->66, ...).

This is where everything gets messed up, because there is more than
one possible set of encoding rules, and there is a web server, a
database server, a lot of programmers and sysadmins, and heaven knows
what else in between the two programs. You, the user, must then try a
few possible encodings and see what works. It's not too difficult,
just use the View->Character Encoding menu. Still, it is annoying.

In the case of this page, the text is really encoded as ISO-8859-1 (as
you can confirm by manually selecting this encoding, after which
everything displays properly), but the HTML code reports that its text
is encoded as UTF-8 (as you can see in the first lines of the HTML
source: content="text/html; charset=utf-8"; you can view the source
with View->Page Source).
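
The mismatch can be reproduced in a few lines of Python. This is only
a sketch of what probably happens, assuming the server really sends
ISO-8859-1 bytes and the browser decodes them as UTF-8; the variable
names are made up for the example.

```python
# "protégés", written with escapes so the example does not depend on
# the encoding of this file itself.
word = "prot\u00e9g\u00e9s"

# What the page actually contains (assumption: ISO-8859-1 bytes).
raw = word.encode("iso-8859-1")

# What the browser does when the page claims charset=utf-8: the lone
# 0xE9 bytes are not valid UTF-8, so each becomes U+FFFD, the
# replacement character (drawn as a question mark in a black diamond).
wrong = raw.decode("utf-8", errors="replace")

# What you get after manually picking Western (ISO-8859-1).
right = raw.decode("iso-8859-1")

print(wrong)   # prot\ufffdg\ufffds, i.e. two replacement characters
print(right)   # protégés
```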

So it's a problem that only time.com can solve properly.

> Your en_CA.UTF-8 would be able to display this page correctly if
> time.com would bother to tell your browser that it uses ISO-8859-1.

I am not sure I understand this comment. I am not very familiar with
encoding. I was assuming the web pages which have international
characters are better off by using UTF-8 encoding.

Everything I told you about character encodings doesn't apply only to
the case of a text editor producing text to be displayed in a web
browser. In fact, it applies whenever a computer stores and displays
text. Text stored in memory/disk/wherever must be encoded. Text
retrieved to be displayed must be decoded. And this is where your
default locale comes into play:

My default locale is en_CA.UTF-8 and many of the
international languages are shown properly.

This (UTF-8) is the encoding YOUR PC uses to store/display characters.
When not told to use any other encoding, it uses UTF-8. When told that
a text is encoded differently, it silently converts it to UTF-8 to
handle it internally. That is good, because UTF-8 is a good encoding
scheme as measured by how many different languages it can handle
(almost all). If, for example, your default encoding were ISO-8859-1,
you would never be able to see what a Greek or Japanese text looks
like[1].
So you did your part right. Your computer IS ABLE to display most
texts correctly if they are properly tagged with the encoding they
use.

[1] Of course, you also need to have fonts with Greek / Japanese letters.
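
To see why UTF-8 copes with more languages than ISO-8859-1, here is a
small Python sketch. The Greek sample string and the variable names
are just illustrative choices of mine.

```python
# Two Greek letters, capital alpha and small pi, written as escapes.
greek = "\u0391\u03c0"

# UTF-8 has a byte sequence for every Unicode character.
utf8_bytes = greek.encode("utf-8")

# ISO-8859-1 only covers Western European letters, so this fails.
try:
    greek.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(utf8_bytes)   # b'\xce\x91\xcf\x80'
print(latin1_ok)    # False
```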
