
Re: How can I get the Euro symbol?



On Tue, Jan 01, 2002 at 08:30:53PM +0000, Phillip Deackes wrote:
| On Mon, 31 Dec 2001 12:48:37 -0600
| Colin Watson <cjwatson@debian.org> wrote:
| 
| > It might be worth having a look at the Euro-Char-Support mini-HOWTO
| > (/usr/share/doc/HOWTO/en-html/mini/Euro-Char-Support/index.html if you
| > have doc-linux-html installed, or somewhere on
| > http://www.linuxdoc.org/). I'm not sure if it's good enough either - it
| > was put together quite recently - but it does at least have the virtue
| > of being concise.
| 
| Thanks for your help, Colin and Sean.
| 
| The document you mention is the problem. It does not help me at all. For
| instance, it says that automatic configuration is possible with the
| 'lang-env' package. I cannot find any package with a name like 'lang-env'.
| A search of Debian packages yields nothing.

It is 'language-env'.  According to 'apt-cache policy' I don't think
it is in potato.  (potato is _really_ _really_ old)

| Furthermore, paragraphs such as this are pretty incomprehensible to me:
| 
| "Programs use the localisation environment in order to know both the
| language and the charset being used. Currently there is no separation,
| unless you are using UTF-8 from locale and representation. Environment
| locales use both the language for example:
| 
| es_ES.ISO-8859-1
| en_US.utf"

I believe this line is wrong, but I don't know for sure.  I'm using
"en_US.UTF-8".
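
To show how such a name breaks down, here is a rough Python sketch.
It is not from the HOWTO; the function name split_locale is made up
for illustration, and it assumes the usual
language_TERRITORY.codeset@modifier layout.

```python
def split_locale(name):
    """Split a POSIX-style locale name into (language, territory, codeset)."""
    # Drop an optional "@modifier" suffix (for example "@euro").
    base = name.split("@", 1)[0]
    # The codeset, if present, follows the first dot.
    lang_terr, _, codeset = base.partition(".")
    # The territory, if present, follows the underscore.
    language, _, territory = lang_terr.partition("_")
    return language, territory, codeset


print(split_locale("en_US.UTF-8"))       # ('en', 'US', 'UTF-8')
print(split_locale("es_ES.ISO-8859-1"))  # ('es', 'ES', 'ISO-8859-1')
```

The first two parts pick the language and country, the last part
picks the charset.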

| What is .utf?

UCS Transformation Format (also read as Unicode Transformation Format)

It is a fancy name for a method of storing Unicode characters as a
sequence of bytes in a file (where bytes are the only data type).

| Why are there certain files containing 'euro' but not others.

Why are there certain files containing "g" but not others?  The euro
is just another character.

| Why might I be interested in installing a Spanish language file.

If you use Spanish.

| I did enable the French Euro file, since I speak French, but this
| does not appear to help. Do I need to enable it?

What is the "French Euro file"? 

| I see no need to understand localisation issues. I want to be able to
| choose my language/keyboard and do little more.

Choosing your language _is_ localisation!

| I appreciate that adding the Euro symbol is not as simple as it
| sounds, but somebody who knows how to do it should be able to write
| a step-by-step crib sheet so that others can get it working on their
| systems.

As I mentioned above, the euro is just another character.
Unfortunately, people need/want more than the 128 characters in the
US-ASCII character set (aka charset).  The euro, for example, is not
part of US-ASCII.  Since US-ASCII leaves 128 of the 256 values in a
byte unused, ISO committees have created additional charsets that
fill them in.  These charsets are supersets of US-ASCII (that is, the
first 128 characters are identical to US-ASCII); of the remaining 128
values, those that are assigned hold characters useful to a given
region.  ISO-8859-1 contains the accented letters (umlauts included)
that are common in Western European languages.  If your locale uses
the ISO-8859-1 charset then you can store those letters in plain text
files and share them with other people who are also using ISO-8859-1.
(Think of a charset as a text file format: jpg and png both store
images, but in different formats.)  Likewise ISO-8859-2 has characters
that are found in Central and Eastern European languages.
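
A small Python sketch of what "different formats" means here.  The
helper name decode_both is mine, and the byte 0xE3 is just one value I
picked where the two charsets disagree.

```python
def decode_both(raw):
    """Decode the same bytes as ISO-8859-1 and as ISO-8859-2."""
    return raw.decode("iso8859-1"), raw.decode("iso8859-2")


# Bytes in the US-ASCII range mean the same thing in both charsets.
print(decode_both(b"abc"))

# A byte above 0x7F does not: 0xE3 is a-tilde in ISO-8859-1 but
# a-breve (used in Romanian) in ISO-8859-2.
west, east = decode_both(bytes([0xE3]))
print("U+%04X" % ord(west), "U+%04X" % ord(east))  # U+00E3 U+0103
```

The file contents are identical in both cases; only the charset the
reader assumes changes which letter shows up.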

The advantage to these encodings is that they are all single-byte
(char == 8 bits == 1 byte).  This means that existing programs can
deal with them more-or-less reasonably, even if they don't understand
the locale.  The problem is that if you deal with multiple languages
(eg, French and Romanian) on a regular basis, not only is it a PITA to
keep adjusting settings, but you can't put characters from both
encodings into the same file.  Thus Unicode was developed.  It is a
single character set that can represent the alphabets of most
languages simultaneously.  (It started out as a 16-bit design; the
code space now runs from U+0000 to U+10FFFF, though most characters in
everyday use, the euro included, fit in the lower 16 bits.)  Unicode
presents a problem though -- stored at a fixed width, each character
requires at least 2 bytes in memory, but the C 'char' type is only
guaranteed to be 1 byte.  In addition, although the first 128 Unicode
characters are the same as US-ASCII, such a fixed-width representation
is not byte-for-byte compatible with US-ASCII text.  Applications must
be developed with this in mind so that they can handle it properly.
Various encodings of Unicode have been developed to store Unicode
characters in files.  UTF-8 is the most well-known, and it is
backwards compatible with US-ASCII (for the US-ASCII subset of
Unicode).  Thus if you use UTF-8 and stick to just the US-ASCII
subset where the additional characters are not needed or not
understood there is no problem.
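
Here is a minimal Python sketch of that backwards compatibility.  The
sample strings and the helper utf8_len are only for illustration.

```python
def utf8_len(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Pure US-ASCII text is byte-for-byte the same in UTF-8.
plain = "plain text"
print(plain.encode("utf-8") == plain.encode("ascii"))  # True

# Characters outside US-ASCII take more than one byte each.
for ch in ("g", "\u00e9", "\u0103"):
    print("U+%04X" % ord(ch), utf8_len(ch), "byte(s)")
```

So a program that only ever sees the US-ASCII subset cannot tell a
UTF-8 file from a US-ASCII one.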

Now, how does all this relate to the euro?  Well, the euro is not part
of the US-ASCII charset.  Nor is it part of ISO-8859-1.  (ISO-8859-15,
a revision of ISO-8859-1, does add it at 0xA4, but that only helps
where everything involved agrees on that charset.)  It is part of
Unicode (character U+20AC).  The general way to make use of the euro
is to use the Unicode charset and choose one of its encodings (UTF-8)
for storing files.  Now read the Unicode HOWTO for some more
information on Unicode.  Some of that HOWTO is dated, but it helps to
understand what must be configured where, and what doesn't work, when
using characters that aren't part of ISO-8859-1.
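
To make that concrete, a Python sketch that tries to store the euro in
a few charsets.  The helper try_encode is hypothetical; ISO-8859-15 is
in the list only for comparison, since it is a single-byte charset
that does carry the euro.

```python
EURO = "\u20ac"  # the euro sign, Unicode character U+20AC


def try_encode(ch, charset):
    """Return the encoded bytes, or None if the charset lacks the character."""
    try:
        return ch.encode(charset)
    except UnicodeEncodeError:
        return None


for charset in ("ascii", "iso8859-1", "iso8859-15", "utf-8"):
    print(charset, try_encode(EURO, charset))
# ascii None
# iso8859-1 None
# iso8859-15 b'\xa4'
# utf-8 b'\xe2\x82\xac'
```

In UTF-8 the one character becomes three bytes, which is exactly what
trips up programs that assume one byte per character.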

To try and put it simply : you need to

    o   install the X fonts to display Unicode characters
        (unfortunately GTK+ 1.2 doesn't handle multibyte fonts
        correctly so most GTK+ apps won't handle unicode correctly,
        gvim is an exception)

    o   configure the console to display unicode (if you use the
        console, I don't know how to configure it yet)

    o   specify the encoding in your locale, eg :
            export LANG=en_US.UTF-8

    o   test the programs that you use and see which ones work with
        unicode and how to work with it

        don't put the euro in a file that will be read by a program
        that doesn't understand Unicode

        in vim you can create mappings such that when you enter a
        certain set of characters it inserts something else, or you
        can enter any Unicode character with ^VuXXXX where ^V is
        control-v and XXXX is the character's value as four
        hexadecimal digits (^Vu20ac for the euro)
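
To check the locale step, here is a Python sketch that looks at the
environment the way a program would.  The function names are mine.  It
assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG)
and only inspects the codeset part of the name; it does not verify
that the locale is actually installed.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character handling."""
    # The first variable that is set and non-empty wins.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # nothing set: the default "C" locale


def uses_utf8(env):
    """True if the effective locale name asks for the UTF-8 codeset."""
    name = effective_ctype(env)
    codeset = name.split("@", 1)[0].partition(".")[2]
    # Accept the common spellings "UTF-8", "utf8", "UTF8".
    return codeset.lower().replace("-", "") == "utf8"


def vim_hex(ch):
    """Four hex digits to type after ^Vu in vim (up to U+FFFF only)."""
    if ord(ch) > 0xFFFF:
        raise ValueError("needs more than four hex digits")
    return "%04x" % ord(ch)


print(uses_utf8(os.environ))
print("^Vu" + vim_hex("\u20ac"))  # ^Vu20ac
```

If the first line prints False after you have exported LANG, look for
an LC_ALL or LC_CTYPE setting that overrides it.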

The euro character itself is not special any more than all other
non-US-ASCII characters are.  The problem is a bigger one --
developers have long followed the tradition that a char is one byte
and follows the ASCII encoding.  Support for Unicode has lagged
considerably, and like any major change it causes many problems
with program interaction (if one program supports Unicode while the
other doesn't or only partially supports it).
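
A Python sketch of the kind of breakage I mean.  The sample string and
the helper naive_truncate are made up; it stands in for any program
that counts bytes and assumes it is counting characters.

```python
def naive_truncate(data, n):
    """Cut a byte string at n bytes, as a byte-oriented program would."""
    return data[:n]


text = "price: 5\u20ac"
data = text.encode("utf-8")
print(len(text), len(data))  # 9 11

# Cutting at "9 characters" really cuts at 9 bytes, which lands in the
# middle of the three-byte euro sequence.
cut = naive_truncate(data, 9)
try:
    cut.decode("utf-8")
    broken = False
except UnicodeDecodeError:
    broken = True
print(broken)  # True
```

The next program in the chain then gets a file that is no longer
valid UTF-8.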

-D

-- 

He who walks with the wise grows wise,
but a companion of fools suffers harm.
        Proverbs 13:20


