Re: locales and coding systems

To: debian-user@lists.debian.org
Subject: Re: locales and coding systems
From: Antonio Rodriguez <arodriguez31@cfl.rr.com>
Date: Sun, 4 Jan 2004 08:15:00 -0500
Message-id: <20040104131500.GB23753@hpd.the-sphere.org>
Mail-followup-to: Antonio Rodriguez <arodriguez31@cfl.rr.com>, debian-user@lists.debian.org
Reply-to: Antonio Rodriguez <arodriguez31@cfl.rr.com>
In-reply-to: <20040104124136.0DF48BE2@teufel.hartford-hwp.com>
References: <20031212092220.GF16285@riva.ucam.org> <20031212181917.183631CE@teufel.hartford-hwp.com> <20031212195730.GA25152@riva.ucam.org> <20031212215948.3F95B1CE@teufel.hartford-hwp.com> <20031213022746.GC452@doorstop.net> <20031213102547.94C3BA6B@teufel.hartford-hwp.com> <pan.2004.01.03.18.05.00.81535@dutra.fastmail.fm> <20040104012219.5E5CE5C3@teufel.hartford-hwp.com> <1073205785.2229.3741.camel@dutras.dyndns.org> <20040104124136.0DF48BE2@teufel.hartford-hwp.com>

> Just a little context here. I'm running emacs 21.2.1. C-h C tells me
> my current default coding system is utf-8; my language environment is
> en_US.UTF-8. I can insert an extended character, such as c-cedilla (ç),
> into this message or into a blank file.

> In my problematic file, the extended characters appear as
> octals. Initially I tried to do a search/replace to convert the octals
> into proper characters, but emacs would not accept the octals as a
> search term. I could not search for the \347 and replace it with a
> c-cedilla, because the \347 I pasted into the minibuffer was not really
> a \347 octal, but only looked like it. Since normally I can paste an
> octal as a search term, there's something about these octals that is
> not right.
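For reference, `\347` is an octal byte value: 3·64 + 4·8 + 7 = 231, i.e. 0xE7, which is ç in ISO-8859-1 (Latin-1). In UTF-8, ç is the two-byte sequence C3 A7, so a lone E7 byte is not valid UTF-8. That is consistent with emacs reading the file as UTF-8 and falling back to showing the raw byte, and it might also explain why the pasted \347 (a few ordinary characters, not a raw byte) never matched. A minimal Python sketch of the arithmetic, assuming the file really is Latin-1:

```python
# \347 as emacs prints it is an octal escape for one raw byte.
RAW_BYTE = 0o347  # 3*64 + 4*8 + 7 = 231 = 0xE7

def describe_byte(value: int) -> dict:
    """Show how a single byte is read under Latin-1 and under UTF-8."""
    data = bytes([value])
    try:
        as_utf8 = data.decode("utf-8")
    except UnicodeDecodeError:
        as_utf8 = None  # a lone 0xE7 is an incomplete UTF-8 sequence
    return {
        "decimal": value,
        "hex": f"0x{value:02X}",
        "latin1": data.decode("latin-1"),  # Latin-1 maps every byte to a character
        "utf8": as_utf8,
    }

info = describe_byte(RAW_BYTE)
print(info)
print("ç".encode("utf-8"))  # the UTF-8 form of ç takes two bytes
```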
> 
> I first assumed that the coding system of the problematic file was not
> being handled properly by emacs, and I sought a way to convert the
> file into a usable form. I suspected the file might be encoded in a
> coding system that emacs did not understand.
> 
> I tried two things. First, I opened the problematic file as
> utf-16-le (C-x RET c utf-16-le) and then saved it as utf-8 (C-x RET f
> utf-8).
> 
> Now, instead of octals, the extended chars in the utf-8 file appear
> as empty rectangles. So nothing was gained, and perhaps some
> information was lost. However, there was another difference, perhaps
> more significant. In the original file, which I suspected was
> utf-16-le, I could not insert a c-cedilla; it appeared as \347.
> However, once I saved the file using the utf-8 coding system, I could
> insert the c-cedilla properly.
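One way to see why the utf-16-le experiment produced rectangles: reading 8-bit text as UTF-16-LE pairs consecutive bytes into 16-bit code units, so the text turns into unrelated characters (CJK ideographs, private-use code points) that a font is unlikely to have. A small Python illustration, using the made-up sample word "Açúcar" and assuming the original bytes were Latin-1:

```python
original = "Açúcar"                          # hypothetical sample text
latin1_bytes = original.encode("latin-1")    # 41 E7 FA 63 61 72

# Misreading: every two bytes become one little-endian 16-bit code unit.
misread = latin1_bytes.decode("utf-16-le")
print([f"U+{ord(c):04X}" for c in misread])  # U+E741, U+63FA, U+7261

# The mangled characters can then be stored as UTF-8...
saved = misread.encode("utf-8")

# ...and, in this lucky case, undone by reversing both steps.
recovered = saved.decode("utf-8").encode("utf-16-le").decode("latin-1")
print(recovered)
```

With an odd number of bytes the strict UTF-16 decode fails outright, and a tool that silently substitutes replacement characters instead would make the loss permanent.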
> 
> I did another experiment. Instead of saving the problematic file as
> utf-8, I saved it as iso-latin-1. The saved file still had the octal
> characters, and an inserted c-cedilla still appeared as \347. In other
> words, saving the file as iso-latin-1 changed nothing. Am I correct to
> infer that the original document was probably latin-1 and therefore
> the problem is not the document's coding system?
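A caveat on that inference: re-saving as iso-latin-1 changing nothing is what you would expect from a Latin-1 file, but it is not proof, because Latin-1 assigns a character to every byte value, so any file "decodes" as Latin-1. A more telling test is whether the bytes are valid UTF-8. A rough Python heuristic (the name `guess_charset` is made up for illustration, and it cannot distinguish Latin-1 from ISO-8859-15 or other 8-bit sets):

```python
def guess_charset(data: bytes) -> str:
    """Rough heuristic, not a real charset detector."""
    if all(b < 0x80 for b in data):
        return "ascii"  # identical under Latin-1 and UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Non-ASCII bytes that are not valid UTF-8: most likely an
        # 8-bit charset such as ISO-8859-1 or ISO-8859-15.
        return "8-bit (e.g. latin-1)"

print(guess_charset("caçar".encode("latin-1")))
print(guess_charset("caçar".encode("utf-8")))
print(guess_charset(b"plain text"))
```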
> 
> How does one reveal file attributes beyond what is conveyed by ls -l?
> There are many more attributes than it displays. Perhaps I should also
> view the file in hexl-mode to see what the characters look like.
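On the attributes question: the character encoding is normally not stored as a file attribute at all, so `ls -l` cannot show it; it has to be inferred from the bytes (the `file` utility makes this kind of guess). Looking at the raw bytes is the reliable route. A minimal Python sketch that lists non-ASCII bytes with their offsets, in hex and in the octal form emacs displays; `non_ascii_bytes` and `report` are illustrative names, not an existing tool:

```python
def non_ascii_bytes(data: bytes, limit: int = 20) -> list:
    """Return (offset, hex, octal) for up to `limit` bytes >= 0x80."""
    found = []
    for offset, b in enumerate(data):
        if b >= 0x80:  # anything outside 7-bit ASCII
            found.append((offset, f"{b:02X}", f"\\{b:03o}"))
            if len(found) == limit:
                break
    return found

def report(path: str) -> None:
    """Print the non-ASCII bytes of a file; binary mode means nothing is decoded."""
    with open(path, "rb") as f:
        for offset, hx, octal in non_ascii_bytes(f.read()):
            print(f"{offset:8d}  0x{hx}  {octal}")
```

Calling `report("problem.txt")` (a placeholder name) on a Latin-1 file would show isolated bytes such as `0xE7  \347`, whereas UTF-8 text shows pairs such as `0xC3 0xA7` for each accented letter.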
> 
> Haines Brown
> 

Adding some of my experience: I had my locale set to US English
UTF-8. When writing a file with Spanish characters (it was a whole
poem), if I went back to fix a mistake, the whole text would get
messed up: letters were moved or swapped, text was overwritten instead
of inserted, and so on. After a lot of work getting it back into good
condition I would save the file, and when I opened it again with
C-x C-f I would see a bunch of garbage on the screen: \number escapes
and question marks instead of letters, and again many words and
letters misplaced. It seems to be working fine now that I have changed
my locale to

tony@hpd:~$ locale
LANG=en_US.ISO-8859-15
LC_CTYPE="en_US.ISO-8859-15"
LC_NUMERIC="en_US.ISO-8859-15"
LC_TIME="en_US.ISO-8859-15"
LC_COLLATE="en_US.ISO-8859-15"
LC_MONETARY="en_US.ISO-8859-15"
LC_MESSAGES="en_US.ISO-8859-15"
LC_PAPER="en_US.ISO-8859-15"
LC_NAME="en_US.ISO-8859-15"
LC_ADDRESS="en_US.ISO-8859-15"
LC_TELEPHONE="en_US.ISO-8859-15"
LC_MEASUREMENT="en_US.ISO-8859-15"
LC_IDENTIFICATION="en_US.ISO-8859-15"
LC_ALL=
tony@hpd:~$

Before this, I also had the Spanish (es) UTF-8 locale with euro.
I hope this helps.
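For comparison: ISO-8859-15 (Latin-9) is a single-byte charset like Latin-1, so ñ, ¿, á and the rest are one byte each and at the same positions; the two differ in eight slots, one of which holds the euro sign (0xA4 in ISO-8859-15). Under a UTF-8 locale the same letters take two bytes each. A small Python sketch comparing the encodings and printing the codeset the current locale selects, using the standard `locale` module (the second result depends on your environment and `nl_langinfo` is Unix-only):

```python
import locale

SAMPLE = "ñ¿€"  # two Spanish characters plus the euro sign

def byte_forms(text: str) -> dict:
    """Hex bytes of `text` per charset; '3f' ('?') marks an unencodable char."""
    return {
        name: text.encode(name, errors="replace").hex(" ")
        for name in ("latin-1", "iso8859_15", "utf-8")
    }

def current_codeset() -> str:
    """Charset chosen by the LANG/LC_* environment variables."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # adopt the user's locale
    except locale.Error:
        pass  # locale not installed; keep the current one
    return locale.nl_langinfo(locale.CODESET)

for name, hexbytes in byte_forms(SAMPLE).items():
    print(f"{name:11} {hexbytes}")
print("current locale codeset:", current_codeset())
```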
AR
