
Re: locales and coding systems



> Just a little context here. I'm running emacs 21.2.1. C-h C tells me
> my current default coding system is utf-8; my language environment is
> en_US.UTF-8. I can insert here in this message or into a blank file an
> extended character, such as c-cedilla: ç.

> In my problematic file, the extended characters appear as
> octals. Initially I tried to do a search/replace to convert the octals
> into proper characters, but emacs would not accept the octals as a
> search term. I could not search for the \347 and replace it with a
> c-cedilla because the \347 I pasted into the minibuffer was not really
> a \347 octal, but only looked like it. Since normally I can paste an
> octal as a search term, there's something about these octals that is
> not right. 
> 
> I first assumed that the coding system of the problematic file was not
> being handled by emacs properly, and I sought a way to convert the
> file into useful form. I suspected maybe the file was somehow defined
> for a coding system that emacs did not understand.
> 
> I tried two things. First, I tried to open the problematic file as
> utf-16-le (C-x RET c utf-16-le) and then save it as utf-8 (C-x RET f
> utf-8).  
> 
> Now, instead of octals, the extended chars in the utf-8 file appear
> instead as empty rectangles. So nothing gained, and perhaps
> information lost. However, there was another difference, perhaps more
> significant. In the original file that I suspected was utf-16-le, I
> could not insert a c-cedilla, which appeared as \347. However, when I
> saved the file using the utf-8 encoding system, I could now insert the
> c-cedilla properly. 
> 
> I did another experiment. Instead of saving the problematic file as
> utf-8, I saved it as iso-latin-1. This saved file still had the octal
> characters, and an inserted c-cedilla still appeared as \347. In other
> words, saving the file as iso-latin-1 did nothing. Am I correct to
> infer that the original document was probably latin-1 and therefore
> the problem is not the document's coding system? 
> 
> How does one reveal file attributes beyond what is conveyed by ls -l?
> There are many more attributes than it displays. Perhaps I should also
> display the file in hex-mode to see what the characters look like.
> 
> Haines Brown
> 
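
On the hex-mode question: \347 is octal for the single byte 0xE7, which
is ç in Latin-1 but is not valid UTF-8 on its own (UTF-8 writes ç as the
two bytes c3 a7). Below is a minimal Python sketch of that check. The
helper names describe_bytes and hex_dump are made up for illustration,
and the guess is only a heuristic, not how Emacs detects coding systems.

```python
def describe_bytes(data: bytes) -> str:
    """Heuristic guess at the encoding of a byte string (illustrative only)."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # A lone high byte such as 0xE7 lands here; Latin-1 is one candidate.
        return "not utf-8 (possibly latin-1)"


def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


# The same word encoded two ways; \u00e7 is c-cedilla.
sample_latin1 = "gar\u00e7on".encode("latin-1")
sample_utf8 = "gar\u00e7on".encode("utf-8")

print(hex_dump(sample_latin1), "->", describe_bytes(sample_latin1))
print(hex_dump(sample_utf8), "->", describe_bytes(sample_utf8))
```

To try it on a real file, read it in binary mode (open(path, "rb")) and
pass the bytes to the same two helpers.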

Adding some of my own experience: my locale was US English UTF-8. When I
wrote a file containing Spanish characters (a whole poem) and went back
to fix a mistake, the text would get all messed up: letters were moved
or switched, text was overwritten instead of inserted, etc. After a lot
of work getting the file back into "good condition" I would save it, and
on reopening it with C-x C-f I would see a bunch of garbage on the
screen, with \number sequences, ?-marks, etc. instead of letters, and
again with many words and letters misplaced. It seems to be working
fine at last, now that I have changed my locale to

tony@hpd:~$ locale
LANG=en_US.ISO-8859-15
LC_CTYPE="en_US.ISO-8859-15"
LC_NUMERIC="en_US.ISO-8859-15"
LC_TIME="en_US.ISO-8859-15"
LC_COLLATE="en_US.ISO-8859-15"
LC_MONETARY="en_US.ISO-8859-15"
LC_MESSAGES="en_US.ISO-8859-15"
LC_PAPER="en_US.ISO-8859-15"
LC_NAME="en_US.ISO-8859-15"
LC_ADDRESS="en_US.ISO-8859-15"
LC_TELEPHONE="en_US.ISO-8859-15"
LC_MEASUREMENT="en_US.ISO-8859-15"
LC_IDENTIFICATION="en_US.ISO-8859-15"
LC_ALL=
tony@hpd:~$
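
For what it's worth, the part of the locale name after the dot is the
character set the locale assumes. Here is a small Python sketch, assuming
the usual language_TERRITORY.charset@modifier layout; charset_of is a
hypothetical helper, not a standard function. It shows the bytes that
charset uses for ç and for the euro sign.

```python
def charset_of(locale_name: str) -> str:
    """Return the charset part of a locale name, or "" if there is none."""
    without_modifier = locale_name.split("@", 1)[0]  # drop e.g. "@euro"
    _, dot, charset = without_modifier.partition(".")
    return charset if dot else ""


charset = charset_of("en_US.ISO-8859-15")

# In ISO-8859-15 both characters fit in a single byte.
c_cedilla = "\u00e7".encode(charset)  # c-cedilla
euro = "\u20ac".encode(charset)       # euro sign

print(charset, c_cedilla, euro)
```

A single byte per accented letter is what an editor expecting a Latin
charset wants to see; the same letters in a UTF-8 file take two or more
bytes each.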

Before this I also used to have the Spanish (es) UTF-8 locale with euro.
I hope this helps.
AR


