
Re: locales and coding systems



> On Sat, 2004-01-03 at 23:22, Haines Brown wrote:
> > I do have a few files that emacs has trouble with, probably 16-bit, but
> > they are exceptional, and I know how to handle utf-16 in emacs and
> > convert those files to useful form. I've just not had the time to play
> > with the one difficult file now troubling me.
> 
> 	To convert the encoding of a file, open it and type C-x RETURN f. Is
> that what you're using?

Just a little context here. I'm running emacs 21.2.1. C-h C tells me
my current default coding system is utf-8, and my language environment
is en_US.UTF-8. I can insert an extended character such as c-cedilla
(ç) into this message or into a blank file.

OK, that suggests emacs itself is working properly and that my problem
lies instead with one problematic file. This file is a plain-text
message, but it is two years old, and who knows what I may have done
to it?

In my problematic file, the extended characters appear as octal
escapes such as \347. At first I tried a search/replace to convert the
octals into proper characters, but emacs would not accept them as a
search term. I could not search for \347 and replace it with a
c-cedilla, because the \347 I pasted into the minibuffer was not really
the \347 octal character, only something that looked like it. Since I
can normally paste an octal as a search term, there is something about
these octals that is not right.
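For what it's worth, the \347 may be easier to understand as a single raw byte. The sketch below (Python, purely for illustration) assumes that Emacs shows a byte it could not decode as an octal escape. Byte 0xE7 (octal 347) is ç in Latin-1, but on its own it is not valid UTF-8. One plausible reading, then, is that a Latin-1 file was visited as utf-8, leaving each ç as an undecoded byte, while the \347 pasted into the minibuffer became the four ordinary characters \, 3, 4 and 7.

```python
# Sketch: what the "\347" shown by Emacs corresponds to as a raw byte.
raw = bytes([0o347])             # octal 347 == decimal 231 == hex E7

print(hex(raw[0]))               # 0xe7
print(raw.decode("latin-1"))     # ç (U+00E7 in ISO-8859-1)

# On its own, the same byte is not valid UTF-8: 0xE7 is the lead byte
# of a three-byte sequence, and here nothing follows it.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print("valid UTF-8:", is_valid_utf8)  # valid UTF-8: False

# In UTF-8, c-cedilla is stored as two bytes instead of one.
print("ç".encode("utf-8"))       # b'\xc3\xa7'
```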

I first assumed that emacs was not handling the coding system of the
problematic file properly, and I looked for a way to convert the file
into a useful form. I suspected the file might have been written in a
coding system that emacs did not understand.

I tried two things. First, I opened the problematic file as utf-16-le
(C-x RET c utf-16-le) and then saved it as utf-8 (C-x RET f utf-8).

Now, instead of octals, the extended characters in the utf-8 file
appear as empty rectangles. So nothing was gained, and perhaps
information was lost. However, there was another difference, perhaps a
more significant one. In the original file, which I suspected was
utf-16-le, I could not insert a c-cedilla: it appeared as \347. After I
saved the file with the utf-8 coding system, however, I could insert
the c-cedilla properly.
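A guess at where the rectangles come from, assuming the file really holds single-byte Latin-1 text: UTF-16-LE reads two bytes per character, low byte first, so every pair of Latin-1 bytes is fused into one code point, typically in the CJK range. The sample word below is made up for illustration.

```python
# Sketch: Latin-1 bytes misread as UTF-16-LE (the sample text is hypothetical).
latin1_bytes = "Français".encode("latin-1")   # 8 bytes, one per character
print(latin1_bytes)                           # b'Fran\xe7ais'

# UTF-16-LE reads two bytes per character, low byte first,
# so each pair of Latin-1 bytes becomes a single code point.
misread = latin1_bytes.decode("utf-16-le")
print([hex(ord(ch)) for ch in misread])  # ['0x7246', '0x6e61', '0x61e7', '0x7369']

# All four land in the CJK Unified Ideographs block (U+4E00..U+9FFF);
# without a font covering that block they are typically drawn as boxes.
print(all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread))  # True

# Encoding back to UTF-16-LE restores the original bytes in this case,
# though a file with an odd number of bytes would not decode cleanly.
print(misread.encode("utf-16-le") == latin1_bytes)  # True
```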

I did another experiment. Instead of saving the problematic file as
utf-8, I saved it as iso-latin-1. The saved file still showed the
octal characters, and an inserted c-cedilla still appeared as \347. In
other words, saving the file as iso-latin-1 changed nothing. Am I
correct to infer that the original document was probably latin-1, and
therefore that the problem is not the document's coding system?
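If the file is indeed Latin-1, as that experiment suggests, one way to check outside Emacs is to re-encode a copy to UTF-8 and look at the result. A minimal sketch, assuming the whole file is ISO-8859-1; the file names and sample text are hypothetical.

```python
import tempfile
from pathlib import Path


def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode src, assumed to be ISO-8859-1 (Latin-1), as UTF-8 in dst."""
    # Decoding as Latin-1 never fails: each byte maps to exactly one character.
    text = src.read_bytes().decode("latin-1")
    dst.write_bytes(text.encode("utf-8"))


def demo() -> str:
    """Run a small hypothetical Latin-1 file through the converter."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "old-message.txt"        # hypothetical file name
        dst = Path(tmp) / "old-message.utf8.txt"   # hypothetical file name
        src.write_bytes(b"Ol\xe1, a\xe7\xe3o\n")   # "Olá, ação" in Latin-1
        latin1_to_utf8(src, dst)
        return dst.read_text(encoding="utf-8")


print(demo())  # Olá, ação
```

The command-line equivalent would be iconv -f ISO-8859-1 -t UTF-8 old-message.txt > old-message.utf8.txt, with the same hypothetical names. If the converted copy shows ç where the original showed \347, that supports the Latin-1 theory.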

How does one reveal file attributes beyond what ls -l shows? There
are many more attributes than it displays. Perhaps I should also view
the file in hexl-mode to see what the characters look like.
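As far as I know, ls -l shows filesystem metadata only; a plain Unix filesystem does not record a text file's character encoding, so it has to be inferred from the bytes themselves (the file command makes such a guess, and od -c or hexl-mode show the raw bytes). Below is a rough Python sketch of a hex dump, plus a helper that lists non-ASCII bytes in the same \NNN octal form Emacs displays. The sample bytes are hypothetical.

```python
def hexdump(data: bytes, width: int = 16) -> list[str]:
    """Format bytes like a classic hex dump: offset, hex bytes, ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes ".".
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines


def non_ascii_bytes(data: bytes) -> list[tuple[int, str]]:
    """List (offset, octal) for every byte >= 0x80, like Emacs's \\NNN display."""
    return [(i, f"\\{b:03o}") for i, b in enumerate(data) if b >= 0x80]


# Hypothetical sample; for a real file use Path("message.txt").read_bytes().
sample = b"Fran\xe7ais\n"
for line in hexdump(sample):
    print(line)
print(non_ascii_bytes(sample))  # [(4, '\\347')]
```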

Haines Brown


