Re: xterm, mutt, emacs -nw, and utf-8

To: debian-user@lists.debian.org
Subject: Re: xterm, mutt, emacs -nw, and utf-8
From: Vincent Lefevre <vincent@vinc17.org>
Date: Wed, 1 Nov 2006 12:59:24 +0100
Message-id: <20061101115924.GG11636@ay.vinc17.org>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20061101074526.GA31122@beo211.foi.se>
References: <20061030083301.GA10517@beo211.foi.se> <20061101074526.GA31122@beo211.foi.se>

On 2006-11-01 08:45:26 +0100, Anders Lennartsson wrote:
> On 2006-11-01 02:53:28 +0100, Vincent Lefevre wrote:
> 
> > See http://www.vinc17.org/mutt/ (there's a section on Emacs for Mutt).
> 
> Thanks. There was some information that provided a major step forward.
> Emacs started in -nw mode now understands my keyboard.

This was the goal of:

(unless window-system
  (set-keyboard-coding-system locale-coding-system)
  (set-terminal-coding-system locale-coding-system))

If users still can't enter 8-bit characters directly, then the
following line may be useful too:

(set-input-mode (car (current-input-mode)) (nth 1 (current-input-mode)) 1)

I have it in my .emacs, but don't know if it is still necessary,
at least under Debian.

> New messages now works nicely it seems. But replying to old messages
> with possibly different encodings seems to mess things up a bit. In
> some cases emacs does not start with the correct encoding, and seems
> to refuse to change.
> 
> And on a sarge box with no X11 whatsoever, but en_US.utf-8 as default
> and no user preferences, a ssh login does not provide acceptence for
> åäö in bash :( (which it did before in latin-1)

These may be two different problems. Concerning the ssh one, are you
sure that the locales are the same locally (the terminal locales)
*and* remotely? With recent versions of OpenSSH (3.9 and above?),
environment variables (such as the locale ones) can be passed to
the remote side. The /etc/ssh/sshd_config file on the SSH server
side should contain something like:

  AcceptEnv LANG LC_*

and on the local side, you should have something like:

  SendEnv LANG LC_*

(this can be done in your ~/.ssh/config file). Of course, this won't
work with old OpenSSH versions. Otherwise you need to set the locale
environment variables manually.
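
Before setting anything by hand, it can help to confirm that both
sides really use the same character set. The following is only a
sketch under stated assumptions: the function names are made up for
this example, and treating names that differ only in case, hyphens or
underscores (such as utf-8 and UTF-8) as equal is an assumption of the
sketch, not a rule taken from OpenSSH or the locale tools.

```shell
#!/bin/sh
# Hypothetical helpers for comparing charsets; not part of OpenSSH
# or of the locale utilities.

# Print the charset part of a locale name (the text after the last
# dot), dropping any @modifier. Prints nothing if there is no dot.
charset_of_locale() {
  printf '%s\n' "$1" | sed -n 's/@.*//; s/.*\.//p'
}

# Normalize a charset name: upper case, hyphens and underscores removed.
# (Assumption of this sketch: such spelling differences do not matter.)
normalize_charset() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr -d '_-'
}

# Succeed when the two charset names are equal after normalization.
same_charset() {
  [ "$(normalize_charset "$1")" = "$(normalize_charset "$2")" ]
}
```

For instance, same_charset "$(locale charmap)" "$(ssh remotehost locale charmap)"
would succeed when the terminal side and the remote side agree
(remotehost is a placeholder, and a non-interactive ssh command may
not see the same environment as a login shell). If they disagree,
exporting LANG or LC_CTYPE on the remote side is the manual route.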

Now, concerning the Emacs problem, you should also make sure that
the locales are correct (i.e. consistent with the terminal). Have
you put

  (prefer-coding-system locale-coding-system)

in the find-file hook (to make sure that it is not overridden by
something else)?

If this works *in general*, but not in particular cases, the cause
may be incorrectly encoded files. Mutt does *not* make sure that
files it gives to the editor are correctly encoded. I had to write a
wrapper for the Mutt editor (it is given on my web page: mutteditor).
It first retrieves the local character set with:

if [ `uname` = "Darwin" ]; then
  charmap="`echo $LC_CTYPE | sed -n 's/.*\.//p'`"
  : ${charmap:=ANSI_X3.4-1968}
else
  charmap="`locale charmap`"
fi

(under Linux, you just need charmap="`locale charmap`"). Then I remove
the control characters and replace the malformed sequences:

[ "`head -c5 \"$1\"`" = "From:" ] && \
  perl -i $HOME/bin/mkprintable "$charmap" "$1"

Then I execute the editor (emacs).
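
The three steps above could be combined roughly as follows. This is a
minimal sketch, not the mutteditor script from the web page: the
function names and the MUTT_EDITOR and MKPRINTABLE variables are
hypothetical, introduced only for this example, and the mkprintable
script is assumed to live in $HOME/bin.

```shell
#!/bin/sh
# Sketch of a Mutt editor wrapper. MUTT_EDITOR and MKPRINTABLE are
# hypothetical variables, not Mutt settings.

# Print the character set of the current locale.
get_charmap() {
  if [ "$(uname)" = "Darwin" ]; then
    # Take the part of LC_CTYPE after the last dot, with an
    # ASCII fallback when there is none.
    cm=$(printf '%s\n' "${LC_CTYPE:-}" | sed -n 's/.*\.//p')
    printf '%s\n' "${cm:-ANSI_X3.4-1968}"
  else
    locale charmap
  fi
}

# Succeed when the file starts with "From:".
has_from_header() {
  [ "$(head -c5 "$1")" = "From:" ]
}

# Clean the file if it starts with a From: header, then run the editor.
mutt_edit() {
  charmap=$(get_charmap)
  if has_from_header "$1"; then
    perl -i "${MKPRINTABLE:-$HOME/bin/mkprintable}" "$charmap" "$1"
  fi
  # Left unquoted on purpose so that "emacs -nw" splits into words.
  ${MUTT_EDITOR:-emacs -nw} "$1"
}
```

A wrapper script would then end with a call such as mutt_edit "$1",
and Mutt's $editor variable would point to that script.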

Note: the condition [ "`head -c5 \"$1\"`" = "From:" ] works only with
Mutt's $edit_headers variable set. I had to use that because mkprintable
shouldn't be called when editing an existing message. The mkprintable
script is:

#!/usr/bin/env perl

# Replace malformed data by U+FFFD or '?' (depending on the encoding)
# and remove the control characters from U+007F to U+009F. Useful for
# MUA's like Mutt, before calling the editor.

use strict;
use Encode;

my $RCSID = '$Id: mkprintable 5657 2004-12-18 13:57:07Z lefevre $';
my ($proc) = $RCSID =~ /^.Id: (\S+) / or die;

@ARGV or die "Usage: $proc <encoding>\n";
my $encoding = shift;

while (<>)
  {
    # FB_DEFAULT substitutes malformed data instead of dying.
    $_ = decode($encoding, $_, Encode::FB_DEFAULT);
    tr/\x{7F}-\x{9F}//d;
    print encode($encoding, $_, Encode::FB_DEFAULT);
  }

-- 
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
