
Re: console translator set without encoding

At 21 Jan 2005 18:58:41 -0800,
Thomas Bushnell BSG wrote:
> Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes:
> > Regardless of what you think about it - the
> > western world doesn't need it (where ISO 8859-1 or 15 is enough).
> If only this were true.

Obviously I was exaggerating.

> You are certainly right that it's not a Hurd-specific question, and
> Debian Hurd should probably just track what other people do, but we
> should make sure all our parts do the right thing with UTF-8, and then
> be ready to switch when Debian is.

I think that, except for the input method, we are completely on par
with a UTF-8 xterm using an 8x13 font on Debian GNU/Linux these days
(and even 9x15 if you can live with some limitations for extra-wide
characters) - and all this in VGA text mode (== fast scrolling) with
up to 512 different characters on screen at the same time (if you can
live with 8 colors; otherwise 256 characters).
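
The trade-off between 512 characters and 16 colors comes from the
layout of a VGA text cell.  A minimal sketch, assuming the usual
hardware behaviour where attribute bit 3 selects the second font bank
once two font banks are enabled; the function name and its parameters
are made up for illustration and are not taken from the Hurd console
code:

```python
# Sketch of packing one VGA text-mode cell (16 bits: glyph byte in the
# low half, attribute byte in the high half).
# Assumption: in 512-glyph mode attribute bit 3 picks the font bank, so
# only 3 bits (8 colours) remain for the foreground.

def pack_cell(glyph_slot, fg, bg, wide_font=True):
    """Return the 16-bit cell value for a glyph slot and colours.

    glyph_slot: 0..511 when wide_font is True, else 0..255.
    fg: foreground colour, 0..7 with wide_font, else 0..15.
    bg: background colour, 0..7 (bit 7 is left clear here).
    """
    max_slot = 512 if wide_font else 256
    max_fg = 8 if wide_font else 16
    if not 0 <= glyph_slot < max_slot:
        raise ValueError("glyph slot out of range")
    if not 0 <= fg < max_fg or not 0 <= bg < 8:
        raise ValueError("colour out of range")
    attr = (bg << 4) | fg
    if wide_font and glyph_slot >= 256:
        attr |= 0x08  # bit 3 selects the second font bank
    return (attr << 8) | (glyph_slot & 0xFF)
```

With wide_font=False, bit 3 goes back to being the foreground
intensity bit, which is where the other 8 colors come from.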

I have carried the VGA text mode to its extreme; there is little
room for improvement in this context (you _could_ do overstrike by
handling the font slot allocation even more dynamically, but that's
about it).  For more, we need to go to graphics mode.
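
To illustrate what dynamic font slot allocation means here, this is a
rough sketch of a reference-counted slot table.  It is not the actual
driver code; the class and method names are hypothetical, and a real
implementation would also have to upload the glyph bitmap into the
slot:

```python
class GlyphSlots:
    """Map code points to a fixed number of hardware font slots (sketch)."""

    def __init__(self, num_slots=512):
        # Reversed so that pop() hands out slot 0 first.
        self.free = list(range(num_slots - 1, -1, -1))
        self.slot_of = {}  # code point -> slot
        self.refs = {}     # code point -> number of screen cells using it

    def acquire(self, codepoint):
        """Return a slot for codepoint, or None if every slot is in use."""
        if codepoint in self.slot_of:
            self.refs[codepoint] += 1
            return self.slot_of[codepoint]
        if not self.free:
            return None  # caller would fall back to a replacement glyph
        slot = self.free.pop()
        self.slot_of[codepoint] = slot
        self.refs[codepoint] = 1
        return slot

    def release(self, codepoint):
        """Drop one reference; recycle the slot when no cell uses it."""
        self.refs[codepoint] -= 1
        if self.refs[codepoint] == 0:
            self.free.append(self.slot_of.pop(codepoint))
            del self.refs[codepoint]
```

A cell that scrolls off the screen would call release(), so the slot
of a character that is no longer visible can be reused for a new one.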

For the input method, Marco was working on an xkb driver for the
console, which means we will support almost exactly the same input
methods as XFree86 (plus/minus the obvious tweaks).  You can't get
much more complete than that either.

UTF-8 is an insanely complex standard if you start to look into its
depths.  I am not sure there is any complete implementation in the
world (well, maybe some proprietary solutions costing tens of
thousands of dollars or so).  But we provide the reasonable subset
that GNU/Linux also supports.  Beyond that you will quickly hit all
sorts of limitations, including the fact that the Unix console must be
mono-spaced(!), while UTF-8 has double-width characters, which are
twice as wide as normal characters (in mono-spaced fonts).  Proper
UTF-8 support carries through all layers of the system, from string
comparison at the glibc and filesystem level (when are two filenames
distinct?) to input and display methods at the application level.
It's huge.
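
Two of these points can be made concrete with a small sketch.  It
uses Python's unicodedata module only as an illustration; the helper
names are hypothetical, the width rule is a rough approximation of
what wcwidth() does, and whether a filesystem should normalize names
at all is exactly the open question:

```python
import unicodedata

def cell_width(ch):
    """Columns a character takes on a monospaced terminal (rough sketch)."""
    if unicodedata.combining(ch):
        return 0  # combining marks overlay the previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters need two cells
    return 1

def same_name(a, b):
    """Compare two names after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

For example, "\u00e9" and "e\u0301" both display as an e with an
acute accent.  They differ as code point sequences, so a byte-wise
comparison treats them as two distinct filenames, while same_name()
treats them as one.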

Heck, we even support _real_ bold and _real_ italic font types on the
text console (unfortunately still not supported by Emacs' font-lock
mode, although that should be a small hack).  And we have a GNU Head
that is 6x3 cells big (see console/motd.UTF-8)!  Find that anywhere
else in the world :)

That said, I have done only a little testing with the UTF-8 stuff.  I
think it is solid, but there may be some error conditions with iconv
that mess things up.  These will naturally show up once the code gets
used a lot.
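
The kind of error condition meant here is input that is invalid or
that ends in the middle of a multi-byte sequence; iconv() reports
these as EILSEQ and EINVAL respectively.  As an analogy only (this is
Python's incremental decoder, not iconv and not the console code, and
the function name is made up), a converter fed with arbitrary chunks
has to buffer incomplete tails and decide what to do with garbage:

```python
import codecs

def decode_chunks(chunks):
    """Decode UTF-8 arriving in arbitrary byte chunks, replacing bad input."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out = []
    for chunk in chunks:
        # An incomplete sequence at the end of a chunk is kept in the
        # decoder until the next chunk arrives.
        out.append(decoder.decode(chunk))
    # Flush: a sequence that is still incomplete now becomes U+FFFD.
    out.append(decoder.decode(b"", final=True))
    return "".join(out)
```

A character split across two reads comes out intact, while a stray
byte such as 0xFF is replaced by U+FFFD instead of stalling the
stream.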

Have fun,
