
Bug#434056: Inputenc and XeTeX don't work together



On 31 Aug 2007, at 7:05 pm, Frank Küster wrote:

Hi Jonathan,

here's a bug report we got in the Debian BTS about using inputenc with
XeTeX.  The full conversation is at http://bugs.debian.org/434056, but
the first paragraph cited describes the wish quite well. The problem is
described in more detail in the initial messages, but maybe you're
familiar with it.

What's your view on that? TIA for your answer,


Hi Frank,

Yes, I'm familiar with the issue. I normally tell XeTeX users that they should not be using [utf8]{inputenc} at all, as the engine reads UTF-8 natively. I've sometimes thought that it would be good for the package to recognize when it is loaded under XeTeX, and automatically disable itself (perhaps with a warning), as this is a fairly common mistake for new users.
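The idea of the package disabling itself under XeTeX can be sketched as a small wrapper package. This is a hypothetical illustration only: the name utf8guard is invented here and is not code from the LaTeX release. It relies on the fact that the primitive \XeTeXrevision is defined only by XeTeX.

```latex
% utf8guard.sty: HYPOTHETICAL wrapper, not part of the LaTeX release.
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{utf8guard}
% Forward every option (utf8, latin1, cp1250, ...) to inputenc.
\DeclareOption*{\PassOptionsToPackage{\CurrentOption}{inputenc}}
\ProcessOptions\relax
% \XeTeXrevision is defined only by XeTeX.
\@ifundefined{XeTeXrevision}%
  {% Not XeTeX: load inputenc with the forwarded options.
   \RequirePackage{inputenc}}%
  {% XeTeX: skip inputenc entirely and warn the user instead.
   \PackageWarningNoLine{utf8guard}%
     {XeTeX reads UTF-8 natively; inputenc not loaded}}
```

A document would then say \usepackage[utf8]{utf8guard}. Under XeTeX this only covers the common utf8 mistake; a legacy option such as cp1250 would still need the input decoding changed at the engine level.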

I seem to recall discussing this with one of the LaTeX team at a conference some time ago (maybe Chris? Morten?), but have not followed up on it recently.

A further step would be to also support other input encodings via the inputenc package. This would require changing the \XeTeXinputencoding setting to map the text to Unicode correctly. Then a legacy-encoded file that says
  \usepackage[cp1250]{inputenc}
or
  \usepackage[applemac]{inputenc}
(or whatever) could work correctly with Unicode fonts in XeTeX. But the utf8 case is the common one, so it would be nice if at least that one worked transparently.
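Done by hand, the engine-level switch for a legacy-encoded file might look like the sketch below. It assumes the e-TeX primitive \ifdefined (available in pdfTeX and XeTeX) and assumes XeTeX accepts "cp1250" as a converter name; XeTeX uses ICU converter names, and the ICU names corresponding to other inputenc options such as applemac are not checked here.

```latex
% Preamble of a hypothetical source file saved in Windows-1250.
\ifdefined\XeTeXrevision
  % XeTeX: decode the following lines of this file from cp1250
  % into Unicode at the engine level, instead of using inputenc.
  \XeTeXinputencoding "cp1250"
\else
  % 8-bit engines: inputenc maps the cp1250 bytes to LaTeX commands.
  \usepackage[cp1250]{inputenc}
\fi
```

A patched inputenc would in effect have to perform this dispatch itself, translating each of its option names into an encoding name that XeTeX understands.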

The correct place to address this issue is in the base LaTeX release; it's not a Debian (or other distro) bug. But in the absence of an upstream fix, you might want to try and come up with a patch -- I think it would be helpful to users.

JK


Frank

Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> wrote:

TeX, pdfTeX, Omega or XeTeX, he should be able to say

  \usepackage[utf8]{inputenc}

and the right thing for the current implementation of TeX should
magically happen.

I suspect you think so because you use a language
in which there is little difference between UTF-8 and
the usual 8-bit encodings; for CJK (and Arabic, Hindi, ...?) I'm afraid
things don't work so magically ;-)

Ehm... no.  XeTeX uses UTF-8 for input, and so does TeX (or e-TeX, or
pdfTeX) with utf8.def. Legacy encodings are completely irrelevant for
this discussion.

The point is that the four languages I use regularly are all covered
by the small subset of Unicode that works correctly when you say

  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}

I realise that people for whom that is the case are a minority (there are
probably not many more than 1.2 billion of us in the world). However,
just because it doesn't work for most people doesn't mean it should
stop working for us.

\usepackage{ifxetex}
\ifxetex\else
\usepackage[utf8]{inputenc}
\fi

Yes, I'm currently doing roughly that (but with TeX primitives rather
than the ifxetex package). However I believe that this should be handled
automatically by inputenc.
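A version of the ifxetex test that uses only TeX primitives might look like the following. Juliusz does not show his exact code, so this is an assumption about what such a test could be, not a copy of it.

```latex
% Load inputenc only when not running under XeTeX.
% \XeTeXrevision exists only in XeTeX; on other engines the \csname
% construction leaves it equal to \relax, so the \ifx test is true.
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
  \usepackage[utf8]{inputenc}
\fi
```

Unlike \ifdefined, this idiom also works on engines without the e-TeX extensions, at the cost of leaving \XeTeXrevision defined as \relax on those engines.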

                                        Juliusz

--
Frank Küster
Debian Developer (teTeX/TeXLive)



