On 31 Aug 2007, at 7:05 pm, Frank Küster wrote:
Hi Jonathan, here's a bug report we got in the Debian BTS about using inputenc with XeTeX. The full conversation is at http://bugs.debian.org/434056, but the first paragraph cited describes the wish quite well. The problem is described in more detail in the initial messages, but maybe you're familiar with it. What's your view on that? TIA for your answer,
Hi Frank,

Yes, I'm familiar with the issue. I normally tell XeTeX users that they should not be using [utf8]{inputenc} at all, as the engine reads UTF-8 natively. I've sometimes thought that it would be good for the package to recognize when it is loaded under XeTeX, and automatically disable itself (perhaps with a warning), as this is a fairly common mistake for new users.
I seem to recall discussing this with one of the LaTeX team at a conference some time ago (maybe Chris? Morten?), but have not followed up on it recently.
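A minimal sketch of the "disable itself with a warning" behaviour, written as a user-side preamble rather than as a change to inputenc itself. It assumes the ifxetex package is installed; the wording of the warning is made up for illustration and is not something inputenc actually prints.

```latex
% Sketch only: a user-side preamble, not a patch to inputenc itself.
\documentclass{article}
\usepackage{ifxetex}          % defines the \ifxetex conditional
\ifxetex
  % XeTeX reads UTF-8 natively, so skip inputenc and note it in the log.
  % (The warning text below is hypothetical.)
  \PackageWarningNoLine{inputenc}{%
    Running under XeTeX: not loading inputenc,\MessageBreak
    the engine reads UTF-8 natively}
  \usepackage{fontspec}       % Unicode font support under XeTeX
\else
  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}
\fi
\begin{document}
Frank Küster
\end{document}
```

The same source file should then compile under both pdfTeX and XeTeX, with only a log warning in the XeTeX case.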
A further step would be to also support other input encodings via the inputenc package. This would require changing the \XeTeXinputencoding setting to map the text to Unicode correctly. Then a legacy-encoded file that says

    \usepackage[cp1250]{inputenc}

or

    \usepackage[applemac]{inputenc}

(or whatever) could work correctly with Unicode fonts in XeTeX. But the utf8 case is the common one, so it would be nice if at least that one worked transparently.
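For illustration, this is roughly what a legacy-encoded file can already do by hand under XeTeX, and what inputenc would have to do on the user's behalf. It is a sketch under assumptions: the file is taken to be saved in Windows code page 1250, and "cp1250" is assumed to be a converter name the engine accepts. inputenc's own option names (applemac, for instance) would presumably need translating to such names.

```latex
% Sketch: setting the input encoding by hand under XeTeX.
% Assumes this file is saved in Windows code page 1250.
\XeTeXinputencoding "cp1250"  % placed first so it covers the rest of the file
\documentclass{article}
\usepackage{fontspec}         % Unicode font support under XeTeX
\begin{document}
Text typed in cp1250 goes here; XeTeX converts it to Unicode on input.
\end{document}
```

This only runs under XeTeX, since \XeTeXinputencoding is undefined in other engines.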
The correct place to address this issue is in the base LaTeX release; it's not a Debian (or other distro) bug. But in the absence of an upstream fix, you might want to try and come up with a patch -- I think it would be helpful to users.
JK
Frank

Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> wrote:

TeX, pdfTeX, Omega or XeTeX, he should be able to say

    \usepackage[utf8]{inputenc}

and the right thing for the current implementation of TeX should magically happen.

I suspect you can think so because you use a language in which there is little difference between utf8 and normal encoding, for CJK (, Arabic, Hindi, ?) I'm afraid things are not going so magically ;-)

Ehm... no. XeTeX uses UTF-8 for input, and so does TeX (or e-TeX, or pdfTeX) with utf8.def. Legacy encodings are completely irrelevant for this discussion. The point is that the four languages I use regularly are all covered by the small subset of Unicode that works correctly when you say

    \usepackage[utf8]{inputenc}
    \usepackage[T1]{fontenc}

I realise that people for whom that is the case are a minority (there's probably not much more than 1.2 billion of us in the world). However, just because it doesn't work for most people doesn't mean it should stop working for us.

    \usepackage{ifxetex}
    \ifxetex\else
      \usepackage[utf8]{inputenc}
    \fi

Yes, I'm currently doing roughly that (but with TeX primitives rather than the ifxetex package). However I believe that this should be handled automatically by inputenc.

Juliusz

-- 
Frank Küster
Debian Developer (teTeX/TeXLive)
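The message does not show the primitive-based test it mentions. One common way to write such a test, given here as a sketch and not as Juliusz's actual code, is to check whether \XeTeXrevision (a primitive that only XeTeX defines) exists. The choice of fontspec in the XeTeX branch is an assumption added to make the example complete.

```latex
% Sketch: engine test with TeX primitives instead of the ifxetex package.
\documentclass{article}
% \csname...\endcsname yields \relax for an undefined name, so the test
% below is true on engines that lack \XeTeXrevision (TeX, e-TeX, pdfTeX).
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}
\else
  \usepackage{fontspec}       % XeTeX branch: input is already UTF-8
\fi
\begin{document}
Juliusz
\end{document}
```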