
Re: A small question



Thank you for the hints on where to get more information. :)  I will look
there, though my question was about the situation with the Debian tools.

A Unicode character is fine if the output file is in Unicode.  Is the output
of debiandoc2* in Unicode?  I believe the problem *you* encountered when
processing the Russian translation stems from the fact that the output files
are not in Unicode.  For example, if we convert dselect-beginner.ru.sgml into
HTML, we get an 8-bit text file that has `Content-Type: text/html;
charset=koi8-r' at the very beginning.  Every &copy; in the source file will
appear as an 8-bit character, since we have

    <!ENTITY copy CDATA "&#169">

In every charset that has the (C) symbol at code 169, the output will look
fine.  But when you try to process the LaTeX output from debiandoc2latex, you
get a lot of errors, since the Cyrillic font has no glyph at code 169.

So the question is: what to do?

> > These are different definitions and while in the second case I could process
> > this SDATA [copy  ] for producing &copy; in HTML output and \copyright in TeX
> > output, I lack this possibility in first case.
> 
> Why do you say that?  As far as I am aware there are TeX packages that can handle Unicode.
Is the one we use for building the documentation from the DebianDoc DTD
Unicode-aware?  And do we actually feed it a Unicode file?

> Well, basically, the SDATA mappings are entirely arbitrary.
> Therefore, for the standard entity-sets which I have shipped with the
> sgml-data package, I use the Unicode entity mappings, which is handled
> fine by advanced browsers and the SGML tool-chain (nsgmls, jade, etc).
Does nsgmls have to be compiled in multi-byte mode to be Unicode-aware?  If
so, is it compiled that way as of sp 1.3.3-1.2.1-7?

> I definitely am willing to ship an alternate SDATA style entity sets
> for SGML (XML requires the Unicode ones).  I suppose either I could
> use a different FPI for that, or else I could even use SGML "marked
> sections" and a conditional parameter (i.e., use 'nsgmls
> -iuse-sdata-entities ...') to switch between whatever representation
> of entities you might want.  In either case, the default, IMHO, should
> be the Unicode representation.
Why?  I believe (though I have not checked yet) that this would break the
sgml-tools package (yes, yes, sgml-tools v1), which relies on SDATA entities
to produce proper output.
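For reference, the marked-section switch proposed in the quoted text might
look roughly like this inside an entity set.  This is only a sketch: the
parameter entity name use-sdata-entities is taken from the quoted message,
the SDATA text "[copy  ]" follows the ISO 8879 ISOnum convention, and the
actual files in sgml-data may be organized differently.

```
<!-- Default: the switch is off, so only the Unicode mapping is used. -->
<!ENTITY % use-sdata-entities "IGNORE">

<!-- Read only when the parameter entity is INCLUDE, e.g. when nsgmls
     is run with -iuse-sdata-entities. -->
<![ %use-sdata-entities; [
<!ENTITY copy SDATA "[copy  ]" -- =copyright sign -->
]]>

<!-- Unicode mapping.  If copy was already declared above, this
     declaration is ignored: in SGML the first declaration wins. -->
<!ENTITY copy CDATA "&#169;" -- =copyright sign -->
```

With plain `nsgmls file.sgml` the marked section is skipped and &copy; maps
to character 169.  With `nsgmls -iuse-sdata-entities file.sgml`, nsgmls
declares the parameter entity as INCLUDE before the DTD is read, so the SDATA
declaration comes first and &copy; maps to the opaque text "[copy  ]", which
a backend could then translate (for example to &copy; in HTML or \copyright
in LaTeX).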

Actually, I have only a practical aim in mind: to make the Russian documents
correct.  So how do I make &copy; render as (C) in all versions of
dselect-beginner.ru? &smile;
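One possible local workaround, untested against the debiandoc2* converters,
is to override the entity in the document's internal DTD subset.  SGML reads
the internal subset before the external DTD and binds the first declaration
of an entity, so a plain ASCII replacement declared there takes precedence,
and ASCII "(C)" is the same in KOI8-R and every other ASCII-compatible
charset.  The public identifier below is an assumed example; keep whatever
identifier the file already uses.

```
<!-- Assumed DOCTYPE line; reuse the document's own identifier. -->
<!DOCTYPE debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN" [
  <!-- Declared before the external DTD, so this binding wins and
       every backend receives plain ASCII text. -->
  <!ENTITY copy CDATA "(C)">
]>
```

This assumes debiandoc2* pass CDATA entity text through unchanged; it trades
the real copyright sign for an ASCII approximation in every output format.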

--
Mike

