Re: A small question
Michael Sobolev <mss@transas.com> writes:
> I've got a small question: where all these entities come from? :)
W3O, mostly. See the copyright file in the sgml-data package.
> These make me think that it does not matter whether //HTML suffix is there the
> entities are the same.
[...]
> Aha, at least, this makes me think that these two files are different! They
> are defining different sets of entities. BUT, according to
> /usr/lib/sgml/catalog file, the first set of entities can also be referred to
> as "...//EN".
> So here is my question: how should I treat all this information?
With caution. It is possible that I have screwed up and marked as
non-HTML specific what really *is* HTML specific. Note that the
docbook-xml package contains XML versions of this stuff (XML encodes
entities a little differently .. I think it's implicitly CDATA).
> My main concern (well, it's where this investigation started from) is entity
> named copy. If I look into first file I see
>
> <!ENTITY copy CDATA "&#169;">
This is a Unicode character definition.
> I see no definition for copy in the second file, while iso-.../ISOnum file
> defines:
>
> <!ENTITY copy SDATA "[copy ]"--=copyright sign-->
From <URL:http://www.oasis-open.org/cover/isoEntsExplained.html>,
| They are "SDATA" entity sets, which means that it is the job of the
| recipient to map them to something locally useful.
> These are different definitions and while in the second case I could process
> this SDATA [copy ] for producing © in HTML output and \copyright in TeX
> output, I lack this possibility in first case.
Why do you say that? As far as I am aware there are TeX packages that can handle Unicode.
> Please comment.
Well, basically, the SDATA mappings are entirely arbitrary.
Therefore, for the standard entity-sets which I have shipped with the
sgml-data package, I use the Unicode entity mappings, which are handled
fine by advanced browsers and the SGML tool-chain (nsgmls, jade, etc).
I am definitely willing to ship alternate SDATA-style entity sets
for SGML (XML requires the Unicode ones). I suppose either I could
use a different FPI for that, or else I could even use SGML "marked
sections" and a conditional parameter (i.e., use 'nsgmls
-iuse-sdata-entities ...') to switch between whatever representation
of entities you might want. In either case, the default, IMHO, should
be the Unicode representation.
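
To make the marked-section idea concrete, here is a rough sketch of how
one entity set could carry both representations. The parameter entity
name use-sdata-entities is taken from the nsgmls invocation above; the
file layout is an assumption, not what sgml-data ships, and only the
copy entity is shown.

```sgml
<!-- Default: SDATA variants are skipped. If the parameter entity has
     already been declared as "INCLUDE" (for example via
     nsgmls -iuse-sdata-entities), that earlier declaration wins and
     this one is ignored. -->
<!ENTITY % use-sdata-entities "IGNORE">

<![ %use-sdata-entities; [
<!-- SDATA form: the recipient decides what "[copy ]" becomes. -->
<!ENTITY copy SDATA "[copy ]">
]]>

<!-- Unicode form (U+00A9). When the section above is included, copy is
     already declared and this declaration is ignored. -->
<!ENTITY copy CDATA "&#169;">
```

This relies on the SGML rule that the first declaration of an entity is
the one that counts, so the default needs no second marked section.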
I *guess* I prefer the former option (use alternate FPIs) because it
seems like we could do it a bit at a time....
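
With alternate FPIs, a document would choose the representation by the
public identifier it references. A rough sketch for a DTD subset
follows. The first identifier is the standard ISOnum one, which I am
assuming resolves to the Unicode set. The second identifier and the
name ISOnum-sdata are made up for illustration and exist in no catalog.

```sgml
<!-- Either: the default set, assumed here to carry Unicode mappings. -->
<!ENTITY % ISOnum PUBLIC
  "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN">
%ISOnum;

<!-- Or: a separate SDATA set under its own (hypothetical) identifier. -->
<!ENTITY % ISOnum-sdata PUBLIC
  "-//EXAMPLE//ENTITIES Numeric and Special Graphic SDATA//EN">
%ISOnum-sdata;
```

Each new identifier would need its own catalog entry, which is why this
route can be taken one entity set at a time.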
For more info read
<URL:http://www.oasis-open.org/cover/topics.html#entities>.
--
.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>