
Re: A small question



Michael Sobolev <mss@transas.com> writes:

> I've got a small question: where all these entities come from? :)

W3O, mostly.  See the copyright file in the sgml-data package.

> These make me think that it does not matter whether //HTML suffix is there the
> entities are the same.
[...]
> Aha, at least, this makes me think that these two files are different!  They
> are defining different sets of entities.  BUT, according to
> /usr/lib/sgml/catalog file, the first set of entities can be also referred to
> as to "...//EN".

> So here is my question: how should I treat all this information?

With caution.  It is possible that I have screwed up and marked as
non-HTML-specific what really *is* HTML-specific.  Note that the
docbook-xml package contains XML versions of this stuff (XML encodes
entities a little differently; I think they are implicitly CDATA).

> My main concern (well, it's where this investigation started from) is the
> entity named copy.  If I look into the first file I see
> 
>     <!ENTITY copy CDATA "&#169;">

This is a Unicode character definition.

> I see no definition for copy in the second file, while iso-.../ISOnum file
> defines:
> 
>     <!ENTITY copy   SDATA "[copy  ]"--=copyright sign-->

From <URL:http://www.oasis-open.org/cover/isoEntsExplained.html>:

| They are "SDATA" entity sets, which means that it is the job of the
| recipient to map them to something locally useful.

> These are different definitions and while in the second case I could process
> this SDATA [copy  ] for producing &copy; in HTML output and \copyright in TeX
> output, I lack this possibility in first case.

Why do you say that?  As far as I am aware there are TeX packages that
can handle Unicode.

> Please comment.

Well, basically, the SDATA mappings are entirely arbitrary.
Therefore, for the standard entity sets which I have shipped with the
sgml-data package, I use the Unicode entity mappings, which are handled
fine by advanced browsers and the SGML tool-chain (nsgmls, jade, etc).

I am definitely willing to ship alternate SDATA-style entity sets
for SGML (XML requires the Unicode ones).  I suppose either I could
use a different FPI for that, or else I could even use SGML "marked
sections" and a conditional parameter (i.e., use 'nsgmls
-iuse-sdata-entities ...') to switch between whatever representation
of entities you might want.  In either case, the default, IMHO, should
be the Unicode representation.
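
To make the marked-section idea concrete, here is a rough sketch of
what an entity file might look like.  It is only an illustration, not
something taken from the current package; the parameter entity name
use-sdata-entities is simply the one from the nsgmls example above.

    <!-- Default: the SDATA declarations below are skipped. -->
    <!ENTITY % use-sdata-entities "IGNORE">

    <![ %use-sdata-entities; [
    <!ENTITY copy SDATA "[copy  ]" --=copyright sign-->
    ]]>

    <!-- Unicode form.  It only takes effect when the SDATA
         declaration above was ignored, because the first
         declaration of an entity is the one that counts. -->
    <!ENTITY copy CDATA "&#169;">

Since 'nsgmls -iuse-sdata-entities' declares that parameter entity as
"INCLUDE" before anything else is read, the "IGNORE" default in the
file is overridden and the SDATA declaration wins.  Without the
switch you get the Unicode one.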

I *guess* I prefer the former option (use alternate FPIs) because it
seems like we could do it a bit at a time....
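
As a sketch of how that might look in the catalog: the first FPI
below is the standard one for ISOnum, while the second FPI and both
file names are made up purely for illustration.

    -- default: Unicode-style declarations --
    PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN"
           "unicode/ISOnum"

    -- hypothetical alternate FPI for the SDATA-style declarations --
    PUBLIC "-//Example//ENTITIES Numeric and Special Graphic SDATA//EN"
           "sdata/ISOnum"

A document that wants the SDATA forms would reference the second
identifier in its own entity declarations.  Everything else keeps
resolving to the Unicode set, and further entity sets could be added
the same way one at a time.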

For more info read
<URL:http://www.oasis-open.org/cover/topics.html#entities>.

--
.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>

