
Re: First beta version of the Debian SGML/XML HOWTO



Stephane Bortzmeyer wrote:
> 
> > Stephané:
> 
> Hello, UTF-8 :-)

Yes... It works OK when I bounce it through some other mail relays, so I
think the Debian list manager is mangling UTF-8...

> 
> > The dtd could not be found until I modified your example to provide the
> > full path.
> 
> I'll check that. I don't work a lot on potato and I don't have XEmacs, but
> I'll check. If someone knowledgeable can give me a summary of SYSTEM
> identifiers in XML... Apparently they are required :-( but they are sometimes
> absolute filenames, sometimes relative filenames and sometimes URLs, the latter
> not being supported by most SGML tools.

Looking at the spec yields the following:

<spec>
  [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
                       ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
</spec>

where S stands for one or more white space characters (space, tab,
carriage return or line feed).

<spec>
  [75] ExternalID ::= 'SYSTEM' S SystemLiteral
                    | 'PUBLIC' S PubidLiteral S SystemLiteral
</spec>

So a SystemLiteral is required even when PUBLIC is used; unlike SGML, XML
does not allow a public identifier on its own.
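To make the two productions concrete, here is a rough Python sketch that
pulls the external ID out of a DOCTYPE declaration. DOCTYPE_RE and
parse_external_id are my own illustrative names, not from any tool; it
only handles quoted literals and skips the internal subset, but it shows
that a PUBLIC identifier without a following SystemLiteral is rejected:

```python
import re

# Simplified matcher for productions [28] and [75]: quoted literals only,
# internal subset not parsed. A sketch, not a conforming XML parser.
S = r"[\x20\x09\x0D\x0A]+"                       # production [3]: white space
LITERAL = r'(?:"([^"]*)"|' + r"'([^']*)')"       # "..." or '...'

DOCTYPE_RE = re.compile(
    r"<!DOCTYPE" + S + r"([^\s\[>]+)"            # Name
    + r"(?:" + S
    + r"(?:SYSTEM" + S + LITERAL                 # 'SYSTEM' S SystemLiteral
    + r"|PUBLIC" + S + LITERAL + S + LITERAL     # 'PUBLIC' S PubidLiteral S SystemLiteral
    + r"))?"
    + r"\s*[\[>]"                                # then internal subset or end
)


def _first(a, b):
    """Return whichever of the two quote-style groups matched."""
    return a if a is not None else b


def parse_external_id(decl):
    """Return (name, public_id, system_id) from a DOCTYPE declaration.

    Raises ValueError when the declaration does not match, e.g. when
    PUBLIC is given without the SystemLiteral that XML requires."""
    m = DOCTYPE_RE.match(decl)
    if m is None:
        raise ValueError("malformed DOCTYPE: " + decl)
    name, s1, s2, p1, p2, ps1, ps2 = m.groups()
    if s1 is not None or s2 is not None:         # SYSTEM form
        return name, None, _first(s1, s2)
    if p1 is not None or p2 is not None:         # PUBLIC form
        return name, _first(p1, p2), _first(ps1, ps2)
    return name, None, None                      # no external ID at all
```

For example, parse_external_id('<!DOCTYPE sources SYSTEM "ucinput.dtd">')
gives ("sources", None, "ucinput.dtd").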

<spec>
  The SystemLiteral is called the entity's system identifier. It is a
  URI, which may be used to retrieve the entity. Note that the hash mark
  (#) and fragment identifier frequently used with URIs are not,
  formally, part of the URI itself; an XML processor may signal an error
  if a fragment identifier is given as part of a system
  identifier. Unless otherwise provided by information outside the scope
  of this specification (e.g. a special XML element type defined by a
  particular DTD, or a processing instruction defined by a particular
  application specification), relative URIs are relative to the location
  of the resource within which the entity declaration occurs. A URI
  might thus be relative to the document entity, to the entity
  containing the external DTD subset, or to some other external
  parameter entity.
</spec>
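The relative-URI rule in that paragraph is ordinary URI resolution against
the URI of whatever resource holds the declaration. A minimal sketch using
Python's urllib.parse; resolve_system_id is a hypothetical helper, and
treating a fragment as an error is the optional behaviour the spec
permits, not something it requires:

```python
from urllib.parse import urldefrag, urljoin


def resolve_system_id(system_id, base_uri):
    """Resolve system_id against base_uri, the URI of the resource that
    contains the entity declaration (document entity, external DTD
    subset, or another external parameter entity)."""
    uri, fragment = urldefrag(system_id)
    if fragment:
        # The spec says a processor *may* signal an error here.
        raise ValueError("fragment identifier in system identifier: " + system_id)
    # Absolute URIs come back unchanged; relative ones are resolved.
    return urljoin(base_uri, uri)
```

So "ucinput.dtd" declared in file:///home/ml/mydoc.xml resolves to
file:///home/ml/ucinput.dtd, while the same string inside a DTD somewhere
else resolves relative to that DTD instead.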

<spec>
  In addition to a system identifier, an external identifier may
  include a public identifier. An XML processor attempting to retrieve
  the entity's content may use the public identifier to try to generate
  an alternative URI. If the processor is unable to do so, it must use
  the URI specified in the system literal. Before a match is attempted,
  all strings of white space in the public identifier must be normalized
  to single space characters (#x20), and leading and trailing white
  space must be removed.
</spec>

So a processor may try the public identifier first and, if that fails,
fall back to the system identifier.
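Roughly, that lookup order plus the white-space normalization the spec
requires looks like this. The catalog here is just a hypothetical dict
from normalized public IDs to URIs, standing in for whatever catalog
mechanism a given tool really uses:

```python
import re


def normalize_public_id(public_id):
    """Collapse each white-space run to one #x20 and trim both ends."""
    return re.sub(r"[\x20\x09\x0D\x0A]+", " ", public_id).strip(" ")


def choose_uri(public_id, system_id, catalog):
    """Try the public ID first, then fall back to the system literal.

    catalog is a hypothetical dict of normalized public ID -> URI."""
    if public_id is not None:
        uri = catalog.get(normalize_public_id(public_id))
        if uri is not None:
            return uri
    return system_id
```

Normalizing before the lookup means a public ID split across lines in the
document still matches the single-line form stored in the catalog.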

***

I'd suggest that we use public IDs wherever possible (which means that
tools will have to be able to find the catalog), and I suggest we use
relative file URIs for system IDs.

For example, on my potato system nsgmls apparently has a compiled-in
search path of '.', '/usr/local/share/sgml/', '/usr/local/lib/sgml/'
and '/usr/lib/sgml/'. Slink is a little different.

Thus the following works:

Given an XML document mydoc.xml that starts with:

	<?xml version="1.0" encoding="ISO-8859-1"?>
	<!DOCTYPE sources SYSTEM "ucinput.dtd">
	...

and given that /usr/local/share/sgml/ucinput.dtd exists, the command

	nsgmls declaration/xml.decl mydoc.xml

works fine. Note that both the declaration and the DTD are given as
relative paths.
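The search-path behaviour described above can be approximated as
follows. This is my assumption about what nsgmls does (the first
directory containing the file wins), not a description of its actual
code; find_on_search_path and SEARCH_PATH are illustrative names:

```python
import os

# Directories reported above for nsgmls on potato; the order is assumed.
SEARCH_PATH = (".", "/usr/local/share/sgml", "/usr/local/lib/sgml", "/usr/lib/sgml")


def find_on_search_path(filename, search_path=SEARCH_PATH):
    """Return the first existing path for filename, or None if not found."""
    if os.path.isabs(filename):
        # Absolute filenames bypass the search path entirely.
        return filename if os.path.isfile(filename) else None
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this model, SYSTEM "ucinput.dtd" in mydoc.xml is found as
/usr/local/share/sgml/ucinput.dtd unless a copy exists in '.' first.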

I can't seem to get nsgmls or psgml to actually use the catalog, though...
I haven't checked other tools...

If the tools follow the logic the spec allows, one should be able to put
in a bogus filename as the system identifier and have the catalog lookup
take precedence, provided the catalog is found and contains a matching
entry for the public identifier.
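To test that theory outside any real tool, here is a sketch of a
catalog-first resolver. It reads only PUBLIC entries from an SGML Open
style catalog, strips -- comments -- naively, and resolves entries
relative to the catalog's directory; all function names are hypothetical,
and the public ID in the usage note is invented:

```python
import os
import re

# One quoted literal, either "..." or '...'.
_LIT = r'"([^"]*)"|' + r"'([^']*)'"
# PUBLIC "pubid" "sysid"  (sysid may also be an unquoted token)
_PUBLIC_ENTRY = re.compile(
    r"\bPUBLIC\s+(?:" + _LIT + r")\s+(?:" + _LIT + r"|(\S+))", re.IGNORECASE)


def _normalize(public_id):
    """Collapse white-space runs to one space and trim, as XML requires."""
    return re.sub(r"[\x20\x09\x0D\x0A]+", " ", public_id).strip(" ")


def parse_catalog(text, catalog_dir):
    """Map normalized public IDs to file paths, from PUBLIC entries only."""
    # Naive comment removal: breaks if a literal itself contains "--".
    text = re.sub(r"--.*?--", " ", text, flags=re.DOTALL)
    entries = {}
    for m in _PUBLIC_ENTRY.finditer(text):
        pub = m.group(1) if m.group(1) is not None else m.group(2)
        sysid = next(g for g in m.groups()[2:] if g is not None)
        # Keep the first entry seen for a given public ID.
        entries.setdefault(_normalize(pub), os.path.join(catalog_dir, sysid))
    return entries


def resolve_doctype(public_id, system_id, catalog):
    """Catalog entry wins; the (possibly bogus) system ID is the fallback."""
    if public_id is not None:
        hit = catalog.get(_normalize(public_id))
        if hit is not None:
            return hit
    return system_id
```

Given a catalog line PUBLIC "-//ML//DTD UC Input//EN" "ucinput.dtd" in
/usr/local/share/sgml, a document declaring that public ID with SYSTEM
"bogus.dtd" would resolve to /usr/local/share/sgml/ucinput.dtd, and the
bogus filename would never be opened.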

ml

