
Re: Re^18: Debian Metadata Proposal -- draft rev.1.4

Marco.Budde@hqsys.antar.com (Marco Budde) writes:
> On 09.08.98, apharris # burrito.onshore.com wrote ...
> APH> Identifier means the resource described by the metadata.  That's
> resource == document?

Yes.  Or news group, or mailing list, or URL, or whatever.

> 1.) This is not true if we use your proposal. Because it's possible
>     to have one document in several formats. And this would mean
>     that one document will have several IDs.

Yes, several formats means several different metadata elements.  The
spec goes into this in great depth.

> 2.) There's no real difference between both proposals if you look at the
>     IDs. In 99.9% of cases there will be only one ID per File:.


> APH> simply the field's meaning.  I know as a database guy you look at
> APH> that and say, ah, this is the metadata-id, but it's just not so.
> I would say, it's the metadata key :).

But it's not!

> APH> > to be unique! If they're not unique, this could cause real problems.
> APH> > And for both databases I need unique and pseudo-persistent IDs.
> APH> Ok, but you are conflating two things here.  (1) A unique way to refer
> APH> to the metadata entities.  I agree this is problematic now.  I'm
> APH> still, however, not convinced that this is entirely necessary for
> APH> metadata, but my implementation proving it's not necessary will tell
> Of course it's necessary. I'm sorry, but I don't understand why you like
> your proposal. There are several cases where your system will not work.
> Maybe my proposal mixes metadata and resource IDs, but where's the
> problem? It solves a lot of problems.

Please name example cases.  I've outlined many cases where your system
will break.

> APH> Supposing we identify that we absolutely need unique metadata
> APH> identifiers, then fine.  First, it *wouldn't* be the Identifier tag.
> Of course it should be the ID tag, because the relation tag refers to the  
> ID and this is the problem with your proposal.

Not at all.  Relation tags can use any addressable resource, in any
addressing scheme understood by Identifier.  This formal parity
between Relation and Identifier is also lost in your variant.

> Once again we're talking about a small file format to satisfy our needs. I
> don't know why we're talking about such words as metadata, resource,
> ids, etc. We simply need a solution for our needs.

Bollocks, it has to be robust and well designed.  It has to scale.  It
has to fit future, unanticipated needs.

Believe me.  I design and architect s/w for a living.  Restricting
yourself at the outset to "good enough" always fails.

> And your proposal doesn't satisfy all needs.

Yes, it does solve the needs which are within its scope.

> APH> IDs.  Metadata who refer to local (file) resources which do not exist
> APH> would be summarily removed.
> If the local file doesn't exist it's a bug! Why should a package install a
> metadata file which refers to nonexistent files?

It could happen if the local sysadmin removes extraneous
documentation, i.e., without removing the package.  It could happen if
the proposed extension to dpkg occurs such that you can say 'dpkg -i
--dont-install-under=/usr/doc <deb>' such that /usr/doc isn't even

See, short-sighted design.  And you want to make myopia standard.  How
very Microsoft of you.

> APH>   <docreg_file_path>:<Identifier>
> But this wouldn't solve the Relation problems. My solution is much
> simpler.

It doesn't try to solve the persistence of files on the local file
system.  I submit now, as I always have, that this persistence
problem is *not* solvable within docreg itself.

> APH> This makes a fairly robust assumption that one doesn't have non-unique
> APH> Identifier tags *within* *the* *same* *docreg* *file*.  Which could be
> APH> very easily enforced.
> No, this could not be enforced. Do you need an example:

Sure; it could very *easily* be enforced.
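
A minimal sketch of what I mean by enforcing it, in Python.  It assumes
a docreg file is plain "Field: value" lines, which is only my working
assumption about the format; the function names are made up for
illustration and are not part of any existing tool.

```python
from collections import Counter


def identifiers(docreg_text):
    """Return every Identifier value in one docreg file, in order."""
    found = []
    for line in docreg_text.splitlines():
        # Split on the first colon only, so URIs in the value stay intact.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "identifier":
            found.append(value.strip())
    return found


def duplicate_identifiers(docreg_text):
    """Return the Identifier values that occur more than once in the file."""
    counts = Counter(identifiers(docreg_text))
    return sorted(ident for ident, n in counts.items() if n > 1)


def metadata_ids(docreg_path, docreg_text):
    """Build <docreg_file_path>:<Identifier> keys; reject files with duplicates."""
    dupes = duplicate_identifiers(docreg_text)
    if dupes:
        raise ValueError(f"{docreg_path}: duplicate Identifier(s): {dupes}")
    return [f"{docreg_path}:{ident}" for ident in identifiers(docreg_text)]
```

The check only ever looks at one file at a time, which is the whole
point: uniqueness is required within a docreg file, not across all of
them.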

>   Some URLs may change daily, because the filename includes the release
>   date. In such a case you wouldn't use the url with the filename but
>   only the directory as url.

??  Says who?  I would just ship a symlink with the pkg or something.
See, the solution is *external* to doc-base.

>   And maybe you would like to refer to more than one file in such
>   a directory.

Simply not allowed, nor needed.  Please put forth a case why I would
possibly want to do this.

> APH> Given that you can construct unique metadata-ids from this scheme, why
> APH> (remember, we're talking about metadata management here only) would we
> APH> ever need the scheme you proposed?
> Because my solution is a better design? I don't understand why we should
> use a filename as identifier. This is not part of the DC standard.

It *is* part of the standard to use standard URIs for the Identifier
and Relation.* elements.

> APH> > My proposal adds only one necessary tag, to identify the file. DC
> APH> > solved this problem by adding the DC information to the document
> APH> > itself. So I don't see a big difference between my and the DC proposal.
> APH> This just isn't true.
> Why not? By the way who uses DC at the moment? In the WWW most people use  
> the HTML 3.2 metadata format. And in the WWW metadata are always part of  
> the document.

Yes, I want to support being able to do this:

  install-docs -i /usr/doc/jade/index.htm 

And if this file has metadata in it, we read it, and voila.  No need
for a docreg file at all.
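
Roughly, the reading step could look like the sketch below.  It assumes
the document carries its Dublin Core fields as HTML meta elements named
"DC.<Element>"; the class and function names are hypothetical, and this
is not what install-docs does today.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Keep only Dublin Core fields and drop the "DC." prefix.
        if name.lower().startswith("dc.") and content is not None:
            self.fields.append((name[3:], content))


def embedded_metadata(html_text):
    """Return the (element, value) pairs embedded in html_text."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.fields
```

If the list comes back empty, the file has no embedded metadata and a
docreg file is still needed for it.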

> APH> Well, w.r.t. persistent resource identifiers, I don't think it's
> APH> robust to manage them within the docreg files.
> Why not? With my file format it's possible to have real persistent IDs.

No it's not, you only have half-assed persistent IDs.  I explained in
another email plenty of cases where there is no facility, in your
scheme, and in docreg files in general, to truly maintain a persistent

> APH> > Why not? Ok you should avoid it, if there are translated documents. But
> APH> > this is not a problem, because all documents are maintained by Debian.
> APH> This just isn't reality.  96% or so of the documents are *not*
> APH> maintained by debian.
> ??? We're talking about a doc system for Debian. And all Debian doc
> packages are maintained by Debian. So where's the problem?

Maintainer ease of use.  Suppose DC catches on and lots of developers
start using it, either within HTML or SGML or whatever.  Why impose
an additional burden on the maintainer to maintain this?

Most of the documentation on a system is *not* debian specific.

> Example:
>  1.) You have installed doc-linux-de and your preferred language is
>      "en".
>      => dhelp shows the German description/title
>  2.) You install doc-linux-html.
>      => dhelp will show the English description/titles instead of
>         the German ones.
> I think that this is a really good solution.

Yes, it is, but I don't need your variants to accomplish that.

> Which one would be the original for the relation link? How should I  
> display such entries? For example:
>   Identifier: www.debian.org
>   Title: Debian Homepage
>   Identifier: www.debian.org
>   Title: This is another title for the same URL
> Problem 1:
>   Identifier: www.de.debian.org
>   Title: Deutsche Homepage
>   Language: de
>   Relation: www.debian.org
> Problem 2:
>   Identifier: www.debian.org
>   Title: Deutsche Homepage
>   Language: de
>   Relation: www.debian.org

How about, instead:

Identifier: http://www.debian.org/
Title: Debian Homepage
Title: Debian Heimeseite (LANG=de)
Language: en de


But this is a very good edge case to bring up.  I'll have to think
about this one.
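
For what it's worth, picking a title by language from a record like the
one above is not hard.  A rough Python sketch, assuming the "(LANG=xx)"
suffix I used in that example and assuming an untagged Title acts as the
fallback; both are assumptions of mine, not settled spec, and the
function names are made up.

```python
import re

# Matches a trailing "(LANG=xx)" tag and captures the text before it.
LANG_SUFFIX = re.compile(r"^(?P<text>.*?)\s*\(LANG=(?P<lang>[A-Za-z-]+)\)\s*$")


def titles_by_language(record_lines):
    """Map language code -> title; an untagged title is stored under None."""
    titles = {}
    for line in record_lines:
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "title":
            continue
        value = value.strip()
        match = LANG_SUFFIX.match(value)
        if match:
            titles[match.group("lang").lower()] = match.group("text")
        else:
            # First untagged title wins as the fallback.
            titles.setdefault(None, value)
    return titles


def pick_title(record_lines, preferred):
    """Return the title for the preferred language, else the untagged one."""
    titles = titles_by_language(record_lines)
    return titles.get(preferred.lower(), titles.get(None))
```

This does not answer which entry counts as the original for Relation
purposes; it only shows that display by language needs no extra ID.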

> APH> (Identifier, Relation.*) I need to be able to refer to resources
> APH> notwithstanding the local metadata entities that are installed.
> Sorry, but I don't understand that. Why is it necessary for the relation
> tag that we use the filename as identifier? I don't see that.

No, it's necessary that these elements use URIs.  If you want to
establish new URI schemes, i.e., debian-doc:foobar, then hoorah.  But
you'll have to manage it outside of docreg.  Because it *can't* be
managed within docreg.  See the counter examples in other email.

> APH> I'm not against additional info, I just am against poor, non-robust
> APH> design.
> That's right and using filenames as identifier or identifiers as filename
> is a poor and non-robust design. The identifier shouldn't change if you
> move the docreg or the document. This is an error in the DC proposal.

No, I use URIs.  If they use a file: URI, then there's the risk of
filename non-persistence.  Same with http or whatever.  I'm for new
schemes, but I'm not for using *non* URIs in Relation.* and Identifier.
Because it's non-standard, non-robust, and non-maintainable.
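
To make the rule concrete, here is a small Python sketch of the check I
have in mind.  It only tests that a value starts with a syntactically
valid scheme followed by a colon; it does not resolve anything.  The
function names and the (field, value) record shape are assumptions for
illustration.

```python
import re

# A scheme is a letter followed by letters, digits, "+", "." or "-".
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def uri_scheme(value):
    """Return the lowercased scheme of value, or None if it has none."""
    match = SCHEME.match(value.strip())
    return match.group(1).lower() if match else None


def non_uri_fields(record):
    """List the (field, value) pairs where Identifier or Relation.* is not a URI."""
    bad = []
    for field, value in record:
        name = field.lower()
        if name in ("identifier", "relation") or name.startswith("relation."):
            if uri_scheme(value) is None:
                bad.append((field, value))
    return bad
```

A bare "www.debian.org" fails this check, while "http://www.debian.org/"
and a new scheme such as "debian-doc:foobar" both pass.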

> APH> I don't think I'm adding complexity; you are, by conflating
> APH> two types of entities and adding additional, unnecessary fields.
> I'm conflating nothing. You could use my file format in the same way as
> your proposed file format. And a persistent ID is not unnecessary. For
> books you have such an ID, too.

Well that's good.

> APH> (d) people will cut-n-paste docreg files and come up with duplicate
> APH> identifiers, stomping on one another
> That's why you have to write a docregtest script for our developers.

Impossible to write unless the docregtest script has access to every
docreg file in debian, which is also impossible.

.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>
