Re: Re^16: Debian Metadata Proposal -- draft rev.1.4

To: Marco.Budde@hqsys.antar.com (Marco Budde)
Cc: debian-doc@lists.debian.org
Subject: Re: Re^16: Debian Metadata Proposal -- draft rev.1.4
From: apharris@burrito.onshore.com (Adam P. Harris)
Date: 09 Aug 1998 05:36:36 -0400
Message-id: <oan29ebior.fsf@burrito.fake>
In-reply-to: Marco.Budde@hqsys.antar.com's message of "05 Aug 98 17:35:00 +0100"
References: <b6b_9808071248@antares.antar.com>

Marco.Budde@hqsys.antar.com (Marco Budde) writes:
> APH> Then you shouldn't call it identifier because that's not what
> APH> 'Identifier' means.
> 
> Why not?

Identifier means the resource described by the metadata.  That's
simply the field's meaning.  I know as a database guy you look at
that and say, ah, this is the metadata-id, but it's just not so.

> APH> we're doing here.  Can you explain exactly why you think we need a
> APH> unique ID for metadata entities?
> 
> Again? One example: if you use a database to store the docreg informations  
> you have got a key and a data element. And the keys have to be unique! If  
> they're not unique, this could cause real problems.

> And for both databases I need unique and pseudo persitent IDs.

Ok, but you are conflating two things here.  (1) A unique way to refer
to the metadata entities.  I agree this is problematic now.  I'm
still, however, not convinced that this is entirely necessary for
metadata, but my implementation proving it's not necessary will tell
me.  (2) Unique ways to identify resources (URNs, either officially or
some little quick hack scheme).

Supposing we identify that we absolutely need unique metadata
identifiers, then fine.  First, it *wouldn't* be the Identifier tag.
Second, it *wouldn't* be conflated with the need to identify unique
local documentation.

This is where your proposal is wrongheaded.  You conflate the two, and
you manage URNs in docreg files, both of which are flawed IMO.

> APH> > Right and that is bad. I'm working on a translation document, I have to
> APH> > ask the maintainer of the original document to release a new version
> APH> > with this debian-identifier.
> APH> ??  No you can just refer to the pkg/file where you got it from!  If
> APH> there's no "Debian-Identifier" then, clearly, you can't use it anyway.
> 
> That's it! That's why we've to force the maintainers to use a unique ID.

You are conflating the process of referring to metadata (i.e., so I can
create/update/delete it) and the process of referring to resources.

> APH> No, but identifiers point to files, not to metadata.  You don't seem
> 
> Maybe, but is that important? I don't think so.

Very.  Managing local stores of metadata is completely different, both
functionally and logistically, to managing persistent document
identifiers (URIs).  I can't use a URI for managing metadata, since
pkg foo and pkg bar might both contain metadata for the same URN.
Blammo, non unique.
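
To see the collision concretely, here is a minimal Python sketch.  The
package names foo and bar are the ones from the paragraph above; the
titles and the dict-keyed store are made up for illustration and are
not part of either proposal.

```python
# Hypothetical metadata entries shipped by two different packages.
# Both describe the same resource, so both carry the same URI.
entries = [
    {"package": "foo", "identifier": "http://www.debian.org/",
     "title": "Debian home page"},
    {"package": "bar", "identifier": "http://www.debian.org/",
     "title": "The Debian web site"},
]


def store_by_uri(entries):
    """Naive metadata store keyed on the resource URI alone."""
    store = {}
    for entry in entries:
        # A later entry silently replaces an earlier one with the same URI.
        store[entry["identifier"]] = entry
    return store


store = store_by_uri(entries)
print(len(entries), "entries in,", len(store), "entries kept")
```

Two metadata entities go in and only one survives, which is exactly
why a resource URI can't double as the key for managing metadata.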

> APH> How do you intend it to be unique?  How can you enforce that?  Use the
> APH> package name?
> 
> The package name would be the standard solution. For important things like  
> the HOWTOs we could use "HOWTO-<lang>/<doc name>". Or we could give every  
> maintainer his own numbers (like the ISBN system).

Yuck... definitely no, this is a poor idea.

> APH> But what if the package changes its name?  Or what if
> APH> the doc is split out into another package?
> 
> Ok, this could be problem. But we could solve it.

No, it's a flaw in your system.  I have always had the position that
maintaining metadata entities and maintaining resource ids are
completely different problems, with different solutions.  Shoehorning
them together is bad.  I proposed a spinoff method of managing URNs in
Debian, which is completely outside of the scope of doc-base as I see
it.

As for managing metadata-ids, I agree that I have not proposed any
method for doing so.  But my position all along has been that metadata
entities are weak and don't even deserve global IDs.  Metadata entities
that refer to local (file) resources which do not exist would be
summarily removed.  The ability to reconstruct the database
anew out of existing docreg files has also always been a required
function, as you know, since you are the one who implemented this
function initially in the dhelp package.  So stale metadata entities
would be removed anyhow.
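
A rough Python sketch of that weak management, under assumptions of my
own: each metadata entity is a dict with an "identifier" key, an
absolute path counts as a local file resource, and anything else (a
URL, say) is left alone.  The function name prune_stale is
hypothetical.

```python
import os
import tempfile


def prune_stale(entries):
    """Drop entries whose local file is gone; keep everything else."""
    kept = []
    for entry in entries:
        ident = entry["identifier"]
        # Assumption: only absolute paths are treated as local files.
        if os.path.isabs(ident) and not os.path.exists(ident):
            continue  # the file is gone, so the metadata goes too
        kept.append(entry)
    return kept


# Demo: one real temporary file, one missing path, one URL.
with tempfile.NamedTemporaryFile() as doc:
    present = doc.name
    missing = doc.name + ".missing"
    entries = [
        {"identifier": present},
        {"identifier": missing},
        {"identifier": "http://www.debian.org/"},
    ]
    kept_ids = [e["identifier"] for e in prune_stale(entries)]

print(kept_ids)
```

Rebuilding the database from scratch amounts to re-reading every
docreg file and running the same check, so no entity ever needs an ID
that outlives the rebuild.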

Given this weak object management, it's fairly trivial to construct a
unique metadata id, in a weak and wishy-washy but good enuf way, from
contextual information, i.e.,

  <docreg_file_path>:<Identifier>

This makes a fairly robust assumption that one doesn't have non-unique
Identifier tags *within* *the* *same* *docreg* *file*.  Which could be
very easily enforced.
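
As a minimal sketch of how that could look in Python (the function
name metadata_ids and the list-of-strings input are assumptions; no
docreg parser is implied here):

```python
def metadata_ids(docreg_path, identifiers):
    """Build '<docreg_file_path>:<Identifier>' ids for one docreg file.

    Raises ValueError if the same Identifier appears twice in the file,
    which is the one uniqueness rule the scheme relies on.
    """
    seen = set()
    ids = []
    for identifier in identifiers:
        if identifier in seen:
            raise ValueError(
                f"duplicate Identifier {identifier!r} in {docreg_path}"
            )
        seen.add(identifier)
        ids.append(f"{docreg_path}:{identifier}")
    return ids


# Two docreg files describing the same resource get distinct ids.
ids_a = metadata_ids("foobar.docreg", ["http://www.debian.org/"])
ids_b = metadata_ids("feebar.docreg", ["http://www.debian.org/"])
print(ids_a[0])
print(ids_b[0])
```

The same Identifier in two different docreg files is fine; only a
repeat inside a single file is rejected.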

Given that you can construct unique metadata-ids from this scheme, why
(remember, we're talking about metadata management here only) would we
ever need the scheme you proposed?

> My proposal adds only one necessary tag, to identify the file. DC solved  
> this problem by adding the DC information to the document itself. So I  
> don't see a big difference between my and the DC proposal.

This just isn't true.  For 85% of cases of people using DC, they are
not embedding the DC entities in the data itself.

> Using an identifier as filename 

Now you're talking about resource ids, not metadata ids, btw.

> is real a bad idea. I don't understand,
> why you like this idea. The old doc-base has got a id and a file name.

Well, w.r.t. persistent resource identifiers, I don't think it's
robust to manage them within the docreg files.

> APH>  * remove m-ids once the are created
> APH>  * rename m-ids
> 
> Why not? Ok you should avoid it, if there're translated documents. But
> this is not a problem, because all documents are maintained by Debian.

This just isn't reality.  96% or so of the documents are *not*
maintained by Debian.  And again, you're ruling out URLs over and over
again.

> APH>  * refer to m-ids in packages that are not installed, i.e., on a Debian
> APH>    Documentation mirror (relevant to the Relation.* fields)
> 
> ??? I don't understand that. If you install only the translated document,
> a system like dhelp shows it as original. Where's the problem and where're
> the differences?

Because it's *not* the original.  Why should the state of a resource
change depending on what is installed?  If a document is a
translation, it's a translation no matter what packages you have
installed locally.  Suddenly, a package is installed, and an
"original" document turns into a translation?!  Did I read that right?

> APH>  * enforce uniqueness on m-ids, and lack of enforced uniqueness is a bug
> APH> in    your scheme
> 
> That right and this is one advantages. And again, you proposal enforces  
> unique ids, too! If two docreg files add the same URL you have got  
> problems with the relations.

What problem?  Not at all!  Identifiers for resources are not supposed
to be unique metadata identifiers.  I always assumed that metadata
identifiers, assuming they are needed at all, could be derived as I
discuss above.

> I see it and this is the problem of your proposal. You're talking about
> abstract definitions of words like ID and metadata. And I'm trying to
> define a small and simple file format for our needs. I think it's not
> important, if some other people have got an other definition for metadata  
> or IDs.
>
> We're talking about a solution for Debian. We're *not* talking about
> solutions for libraries, books, or the WWW.

I agree with this philosophy.  But I'm frustrated that you don't
understand the difference between metadata entities and resources.
They are distinct, and they are managed differently.  In all tags
which can carry *resource* identifiers
(Identifier, Relation.*) I need to be able to refer to resources
notwithstanding the local metadata entities that are installed.

> For example I don't think that the DC standard itself is a really good
> design. There're several things, that should be improved. But of course we
> could use the DC ideas. But why shouldn't we add additional informations?

I'm not against additional info, I just am against poor, non-robust
design.  I don't think I'm adding complexity; you are, by conflating
two types of entities and adding additional, unnecessary fields.

> I don't understand that. I'm proposing something like that:
> 
>  * every book has got a unique ID (called ISBN number)
>  * and it has got a local number, that tells the user, where he
>    can find the book (the filename in my proposal)
>
> Most libs. use something like my proposal for their books.

??  I've worked in libraries.  It's a perfect example.  Libraries
refer to book location by referring to an external subject catalog
numbering scheme, i.e., Dewey Decimal numbers or the like.  Now I ask
you, is the Dewey Decimal system actually managed by the card catalog
files?  No, it's a "pre-existing" system as far as the cards in the
catalog are concerned.  So are the files, or URLs, or whatever, in my
scheme.

Furthermore, managing the cards in the catalog, i.e., throwing them
out, replacing them, is *completely* distinct from managing the books
and managing the Dewey Decimal system itself.
 
> I don't understand that. Why couldn't you use http: etc. with my solution?
> Where's the difference? You define the URL in the Identifier field and I'm
> using the File: field. So there's no difference.

What is gained if I make maintainers type:

  Identifier: gobbletygook
  URL: http://www.debian.org/

instead of my scheme:

  Identifier: http://www.debian.org/

So, I say that *you* are the one adding needless complexity.

Looking at your examples, and trying to think like a new user, I just
wonder why we need to have some sort of proprietary label that we
(Debian) apply to every possible URL under the sun.  If you are
using it to manage metadata, fair enough, but as I pointed out above,
it's not necessary.  If you are using it to give persistent, globally
unique names (pseudo-URNs) for resources, it's ill-conceived and not
going to be useful for squat.  The following situations *will* happen:

(a) people will change their identifiers, which will leave stranded
metadata in your database (surprise!), and will break any packages that
use those identifiers

(b) people will duplicate identifiers, because they will have to, i.e., 
  [From docreg file foobar.docreg]
    Identifier: gobbletygook
    URL: http://www.debian.org/
  [From docreg file feebar.docreg]
    Identifier: morenonsense
    URL: http://www.debian.org/
  etc....

(c) people will constantly ask why the hell they have to have this
arbitrary identifier, how should they name it, what does it mean?

(d) people will cut-n-paste docreg files and come up with duplicate
identifiers, stomping on one another

My point is that your proposal raises as many problems as it solves.

I guess I'll have to do an experimental dhelp type pkg of my own
proving that your scheme isn't required.... ;)

-- 
.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>