
Re^18: Debian Metadata Proposal -- draft rev.1.4

On 09.08.98, apharris # burrito.onshore.com wrote ...

Hi Adam!

APH> Identifier means the resource described by the metadata.  That's

resource == document?

1.) This is not true if we use your proposal, because it's possible
    to have one document in several formats. That would mean
    that one document has several IDs.

2.) There's no real difference between the two proposals if you look at
    the IDs. In 99.9% of cases there will be only one ID per File:.

APH> simply the field's meaning.  I know as a database guy you look and
APH> that and say, ah, this is the metadata-id, but it's just not so.

I would say it's the metadata key :).

APH> > to be unique! If they're not unique, this could cause real problems.
APH> > And for both databases I need unique and pseudo persistent IDs.
APH> Ok, but you are conflating two things here.  (1) A unique way to refer
APH> to the metadata entities.  I agree this is problematic now.  I'm
APH> still, however, not convinced that this is entirely necessary for
APH> metadata, but my implementation proving it's not necessary will tell

Of course it's necessary. I'm sorry, but I don't understand why you like
your proposal. There are several cases where your system will not work.
Maybe my proposal mixes metadata and resource IDs, but where's the
problem? It solves a lot of problems.

APH> Supposing we identify that we absolutely need unique metadata
APH> identifiers, then fine.  First, it *wouldn't* be the Identifier tag.

Of course it should be the ID tag, because the relation tag refers to the  
ID and this is the problem with your proposal.

APH> This is where your proposal is wrongheaded.  You conflate the two, and
APH> you manage URNs in docreg files, both of which are flawed IMO.

I don't understand that. You're using the Identifier as ID and URL, and
I'm proposing to use the Identifier as ID and File as URL.

APH> completely different problems, with different solutions.  Shoehorning
APH> them together is bad.  I proposed a spinoff method of managing URNs in
APH> Debian, which is completely outside of the scope of doc-base as I see
APH> it.

Once again, we're talking about a small file format to satisfy our needs. I
don't know why we're talking about words like metadata, resource,
IDs, etc. We simply need a solution for our needs.

And your proposal doesn't satisfy all needs.

APH> IDs.  Metadata who refer to local (file) resources which do not exist
APH> would be summarily removed.

If the local file doesn't exist, it's a bug! Why should a package install a
metadata file which refers to nonexistent files?

APH>   <docreg_file_path>:<Identifier>

But this wouldn't solve the Relation problems. My solution is much

APH> This make a fairly robust assumption that one doesn't have non-unique
APH> Identifier tags *within* *the* *same* *docreg* *file*.  Which could be
APH> very easily enforced.

No, this could not be enforced. Do you need an example?

  Some URLs may change daily, because the filename includes the release
  date. In such a case you wouldn't use the URL with the filename but
  only the directory as the URL.

  And maybe you would like to refer to more than one file in such
  a directory.

APH> Given that you can construct unique metadata-ids from this scheme, why
APH> (remember, we're talking about metadata management here only) would we
APH> ever need the scheme you proposed?

Because my solution is a better design? I don't understand why we should
use a filename as identifier. This is not part of the DC standard.

APH> > My proposal adds only one necessary tag, to identify the file. DC
APH> > solved this problem by adding the DC information to the document
APH> > itself. So I don't see a big difference between my and the DC proposal.
APH> This just isn't true.

Why not? By the way who uses DC at the moment? In the WWW most people use  
the HTML 3.2 metadata format. And in the WWW metadata are always part of  
the document.

APH>  For 85% of cases of people using DC, they are
APH> not embedding the DC entities in the data itself.

Example please. Books and HTML pages include it.

APH> > Using an identifier as filename
APH> Now you're talking about resource ids, not metadata ids, btw.

I was always talking about resource IDs. But of course you could use the  
Identifier of my proposal as metadata ID, too. Because this is necessary  
in some cases.

APH> Well, w.r.t. persistent resource identifiers, I don't think it's
APH> robust to manage them within the docreg files.

Why not? With my file format it's possible to have real persistent IDs.

APH> > Why not? Ok you should avoid it, if there are translated documents. But
APH> > this is not a problem, because all documents are maintained by Debian.
APH> This just isn't reality.  96% or so of the documents are *not*
APH> maintained by debian.

??? We're talking about a doc system for Debian. And all Debian doc
packages are maintained by Debian. So where's the problem?

APH> > ??? I don't understand that. If you install only the translated
APH> > document, a system like dhelp shows it as original. Where's the problem
APH> > and where are the differences?
APH> Because it's *not* the original.

OK, then call it the preferred language. Please test dhelp 0.4.1 with
doc-linux-de. Have a look at /etc/dhelp.conf.

APH>  Why should the state of a resource
APH> change depending on what is installed.  If a document is a


 1.) You have installed doc-linux-de and your preferred language is
     => dhelp shows the German description/title
 2.) You install doc-linux-html.
     => dhelp will show the English description/titles instead of
        the German ones.

I think that this is a really good solution.
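
To make the selection rule concrete, here is a minimal Python sketch. It is
not dhelp's actual code: the function name pick_entry, the dict keys, and
the English title are hypothetical, and it assumes the preference order is
English first, then German.

```python
def pick_entry(entries, preferred):
    """Return the entry whose language ranks best in `preferred`.

    entries:   list of dicts with hypothetical keys 'lang' and 'title'
    preferred: language codes, most preferred first
    """
    def rank(entry):
        lang = entry["lang"]
        # Languages missing from the preference list sort after all listed ones.
        return preferred.index(lang) if lang in preferred else len(preferred)

    return min(entries, key=rank) if entries else None


# Only the German package is installed: the German entry is shown.
installed = [{"lang": "de", "title": "Drucker HOWTO"}]
print(pick_entry(installed, ["en", "de"])["title"])  # Drucker HOWTO

# The English package is installed as well: the English entry now wins.
installed.append({"lang": "en", "title": "Printing HOWTO"})  # hypothetical title
print(pick_entry(installed, ["en", "de"])["title"])  # Printing HOWTO
```

The document itself does not change state here; only the entry chosen for
display depends on what is installed.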

APH> translation, it's a translation no matter what packages you have

That's right.

APH> installed locally.  Suddenly, a package is installed, and an
APH> "original" document turns into a translation?!  Did I read that right?

Not really. A dhelp user will never know which one is the original. He  
will see one document available in several languages. This looks like:

  Drucker HOWTO (de, en)
    Beschreibt das Drucken unter Linux ...

APH> > That's right and this is one advantage. And again, your proposal enforces
APH> > unique ids, too! If two docreg files add the same URL you have got
APH> > problems with the relations.
APH> What problem?

Which one would be the original for the relation link? How should I  
display such entries? For example:

  Identifier: www.debian.org
  Title: Debian Homepage

  Identifier: www.debian.org
  Title: This is another title for the same URL

Problem 1:

  Identifier: www.de.debian.org
  Title: Deutsche Homepage
  Language: de
  Relation: www.debian.org

Problem 2:

  Identifier: www.debian.org
  Title: Deutsche Homepage
  Language: de
  Relation: www.debian.org
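
The ambiguity can be shown with a small Python sketch. The entries are the
ones from the examples above; the function names and the dict
representation are only assumptions for illustration, not part of either
proposal.

```python
from collections import defaultdict


def index_by_identifier(entries):
    """Group entries by their Identifier field."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["Identifier"]].append(entry)
    return index


def resolve_relation(entry, index):
    """Return every entry the Relation field could point to, except entry itself."""
    target = entry.get("Relation")
    return [e for e in index.get(target, []) if e is not entry]


entries = [
    {"Identifier": "www.debian.org", "Title": "Debian Homepage"},
    {"Identifier": "www.debian.org",
     "Title": "This is another title for the same URL"},
    {"Identifier": "www.de.debian.org", "Title": "Deutsche Homepage",
     "Language": "de", "Relation": "www.debian.org"},
]

index = index_by_identifier(entries)
candidates = resolve_relation(entries[2], index)
print(len(candidates))  # 2: the Relation matches both www.debian.org entries
```

With two entries sharing one Identifier, the lookup returns two candidates
and nothing in the data says which one is meant.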

APH> > We're talking about a solution for Debian. We're *not* talking about
APH> > solutions for libraries, books, or the WWW.
APH> I agree with this philosophy.  But I'm frustrated that you don't
APH> understand the difference between metadata entities and resources.

No problem, I understand the difference. But I'm not sure whether this
difference is really important. You could use my proposal with resource or
metadata IDs. There's no difference.

But even if we use resource IDs, I don't like your proposal.

APH> (Identifier, Relation.*) I need to be able to refer to resources
APH> notwithstanding the local metadata entities that are installed.

Sorry, but I don't understand that. Why is it necessary for the relation
tag that we use the filename as identifier? I don't see that.

APH> I'm not against additional info, I just am against poor, non-robust
APH> design.

That's right, and using filenames as identifiers or identifiers as filenames
is a poor and non-robust design. The identifier shouldn't change if you
move the docreg or the document. This is an error in the DC proposal.

APH> I don't think I'm adding complexity; you are, by conflating
APH> two types of entities and adding additional, unnecessary fields.

I'm conflating nothing. You could use my file format in the same way as
your proposed file format. And a persistent ID is not unnecessary. For
books you have such an ID, too.

APH> What is won if I make maintainers type:
APH>   Identifier: gobbletygook
APH>   URL: http://www.debian.org/
APH> instead of my scheme:
APH>   Identifier: http://www.debian.org/

The ID is unique and persistent. And if we like, we could add one URL
several times. But this is an optional idea.

APH> So, I say that *you* are the one adding needless complexity.

I don't think so.

APH> Looking at your examples, and trying to think like a new user, I just
APH> wonder why we need to have some sort of proprietary label that we
APH> (Debian) applies to every possible URL under the sun.  If you are

Why should we copy the problems of the WWW?

APH> (a) people will change their identifiers, which will leave stranded

That's why we have a policy :)!

APH> metadata in your database (surprise!), and will break any packages that
APH> use those identifiers

If I add the same URL with your proposal, it breaks the database, too. So  
what? We have to talk about a good ID scheme for my solution, no question.

APH> (b) people will duplicate identifiers, because they will have to, i.e.,
APH>   [From docreg file foobar.docreg]
APH>     Identifier: gobbletygook
APH>     URL: http://www.debian.org/
APH>   [From docreg file feebar.docreg]
APH>     Identifier: morenonsense
APH>     URL: http://www.debian.org/
APH>   etc....

OK, I know that you don't like that, and this should be used in special
cases only, but I don't see the problems.

APH> (c) people will constantly ask why the hell they have to have this
APH> arbitrary identifier, how should they name it, what does it mean?

Why does a book have an ISBN? Because it's necessary. We have to
tell people which names they should use.

APH> (d) people will cut-n-paste docreg files and come up with duplicate
APH> identifiers, stomping on one another

That's why you have to write a docregtest script for our developers.
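
Such a check would not be much work. Here is a rough Python sketch of the
duplicate test; no docregtest script exists yet, so the function names are
hypothetical, and it assumes docreg files consist of "Field: value" stanzas
separated by blank lines.

```python
def parse_docreg(text):
    """Parse blank-line-separated 'Field: value' stanzas into dicts."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                entries.append(current)
                current = {}
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        field, _, value = line.partition(":")
        current[field.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def find_duplicate_ids(files):
    """Map each Identifier used more than once to the files that use it.

    files: mapping of file name -> file contents
    """
    seen = {}
    for name, text in files.items():
        for entry in parse_docreg(text):
            ident = entry.get("Identifier")
            if ident is not None:
                seen.setdefault(ident, []).append(name)
    return {ident: names for ident, names in seen.items() if len(names) > 1}


# A cut-n-paste accident: two docreg files with the same Identifier.
files = {
    "foobar.docreg": "Identifier: gobbletygook\nURL: http://www.debian.org/\n",
    "feebar.docreg": "Identifier: gobbletygook\nURL: http://www.debian.org/\n",
}
print(find_duplicate_ids(files))
# {'gobbletygook': ['foobar.docreg', 'feebar.docreg']}
```

Run over all installed docreg files at build or install time, a check like
this would report the clash before it reaches the database.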

APH> My point is that your proposal raises as many problems as it solves.

I don't think so.

APH> I guess I'll have to do an experimental dhelp type pkg of my own
APH> proving that your scheme isn't required.... ;)

This would be a waste of time. Please feel free to change dhelp's parser.

cu, Marco

Uni: Budde@tu-harburg.de           Fido: 2:240/5202.15
Mailbox: mbudde@hqsys.antar.com    http://www.tu-harburg.de/~semb2204/
