
Re: Re^2: Debian Metadata Proposal -- draft rev.1.4



Marco, to start off with, I'm going to trim out a lot of stuff where
you complain about how I've gone about doing the standard.  Let me
defend that at the outset.

To start with, most of my free time over the last couple of months has
been dedicated to cleaning up my packages and getting hamm testing
done, mostly in relation to hamm installation on laptops.  I'm sorry
if you disagree with how I prioritize my time, but I tried to decide
how I could help the project most.

It's true that I didn't raise the issue of using the Dublin Core (DC)
element set as our standard element set.  But I wanted to work out
what it would look like, so I started re-writing using DC and it
worked out really well. 

It doesn't seem clear to you why using a standard format is important.
Whether you know it or not, metadata is a hot topic in both the document
management and the database worlds.  If we build up a completely
proprietary system, we're making more work for package maintainers.
Right now, we could have an 'html2debmeta' which converts DC marked-up
HTML files into a docreg file, and a 'debmeta2html' for the reverse.  RDF
schemes and architectures for DC will *surely* be one of the first
things to come out of the RDF working group, so we can hook in with
RDF tools when they're available.  Above all, using a proprietary system
when a perfectly good standard is available seems unwise to me.
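
To illustrate the 'html2debmeta' idea, here is a minimal sketch in
Python.  It assumes the HTML carries DC elements as
<meta name="DC.xxx" content="..."> tags and that a docreg entry is a
run of RFC822-style "Element: value" lines; the class and function
names are hypothetical, and the real docreg field set may differ.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = []  # (element, value) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.lower().startswith("dc.") and content is not None:
            # "DC.title" becomes the field name "Title"
            self.fields.append((name[3:].capitalize(), content))


def html2debmeta(html_text):
    """Return RFC822-style 'Element: value' lines for every DC meta tag."""
    parser = DCMetaParser()
    parser.feed(html_text)
    return "".join(f"{element}: {value}\n" for element, value in parser.fields)
```

Non-DC meta tags are simply skipped, so an ordinary HTML page with a
few DC tags in its head would convert without special handling.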

Finally, waiting a month or so to get the *right* system is
much better than rushing out the *wrong* system.  This is completely
crucial, in my mind.  So if you have specific technical problems with
my proposal, please raise them, and don't just flame me.

Now on to the points.

Marco.Budde@hqsys.antar.com (Marco Budde) writes:
> It shows *your* totally new idea of the file format. Where are all the things
> we have discussed two months ago?

Please indicate exactly which functionality you want that is missing.
AFAIK, I've integrated all our ideas and more.  What's more, it uses
an industry standard element set (see above).

> APH> It's not set in stone.  Not at all.  I posted it here for comments.  I
> 
> But there're no real changes to your old version. I don't have a problem
> with the format itself, but with your way. Instead of wasting the time
> writing such a big document we should work on the format itself.

How can you say that?  It's completely different from the old docreg
format.  Did you look at the included example or not?  For one thing, the
tag set changed, although there's a one-to-one correspondence.  For
another, we no longer have a distinction between formats and documents,
because I decided this distinction was too artificial and was going to
create too much complexity and hardship.

You mention your comment from a month ago.  I show:

 1.) identifier's abstract should be optional.

Functionality now present.

 2.) Maybe author should be renamed to editor, because one document
     can have several authors.

Functionality now present.

 3.) Using title and abstract in identifier and formats is not
     consistent.

Problem disappeared with removing formats as an *entity* at all.  All
documents are first class entities.

 4.) There's no description of the format of Document: and Location:

Now all fields are documented.

 5.) I'm missing the limits for the identifiers and formats.

I proposed some limits.

 6.) Please remove chapter 2.3. This standard should describe the
     file format and not one program, that will use this standard.

File specification section now just describes the file format.

 7.) Please add examples, it's very difficult for our maintainers
     to read the description in 822 format.

Done.

So you keep talking about things you said a month ago that I've ignored,
but I can't find any such things.

> And as a normal package developer I don't want to read 38 kbytes just to
> register one document.

I'll take this under advisement.  I've tried to organize the document
such that stuff relevant for package maintainers is at the front.
First is the intro, then the description of the elements, then the
file format.  How could I organize it better?

> APH> wanted to clearly state my position, which took a while.  The last
> 
> Again, *your* position. But as a maintainer it's not your job to show only
> your position, but a mixture of the best ideas.

I've taken the best from your suggestions, and the best from the wider
world.  I am *not* just an opinion poll taker and I *am* allowed to
exercise judgement.  I just suspect you don't have any respect for
following standards or learning from work done in archive management
communities outside this small group.

> Marco> I'm not interested in such an API for dhelp, because I need a special
> Marco> structure. And we had discussed this already: dhelp will read the
> Marco> docreg files itself.
> APH> I don't know whether I agree with this or not.

> Ok, let me explain that: I'm sorting all entries in my database in a
> special way. So I don't need to read the whole database in memory. And I
> don't need all informations stored in the .docreg files, so I'll not add
> them to my database (in fact there will be several databases in dhelp
> 0.4.x).
> 
> doc-base should provide the following:
> 
>  *) the auto converter

Not initially. ;)

>  *) install-docs script: calls the auto converter, dhelp, dwww, ...

Validates the file before calling anything.  Then will have a hook
mechanism to invoke whatever systems are installed, i.e., dhelp.
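
As a rough sketch of that order of operations (validate first, then
call the hooks), something like the following would do.  The required
field names and the hook signature are assumptions for illustration
only; nothing here is the actual install-docs interface.

```python
from email.parser import Parser

REQUIRED = ("Identifier", "Title")  # hypothetical required elements


def validate(docreg_text):
    """Parse RFC822-style text; raise ValueError if a required field is missing."""
    entry = Parser().parsestr(docreg_text)
    missing = [name for name in REQUIRED if entry[name] is None]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return dict(entry.items())


def install_docs(docreg_text, hooks):
    """Validate first; only then hand the entry to each installed display system."""
    entry = validate(docreg_text)
    for hook in hooks:
        hook(entry)  # e.g. a dhelp or dwww registration callable
    return entry
```

The point of the design is that a display system never sees an entry
which failed validation.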

>  *) Markus directory structure as .dogrec.dir

AFAIK, the ddh is a file containing the DDH entries, not a directory
structure, but I might be confused.  Marcus?
 
> APH> > A single person project :(?
> APH> Aren't you contributing here and now?
> 
> Yes, but without any results. I don't see any of my ideas in your latest
> proposal.

They are there, look harder.  Actually, your objections are why I
rejected the proprietary standard.

> APH> these ultimately be specified as URNs and served off either the local
> APH> machine or off a web server (i.e., out of the DDP web pages).
> 
> ? I can't see any problems with my idea, I'm using this method in dhelp
> 0.3.x and there're no problems.

Have some VISION for the future, man!

Do you *know* what a URN is?  Do you know why URLs are of limited
value, and the ways that URNs can help?

> APH> > APH>      Currently, the domain of allowable values is
> APH> > APH>
> APH> > APH>         * howto
> APH> > I don't understand that. We don't need that, because for this purpose
> APH> > we have got the section tag.
> APH> Not as I read the DDH.  Anyhow, it's an optional field.
> 
> Sorry, but this is not the question. We don't need this. So why should it
> be part of the proposal?

I could easily make it a tag which is ignored.  I would like more
opinions than just yours on this issue, however.

> APH> > APH>              * Subject
> APH> > Why have you changed the name of this tag?
> APH> To comply with Dublin Core.
> 
> And why should we do that? We're talking about a small and *local*
> configuration file for Debian. Your proposal shows that Dublin Core is not  
> the right solution for our file format.

See above. I don't think it shows any such thing.  I have integrated
the best in the field with where we were already going.

> For example this description of the content (howto, faq) is not necessary.  
> For this we will have our directory structure.

The *type* of file (FAQ, dictionary file, home page, software package)
and where it is located along the DDH tree are not coupled.  For
instance, we don't have debian/admin/faq, debian/admin/howto.

> And for example
> "Title: (LANG=de)" is maybe a good idea for the WWW and the big search  
> robots, but for us it's useless.

I agree that LANG in Title and Description fields should *perhaps*
simply reflect the underlying Language field of the document itself.
I.e., a German document would carry a German Title and Description and
that would be that.  I think that might be a reasonable restriction to
make at the outset.  I'd like some opinions from other international
users before restricting functionality so radically, however.

> APH> I'm willing to talk about the wisdom of
> APH> using an industry standard or our own proprietary tag set, but I
> 
> Sorry, but Dublin Core is not an industry standard, it's one idea for a
> metadata format (see HTML 4 standard

Not a metadata standard, AFAIK.

> or selfhtml

Never heard of it.  References?

>). You'll not find a lot
> of programs using Dublin Core at the moment.

You *will* find DC people on the XML/RDF working groups, and you will
find that it is probably the most flexible, simple, and extensible
system out there now.

I don't really see why you're arguing against it here but not below.

> APH> haven't seen a single good argument to use a proprietary tag set when
> APH> perfectly good standards are available.
> 
> Because Dublin has got an other target?

Functional arguments please.  What functionality is missing?  What do
you not like, aside from the rather *trivial* issues you've brought up
(Type element and LANG scheme)?  Below you admit the functionality is
there and only these minor issues raise your objections.

> APH> > APH>              * Rights
> APH> > Do we need this tag?
> APH> Again, it's optional.
> 
> That's not the question. If it's optional in Dublin Core we could drop it
> from our proposal.

Why?  I think knowing the freeness of a document before reading it is
useful, it's in harmony with the Debian Tao, and I don't see you
offering any arguments against this *optional* tag except that you
don't see any reason for it.  But again, the issue of whether the tag
is optional or demoted to ignored is kinda a non-issue.

> Right, I really don't like your solution. If you propose something like
> Dublin Core, you should think a little bit bigger. Why should we limit
> the use of our file format and our programs only to the Debian packages?

> It's also interesting for maintaining the local documents of the user.

I don't understand.  Ability to use DC for both ourselves in a local
use managing documentation, ability to use it for DDP and greater
debian-wide facilities, and great use in non-debian circles all seem
like positives!

> APH> If I recall, you would prefer to have docreg files in the
> APH> documentation area, i.e., /usr/doc/<pkg>.  I am amenable to this,
> APH> actually, but if you're going to be reading docreg files directly, I
> APH> think this is going to be evil.
> 
> I don't understand that. Do you think that speed is a problem? It's not.
> I'm using this in dhelp without any problems.

No, you read the file once and then build up your own database.  Why
does it even matter where the file is since you ignore it once it's in
the dhelp database?  Also, what about the fact that if a file changes
(e.g., a user edits it with vi) and then the removal procedure is run,
dhelp does not in fact remove the entry?  This seems like a design flaw
in dhelp.

I felt that the "shadowing" of data, and the errors which creep in
because of that, are the fundamental design flaw of the old doc-base.
That's why I'm talking about a central document store.  If we could
solve the problem of fast access to metadata, and solve it as part
of doc-base, then that will lead to better and more plentiful display
systems.  I hope you don't revert to "turf battles" about this issue,
trying to maintain the roles of doc-base/dhelp etc to the detriment of
the technical excellence of the system.

Basically what I'd love to see is what you've done with the database
from dhelp, but made general and standard so every display system can
use it.  What do you think?

> APH> One thing I want you to keep in mind that hamm's /usr/doc/<pkg> will
> APH> be slink's /usr/share/doc/<pkg>, AFAIK.
> 
> This is not a problem with my solution. But it could be a problem with  
> your solution. A lot of users will install slink and hamm packages on one  
> system.
>
> Then we've to add /usr/share/doc *and* /usr/doc to our webstandard in the
> policy. But with your solution (storing the .docreg files in /var/...)

(Well, /usr/share/doc-base/docreg/* actually)

> you  
> will have a problem: how will you distinguish if your path is relative to  
> /usr/doc or /usr/share/doc?

I agree that dealing with both FHS and FSSTD at the same time is
crucial, and could be a problem with what I suggested.  So this is a
very good point, which I thank you for pointing out.

I also would like to see the redirection of identifiers to central
locations as well.  Do you have ideas on how to do this?

> APH> Personally, there's an easy way to resolve this.  Someone needs to
> APH> study URN spec more, and we must do what it will take to transition to
> APH> URN or use URN right from the start.  Any system which will make it
> APH> *harder* to move to URN is unwise, IMHO.
> 
> What does URN mean? Could you please explain this?

Read materials at http://www.w3.org/Addressing/ .

The whole issue with URN is to find a way to address documents in a
way that is not coupled to an individual host and/or file path.

> Marco> I'm missing the tags for adding new sections and their description.

> APH> Yes, that's not part of the docreg spec per se.  The DDH is a "SCHEME"

> But it should be part of docreg. Maybe it's a good idea to separate the
> documents and the directories: for example .docreg and .dogreg.dir?

No, it's not part of docreg, but it is part of the Debian Metadata
spec.

AFAIK, right now, there shall be one big file, in RFC822 format,
defining the hierarchy.

> APH> Marco is working on this.
> 
> Me :)? You're speaking about Markus, right?

Yes.

> He's working on the "directory
> informations" but we need a file format to add this informations to the  
> databases of dhelp, doc-base, etc. And this is very important.

Definitely.  I didn't want to wait before showing you all my proposal
thus far, however.  I need to talk with Marcus a little more on the
ultimate delivered form of the DDH specification.

> APH>  I'll probably help him with the definition
> APH> of the file format.  It's pretty straight forward, it's not in a
> APH> standard format because there *are* no such standards AFAIK, and it
> APH> hasn't changed much.
> 
> Again, we need an open discussion about such things. And not only the  
> result of one person at the end.

Ok, well I'll see what I can do and get it out as soon as possible.

> That's right, this is my main problem with your proposal. And I don't like
> the new things like (LANG=...) and the content description (howto, faq, ).
> The format itself is ok (in fact I don't see any big differences to
> .dhelp).

Well, that's good.  I hope you can see how the new system allows a lot
of the functionality you proposed.

> APH> Why do you not like the docreg format?
> 
> *) placement of the files (should be /usr/doc/<package>)

Amenable to this if others agree.

> *) URLs should be relative to the .docreg file
>    real internet URLs are allowed (to add for example the bug tracking
>    system)

Reserved about this because it seems like bad system design; although
as you point out it does solve the FHS vs FSSTD problem.  I say it
seems like bad system design because you are coupling the file system
location with the resource identifier location.  Converting in and out
of docreg format would be difficult.  Any storage system would need to
track where it discovered the docreg file, for making absolute
references out of relative ones.

None of these are deal-breakers.
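
To make the bookkeeping concrete, here is a minimal Python sketch of
what a storage system would have to do under the relative-URL scheme:
remember where the docreg file was found, and resolve each reference
against that directory.  The function name is hypothetical, and the
rule that anything with a URL scheme is passed through untouched is my
assumption about how "real internet URLs" would be handled.

```python
import posixpath
from urllib.parse import urlsplit


def resolve_reference(docreg_path, reference):
    """Make a docreg reference absolute, relative to the docreg file's directory."""
    if urlsplit(reference).scheme:
        # A real internet URL such as http://...; leave it alone.
        return reference
    base_dir = posixpath.dirname(docreg_path)
    return posixpath.normpath(posixpath.join(base_dir, reference))
```

The same relative entry resolves differently depending on where the
docreg file lives, which is exactly the coupling described above:
'/usr/doc/foobar/foobar.docreg' and
'/usr/share/doc/foobar/foobar.docreg' give two different absolute
identifiers for 'foobar.html'.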

Finally, here's another problem with this scheme, a more serious one.
Suppose package foobar, which is only FSSTD compliant, installs a
docreg file in /usr/doc/foobar/foobar.docreg.  Suppose that
'foobar.docreg' contains an entry for 'foobar.html' (relative, in your
scheme).  As I understand it, in your central document store, you
would have an object (row) with an identifier of
'/usr/doc/foobar/foobar.html'.  Then suppose the next version of the
pkg foobar is FHS compliant, but the maintainer forgets to run the
removal process at the old location.  Now, as I understand it, when
the new pkg installs, we'll have another object (not replacing the
original) with an identifier '/usr/share/doc/foobar/foobar.html'.
How do we deal with this?  Reap objects without files?
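
One possible answer, "reap objects without files", could be sketched
in Python as below.  It assumes the central store is a simple mapping
from path identifier to entry, which is a stand-in for illustration
and not how dhelp or doc-base actually stores anything.

```python
import os


def reap_missing(store, exists=os.path.exists):
    """Drop every object whose identifier no longer names a file.

    Returns the list of identifiers that were removed.
    """
    stale = [identifier for identifier in store if not exists(identifier)]
    for identifier in stale:
        del store[identifier]
    return stale
```

This only cleans up after the fact; it does nothing to tell us that
the old and the new object were the same document.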

The *real* issue is not how the *identifier* element is encoded, but
*what* constitutes the object ID for a document.  In my scheme, it's
<package>/<file under pkg doc dir>, but this will get hairy in a
heterogeneous FHS/FSSTD world.  In your scheme, it's completely
unspecified, which is going to lead to nightmares.
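
For what it's worth, here is a small Python sketch of the
<package>/<file under pkg doc dir> idea, with the object ID derived
by stripping whichever doc root the file sits under.  The list of
roots and the function name are assumptions for illustration; this
shows the shape of the ID, and does not settle which location is the
current one when both exist.

```python
DOC_ROOTS = ("/usr/share/doc/", "/usr/doc/")  # FHS first, then the older layout


def object_id(path):
    """Map an absolute path under a doc root to '<package>/<file under pkg doc dir>'."""
    for root in DOC_ROOTS:
        if path.startswith(root):
            relative = path[len(root):]
            # Require both a package part and a file part.
            if "/" in relative.strip("/"):
                return relative
    raise ValueError("not under a known package doc directory: " + path)
```

With an ID like this, the foobar.html from the old package and the
one from the new package map to the same object, so the second
install replaces the first instead of sitting beside it.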

What's the solution?

-- 
.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>

