Re: debian meta data
> I'd really be interested in getting hold of a C implementation of
> an RFC-822 parser. I could even help you package it up and distribute
> it or something.
I'm not confident that the thingy is fully RFC-822 compliant yet,
but I never had problems processing emails and HTTP requests with it.
I "just hacked" it together when my prototype code became too slow.
> However: I hate surrogate keys, that is, unique IDs that are made up.
> I don't want to support them, I don't want them at all. They are not
I'm not sure that this will work out.
See, when you malloc something, you don't care about the address.
Same for those ids. Look at them as if they were virtual addresses
in a document address space.
> inherently meaningful to people, therefore they are not meaningful to
> the user, and make debugging and maintenance very difficult.
Sure, they are not meaningful to people. But my first experience
taught me that the debugging is not that hard. What you want to do
is hide them from people. This is done by giving names, i.e.,
aliases. One id can have any number of names/aliases. Those look
like file names.
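
A minimal sketch of the id/alias idea, in Python rather than the
actual code. The class and method names (DocumentSpace, store, alias,
open, names_of) are made up for illustration and are not taken from
any existing implementation; the ids are plain counters here.

```python
import itertools


class DocumentSpace:
    """Toy document address space: opaque ids plus human-readable aliases."""

    def __init__(self):
        self._next_id = itertools.count(1)  # ids are meaningless counters
        self._docs = {}     # id -> content
        self._aliases = {}  # name -> id

    def store(self, content):
        """Store content and return its opaque id (like malloc returns an address)."""
        doc_id = next(self._next_id)
        self._docs[doc_id] = content
        return doc_id

    def alias(self, name, doc_id):
        """Give an existing id one more name; an id may have any number of names."""
        if doc_id not in self._docs:
            raise KeyError(doc_id)
        self._aliases[name] = doc_id

    def open(self, name):
        """Look a document up by name; the id never has to be shown to the user."""
        return self._docs[self._aliases[name]]

    def names_of(self, doc_id):
        """All aliases currently pointing at one id, sorted for stable output."""
        return sorted(n for n, i in self._aliases.items() if i == doc_id)
```

People only ever type the names; the id shows up when debugging, much
as an address does in a debugger.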
You (or rather, I) also want to put maintenance under software control.
BTW: I don't think that you have a chance to get rid of the unique id
after all. Or don't your files have an inode number? Well, those
inode numbers are reused, as file names are, which is a bad idea for
> Furthermore: I foresee the document ID (in my scheme, something like
> debian-policy/policy.html/index.html) enabling multiple
> interpretations! For instance, "in the documentation area of the
> package debian-policy, the file policy.html/index.html". This is
Right. The naming *must* be local, while those ids are supposed to be
unique and global. Global naming never works. Some issue that was not
foreseen will always come up and break the whole thing.
You will have to have both: a unique id, which is not for people
but for the machine, and a name, which is interpreted by both the
human and the machine.
When I was working on my thesis, I experimented with these issues for
some time and found two related things. The first was some work from
human language research (structuralism), which I don't have at hand.
My English skills, and the fact that this was too long ago, prevent me
from giving any details right now. The second was the naming solution
in VSTa (an experimental microkernel OS). That one has no global
names in the file system, and it makes it really easy to configure
things that can hardly be done with Unix etc.
In short, the lesson I learned was: "drop all global naming, it
is meaningless". (Sure, that's an overgeneralization; don't flame, I
just want to give a picture here.)
Within the WrapBit design, a path like the one you gave
(debian-policy/policy.html/index.html) is interpreted step by
step. That is, 'debian-policy' somehow knows where to find
'policy.html', which in turn might use a completely different
mechanism to figure out what 'index.html' really is.
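
Roughly, the step-by-step interpretation could look like the
following Python sketch. This is not the WrapBit code; DictNode,
ComputedNode and resolve are hypothetical names, and the only point
is that each component is looked up by whatever the previous
component returned, so every step may use its own mechanism.

```python
class DictNode:
    """Resolves a child name through a plain in-memory table."""

    def __init__(self, children):
        self.children = children

    def lookup(self, name):
        return self.children[name]


class ComputedNode:
    """Resolves a child name by a different mechanism: calling a function."""

    def __init__(self, make):
        self.make = make

    def lookup(self, name):
        return self.make(name)


def resolve(root, path):
    """Interpret the path one component at a time, starting at a local root."""
    node = root
    for part in path.split("/"):
        # No global table: only the current node knows what 'part' means.
        node = node.lookup(part)
    return node


# 'debian-policy' uses a table, 'policy.html' computes its children.
policy_html = ComputedNode(lambda name: f"<contents of {name}>")
debian_policy = DictNode({"policy.html": policy_html})
root = DictNode({"debian-policy": debian_policy})

document = resolve(root, "debian-policy/policy.html/index.html")
```

Swapping the mechanism behind 'policy.html' changes nothing for
whoever holds the path, which is the reason for keeping names local.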
Or, for a totally unrelated real-world example: there is a
firstname.lastname@example.org cc-ed. This will deliver the mail into the first
piece of such a path. That piece will just filter out some mailing
lists, find the mail unrelated to any of them, and forward it to
another thingy which gets all the remaining mail. That in turn will
simply store it, assigning a pretty meaningless number.
I can reference the mail via an md5sum automatically assigned when the
msg is stored, or via that number. But on another page I can put a link,
assign a different name, and find it by a totally different search
mechanism. (Even better, as long as there is one such link, the
document never gets lost.)
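
To make that concrete, here is a small Python sketch of such a store.
It is an illustration under my own assumptions, not the real thing:
the names (MailStore, link, sweep) are invented, and the sweep() step
that drops unlinked messages is my reading of "never gets lost as
long as there is a link", not a description of existing behaviour.

```python
import hashlib
import itertools


class MailStore:
    """Toy store: each message gets a meaningless number and an md5 digest."""

    def __init__(self):
        self._numbers = itertools.count(1)
        self._by_number = {}  # number -> message bytes
        self._by_digest = {}  # md5 hex digest -> number
        self._links = {}      # link name -> number

    def store(self, message):
        """Store the message; return (number, md5 digest) as its two machine ids."""
        number = next(self._numbers)
        digest = hashlib.md5(message).hexdigest()
        self._by_number[number] = message
        self._by_digest[digest] = number
        return number, digest

    def by_number(self, number):
        return self._by_number.get(number)

    def by_digest(self, digest):
        return self.by_number(self._by_digest.get(digest))

    def link(self, name, number):
        """Attach a local, human-chosen name to a stored message."""
        self._links[name] = number

    def unlink(self, name):
        del self._links[name]

    def by_link(self, name):
        return self.by_number(self._links[name])

    def sweep(self):
        """Assumed policy: drop every message that no link points to."""
        live = set(self._links.values())
        for number in list(self._by_number):
            if number not in live:
                del self._by_number[number]
        self._by_digest = {d: n for d, n in self._by_digest.items() if n in live}
```

The number, the digest and the link name are three independent ways
to reach the same message; only the link name is meant for people.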
> important to do because I think the documentation area is going to
> move from /usr/doc to /usr/share/doc . Furthermore, it's important for
> ultimate URN support. This would enable me to say
> "debian:debian-policy/policy.html/index.html" or however one would
> formulate it as a URN, and if the user doesn't have the package
> installed, the URN server would redirect to a suitable mirror.
Agreed. I always envisioned a scheme that allows fetching a
certain document from wherever it is, if it is not present at a given
point. I think that in the example above the 'debian:debian-policy'
part, or whatever it ends up being, should find the mirror.
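
As a sketch of "local first, else let the package part find the
mirror", in Python. Everything here is assumed for illustration: the
fetch() function, the shape of the two tables, and the mirror URL,
which is a placeholder and not a real server. The "debian:" syntax is
just the one from your example, not a registered URN scheme.

```python
def fetch(name, local_docs, mirrors):
    """Return ("local", content) or ("redirect", url) for a debian: name.

    local_docs: package -> {path inside the package's doc area -> content}
    mirrors:    package -> base URL of a mirror holding that package's docs
    """
    scheme, sep, rest = name.partition(":")
    if not sep or scheme != "debian":
        raise ValueError(f"not a debian: name: {name!r}")

    # The first path component is the package; it decides where to look.
    package, _, path = rest.partition("/")

    doc = local_docs.get(package, {}).get(path)
    if doc is not None:
        return ("local", doc)

    # Package not installed (or file missing): hand back a mirror location.
    return ("redirect", f"{mirrors[package]}/{path}")


# Hypothetical data for the example.
mirrors = {"debian-policy": "http://mirror.example/doc/debian-policy"}
installed = {"debian-policy": {"policy.html/index.html": "<html>...</html>"}}
name = "debian:debian-policy/policy.html/index.html"

hit = fetch(name, installed, mirrors)
miss = fetch(name, {}, mirrors)
```

The caller uses the same name in both cases; whether the answer comes
from disk or from a mirror is decided by the package component alone.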
While writing this, I got the impression that we will have to clarify
what we are addressing. My concern is a general mechanism for
handling documents. For the Debian docs I believe a specialized instance
is needed. We should separate those issues, otherwise everything will
We should get the IP phone going. I have a hard time explaining the
idea in writing whenever I try. But whenever I follow it and apply it
to some real problem, it proves so useful.
> Probably what I should do is force, for relative URLs, the first
> argument, "debian-policy" to be in fact the package name. If the
> actual resource is *not* in the doc area of the package holding the
> metadata, the developer should use an absolute file URL. I'm not sure
> that this isn't too draconian. Let me think about this a bit more.
Didn't get that.
> >> 2.1 Automatic Document Conversion
> > This all sounds as if we should tweak the rabbit a little.
> I'd love that. 'install-docs' *has* to be a quick fast program, and
Hm, I'm not sure that it is really quick yet. Too much shell. Too many
fork/execs where threads could be used. But it is good enough for
interactive use if you can bear a little delay.
> small, since it's on or near abouts the base system. So you think we
It's *still* small (170k gzipped tar for source + 700k tgz test data &
docs [which are actually the same]), but it needs a lot of other things
we would have to get rid of.
> could modify rabbit to reference some local document store (i.e., a
> little database of metadata). I'll have to take a look at your code.
But it is actually nothing but a (so far) local document store, holding
data and metadata and enforcing a little bit of policy (at a level
where I don't know whether this is policy or mechanism).