Re: Misclassification of packages; "libs" and "doc" sections
On Thu, Oct 12, 2000 at 11:48:15PM +0300, Eray Ozkural <erayo@cs.bilkent.edu.tr> was heard to say:
[snip]
> > So, just to be sure, you think that we should use configuration files like..
> >
> > Section: Lisp_Interpreter
> > Contents: emacs20, rep, .....
>
> Well. There's an arrow from Lisp_Interpreter to Interpreter only
> which reads "Lisp_Interpreter is-a Interpreter". Only outgoing arcs
> are maintained for a digraph (directed graph).
>
> More like:
>
> Category: Lisp_Interpreter
> IsA: Interpreter
Ok, that looks nicer :) I may still be missing one thing: it seems like
you want the nodes in the graph to point at what they "inherit" from,
whereas I was looking at doing it the other way around. I'm not sure this
is actually relevant, since most of the file formats I've seen tossed
around could be parsed either way depending on how you want to use them,
though..
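
For illustration, here is a minimal Python sketch of the "parse it either way"
point. It assumes a stanza format with the Category/IsA fields quoted above,
stanzas separated by blank lines, and a comma-separated IsA list; the last two
are assumptions, and the function names are made up for this example.

```python
from collections import defaultdict

def parse_categories(text):
    """Parse blank-line-separated stanzas into {category: [parents]}."""
    parents = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        name = fields["Category"]
        # Assumption: several IsA targets may be listed, separated by commas.
        isa = fields.get("IsA", "")
        parents[name] = [p.strip() for p in isa.split(",") if p.strip()]
    return parents

def invert(parents):
    """Turn child -> parents edges into parent -> children edges."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    return dict(children)

SAMPLE = """\
Category: Interpreter

Category: Lisp_Interpreter
IsA: Interpreter
"""

parents = parse_categories(SAMPLE)   # edges point at what a node "inherits" from
children = invert(parents)           # the same edges, pointing the other way
```

Either map can be built from the same file, so the direction stored on disk
does not constrain how a program walks the graph.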
[snip]
> > There's no reason you can't arbitrarily make a "root" in a graph (just make
> > a node with no edges pointing to it from which all other nodes are reachable,
> > call it "root" :P), and it might be useful for speeding up some operations,
> > so we have a list of the nodes which "start" hierarchies (or is-a relationships,
> > if you prefer that) -- that is, nodes which are not immediately reachable from
> > another node. (so when converting to a tree for display, you don't have to
> > calculate this. You could certainly store it in some other way than a
> > node, but storing it that way seems like it would make some things simpler
> > to do) Depending on how you want to input the graph, this may or may not be
> > easy to do.
> >
>
> Okay, you can make such a single "source" vertex to the entire graph.
> But how do you guarantee that all vertices are reachable from it?
> You just have to take into account all vertices that have no outgoing
> edges. Those happen to be the "toplevel" categories, or the most
> general classes in this hierarchy.
Well, that was rather the point -- you can either store them as a "source",
store them some other way, or recalculate them every time you want them. If
you actually end up needing this information, it seems cleanest to me to view
it as a special "source" node, since that lends itself to nice recursive
algorithms without a special case for starting out. This is a kind of picky
point, though..
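
As a rough sketch of the "source" node idea (Python, assuming the is-a graph
is acyclic and is given as a child -> parents map; the "<root>" name and the
Editor and Emacs_Like categories are invented for the example):

```python
ROOT = "<root>"  # hypothetical name for the synthetic source node

def add_source(parents):
    """Build a parent -> children map in which every toplevel category
    (one with no IsA edge) hangs off a synthetic ROOT node."""
    nodes = set(parents)
    for ps in parents.values():
        nodes.update(ps)          # include categories only named in IsA
    children = {node: [] for node in nodes}
    children[ROOT] = []
    for node in sorted(nodes):
        # A node with no parents is attached to ROOT instead.
        for p in parents.get(node, []) or [ROOT]:
            children[p].append(node)
    return children

def render(children, node=ROOT, depth=0):
    """Recursively flatten the graph into indented lines. There is no
    special case for getting started: the walk begins at ROOT."""
    lines = ["  " * depth + node]
    for child in sorted(children[node]):
        lines.extend(render(children, child, depth + 1))
    return lines

parents = {
    "Interpreter": [],
    "Lisp_Interpreter": ["Interpreter"],
    "Editor": [],
    "Emacs_Like": ["Editor", "Lisp_Interpreter"],
}
tree = render(add_source(parents))
```

A category with two parents simply shows up under both of them when the graph
is unfolded into a tree for display.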
[snip]
> > I think it might even make sense to permanently store it in an auxiliary
> > file (like Packages)
> >
>
> Definitely. Though I'd tend to think that a single file is not something
> maintainable. I guess only auto-generated stuff should be made
> single-files.
Yeah. That was my main objection earlier to this idea (and why I mentioned
putting stuff in package control files) -- probably some other mechanism
is needed.
I'm hoping (cross your fingers!) I can get the next aptitude generation
partly put together soon -- enough so that we could maybe build a "classifier"
into it which partly automates this process: you get a list of unclassified
packages and tell it which categories each package goes in, then it
generates a big file with that info. I'll make an announcement once 0.9.0 is
working. That will be the first prerelease for the next version: a port of the
current code to the new UI library, with a lot of work left to take full
advantage of the new stuff.
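
To make the classifier workflow concrete, here is a small Python sketch. It
is not aptitude code: the function names and the Package/Categories output
format are assumptions made up for this example.

```python
def unclassified(all_packages, assignments):
    """Return the packages that have no category assigned yet."""
    return sorted(p for p in all_packages if not assignments.get(p))

def generate_file(assignments):
    """Emit one stanza per classified package (hypothetical format)."""
    stanzas = []
    for pkg in sorted(assignments):
        cats = assignments[pkg]
        if cats:
            stanzas.append(
                f"Package: {pkg}\nCategories: {', '.join(sorted(cats))}\n"
            )
    return "\n".join(stanzas)

all_packages = ["emacs20", "rep", "aptitude"]
assignments = {
    "emacs20": {"Lisp_Interpreter"},
    "rep": {"Lisp_Interpreter"},
}
todo = unclassified(all_packages, assignments)   # still needs a human decision
output = generate_file(assignments)              # the generated "big file"
```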
> > I think a working prototype (prototype what? UI? hierarchy?) might be
> > most useful.
> >
>
> not UI, but the formal definitions and a suitable category hierarchy.
Ok. UI will likely be trivial, actually; apt has a tagfile parser
and aptitude is set up to allow flexible grouping..
> > Oh, and I think we should perhaps add a Description tag to sections.
> > For the existing system I'll probably invent my own descriptions for sections
> > eventually, but a proper one would be nice in the future.
> >
>
> Of course why not?
No particular reason, it just occurred to me :)
> Another approach is to textually cluster the
> descriptions of packages automatically. That gives us what we
> want (a multiple-inheritance class hierarchy) above packages! I'm
> not sure of the effectiveness of the thing though as Natural Language
> Processing may get too fuzzy. [What if descriptions haven't been
> written well?....] If there're such tools, we could run it to see
> what kind of a hierarchy it finds and then edit it.
You mean autogenerate categories? hmmmm...is NLP research really far enough
along to do a reasonable job at this?
Daniel
--
/----------------- Daniel Burrows <Daniel_Burrows@brown.edu> -----------------\
| "You see, I've already stolen the spork of wisdom |
| and the spork of courage.. together with the spork |
| of power, they form the mighty...TRI-SPORK!" -- Fluble |
\----------------- The Turtle Moves! -- http://www.lspace.org ----------------/