Re: Misclassification of packages; "libs" and "doc" sections

To: Eray Ozkural <erayo@cs.bilkent.edu.tr>
Cc: Thomas Hood <thood@excite.com>, Debian Developers List <debian-devel@lists.debian.org>
Subject: Re: Misclassification of packages; "libs" and "doc" sections
From: Daniel Burrows <Daniel_Burrows@brown.edu>
Date: Thu, 12 Oct 2000 17:05:03 -0400
Message-id: <20001012170503.A5619@torrent>
Mail-followup-to: Daniel Burrows <Daniel_Burrows@brown.edu>, Eray Ozkural <erayo@cs.bilkent.edu.tr>, Thomas Hood <thood@excite.com>, Debian Developers List <debian-devel@lists.debian.org>
In-reply-to: <39E6238F.ECBD0890@cs.bilkent.edu.tr>; from erayo@cs.bilkent.edu.tr on Thu, Oct 12, 2000 at 11:48:15PM +0300
References: <20001011084738.A507@torrent> <39E5002C.803D2EE8@cs.bilkent.edu.tr> <20001011202424.B20753@torrent> <39E513D2.CE64FB56@cs.bilkent.edu.tr> <20001011213738.A21209@torrent> <39E51E08.F19C33C4@cs.bilkent.edu.tr> <20001011225544.A21447@torrent> <39E536C5.8F96CF8B@cs.bilkent.edu.tr> <20001012082313.A711@torrent> <39E6238F.ECBD0890@cs.bilkent.edu.tr>

On Thu, Oct 12, 2000 at 11:48:15PM +0300, Eray Ozkural <erayo@cs.bilkent.edu.tr> was heard to say:

  [snip]

> >   So, just to be sure, you think that we should use configuration files like..
> > 
> > Section: Lisp_Interpreter
> > Contents: emacs20, rep, .....
> 
> Well. There's an arrow from Lisp_Interpreter to Interpreter only
> which reads "Lisp_Interpreter is-a Interpreter". Only outgoing arcs
> are maintained for a digraph (directed graph).
> 
> More like:
> 
> Category: Lisp_Interpreter
> IsA: Interpreter

  Ok, that looks nicer :)  One thing I may still be missing: it looks like
you want each node in the graph to point at what it "inherits" from, while
I was thinking of doing it the other way (each node pointing at what
inherits from it).  I'm not sure this actually matters, though, since most
of the file formats I've seen tossed around could be parsed either way
depending on how you want to use them..
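
  A minimal sketch of the "parse it either way" point, in Python.  The stanza
layout follows the Category/IsA example quoted above; the comma-separated IsA
list (for multiple parents) and the function names are assumptions made for
illustration, not an agreed format.

```python
SAMPLE = """\
Category: Lisp_Interpreter
IsA: Interpreter

Category: Interpreter
"""


def parse_categories(text):
    """Return {category: [parents]} from blank-line-separated stanzas."""
    parents = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        # Assumption: IsA may list several parents, separated by commas.
        isa = fields.get("IsA", "")
        parents[fields["Category"]] = [p.strip() for p in isa.split(",") if p.strip()]
    return parents


def invert(parents):
    """Flip child -> parents edges into parent -> children edges."""
    children = {}
    for child, ps in parents.items():
        children.setdefault(child, [])
        for p in ps:
            children.setdefault(p, []).append(child)
    return children
```

  The same file yields either direction: parse_categories() gives the "points
at what it inherits from" view, and invert() turns that into the opposite one.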

  [snip]

> >   There's no reason you can't arbitrarily make a "root" in a graph (just make
> > a node with no edges pointing to it from which all other nodes are reachable,
> > call it "root" :P), and it might be useful for speeding up some operations,
> > so we have a list of the nodes which "start" hierarchies (or is-a relationships,
> > if you prefer that) -- that is, nodes which are not immediately reachable from
> > another node.  (so when converting to a tree for display, you don't have to
> > calculate this.  You could certainly store it in some other way than a
> > node, but storing it that way seems like it would make some things simpler
> > to do)  Depending on how you want to input the graph, this may or may not be
> > easy to do.
> >
> 
> Okay, you can make such a single "source" vertex to the entire graph.
> But how do you guarantee that all vertices are reachable from it?
> You just have to take into account all vertices that have no outgoing
> edges. Those happen to be the "toplevel" categories, or the most
> general classes in this hierarchy.

  Well, that was rather the point -- you can store them as a "source" node,
store them some other way, or recalculate them every time you want them.  If
you actually end up needing this information, it seems cleanest to me to view
it as a special "source" node, since that lends itself to nice recursive
algorithms without a special case for starting out.  This is a kind of picky
point, though..
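
  As a rough sketch of the "source" node idea in Python: the top-level
categories are the ones with no outgoing IsA edge, and hanging them under one
synthetic node lets a single recursive function walk everything.  The name
"ROOT", the function names, and the sample categories other than
Lisp_Interpreter and Interpreter are hypothetical, and the graph is assumed
to be acyclic.

```python
def with_root(parents, root="ROOT"):
    """Build parent -> children edges plus a synthetic root above the tops.

    `parents` maps each category to the list of categories it IsA.
    Assumes no real category is named `root`.
    """
    nodes = set(parents)
    for ps in parents.values():
        nodes.update(ps)  # parents that were never declared still count
    children = {n: [] for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    # Top-level categories: no outgoing IsA edge.
    children[root] = sorted(n for n in nodes if not parents.get(n))
    return children


def render(children, node="ROOT", depth=0):
    """Flatten the graph into indented tree lines, starting at `node`.

    A category with several parents shows up once under each of them.
    """
    lines = ["  " * depth + node]
    for child in sorted(children[node]):
        lines.extend(render(children, child, depth + 1))
    return lines


example = {
    "Interpreter": [],
    "Lisp_Interpreter": ["Interpreter"],
    "Editor": [],
    "Emacs": ["Editor", "Lisp_Interpreter"],
}
tree = render(with_root(example))
```

  Because the walk starts at the synthetic node, render() needs no special
case for "the list of top-level categories".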

  [snip]

> >   I think it might even make sense to permanently store it in an auxiliary
> > file (like Packages)
> > 
> 
> Definitely. Though I'd tend to think that a single file is not something
> maintainable. I guess only auto-generated stuff should be made
> single-files.

  Yeah.  That was my main objection earlier to this idea (and why I mentioned
putting stuff in package control files) -- probably some other mechanism
is needed.

  I'm hoping (cross your fingers!) I can get the next aptitude generation
partly put together soon -- enough so that we could maybe build a "classifier"
into it which partly automates this process: you get a list of unclassified
packages and tell it which package goes in what categories, then it
generates a big file with that info.  I'll make an announcement once 0.9.0 is
working.  (That's the first prerelease for the next version, which will be a
port of the current code to the new UI library; there's a lot of work left to
take full advantage of the new stuff.)
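
  To make the classifier workflow concrete, here is a small Python sketch of
its two halves: listing what is still unclassified, and writing the collected
answers out as one generated file.  This is not aptitude code; the
"Package"/"Categories" field names, the function names, and the "Editor"
category are assumptions for illustration only.

```python
def unclassified(packages, assignments):
    """Packages that have no category assigned yet, in sorted order."""
    return sorted(p for p in packages if not assignments.get(p))


def render_classification(assignments):
    """Render {package: [categories]} as blank-line-separated stanzas.

    The field names are hypothetical; packages with no categories are skipped.
    """
    stanzas = []
    for package in sorted(assignments):
        categories = assignments[package]
        if categories:
            stanzas.append(
                f"Package: {package}\nCategories: {', '.join(sorted(categories))}"
            )
    return "\n\n".join(stanzas) + "\n"


packages = ["emacs20", "rep", "aptitude"]
assignments = {
    "emacs20": ["Lisp_Interpreter", "Editor"],
    "rep": ["Lisp_Interpreter"],
}
todo = unclassified(packages, assignments)
generated = render_classification(assignments)
```

  Keeping the generated file as plain stanzas means an existing tagfile
parser could plausibly read it back.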

> >   I think a working prototype (prototype what?  UI?  hierarchy?) might be
> > most useful.
> > 
> 
> not UI, but the formal definitions and a suitable category hierarchy.

  Ok.  UI will likely be trivial, actually; apt has a tagfile parser
and aptitude is set up to allow flexible grouping..

> >   Oh, and I think we should perhaps add a Description tag to sections.
> >   For the existing system I'll probably invent my own descriptions for sections
> > eventually, but a proper one would be nice in the future.
> > 
> 
> Of course why not?

  No particular reason, it just occurred to me :)

> Another approach is to textually cluster the
> descriptions of packages automatically. That gives us what we
> want (a multiple-inheritance class hierarchy) above packages! I'm
> not sure of the effectiveness of the thing though as Natural Language
> Processing may get too fuzzy. [What if descriptions haven't been
> written well?....] If there're such tools, we could run it to see
> what kind of a hierarchy it finds and then edit it.

  You mean autogenerate categories?  hmmmm...is NLP research really far enough
along to do a reasonable job at this?

  Daniel

-- 
/----------------- Daniel Burrows <Daniel_Burrows@brown.edu> -----------------\
|           "You see, I've already stolen the spork of wisdom                 |
|            and the spork of courage..  together with the spork              |
|            of power, they form the mighty...TRI-SPORK!" -- Fluble           |
\----------------- The Turtle Moves! -- http://www.lspace.org ----------------/
