
Re: RFC: Keywords instead of Section

On Thu, 2001-11-15 at 18:07, Erich Schubert wrote:
> Thomas Hood wrote:
> > One problem is that keywords can be ambiguous.  Keyword
> > 'X' can mean either that the package is part of an X server,
> > or that it is a program that uses X as a user interface.
> That is correct, but with multiple keywords this works just fine, select
> keywords "X11" and "client", or "x11" and "server", and there you go.

This is a mistake.  Conjoining keywords "x11" and "client" will
select all packages that have something to do with x11 and
something to do with clients, but will not select all and only
those packages that are x11 clients.  Think about it.  :)

"x11" & "client" might match, e.g., client tracking programs that
have X11 interfaces.  The benefit of defined typing is that
it clearly fixes the meaning of the keyword.  "ui:x11" will
match only those packages that employ x11 as their user interface.
The exercise of typing also suggests other descriptors of the same
type that may be used.  Once we adopt the "ui:" type, it is
natural to adopt "ui:x11", "ui:console", "ui:none" descriptors.
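
A minimal sketch of this point in Python.  The package names, the
keyword sets and the "topic:" assignment below are invented for
illustration and do not come from any real archive; "fsquery" stands
for a hypothetical console client that talks to an X11 font server.

```python
# Hypothetical package data, invented for illustration only.
BARE = {
    "xterm":   {"x11", "client"},             # uses X11 as its user interface
    "xserver": {"x11", "server"},             # is part of an X server
    "fsquery": {"x11", "client", "console"},  # console client for an X11 font server
}

# The same packages described with typed descriptors.
TYPED = {
    "xterm":   {"ui:x11"},
    "xserver": {"server:x11"},
    "fsquery": {"ui:console", "topic:x11"},
}


def select(db, *wanted):
    """Return the sorted names of packages carrying every keyword in `wanted`."""
    required = set(wanted)
    return sorted(name for name, tags in db.items() if required <= tags)


print(select(BARE, "x11", "client"))  # ['fsquery', 'xterm']
print(select(TYPED, "ui:x11"))        # ['xterm']
```

The conjunction of bare keywords also picks up the console program,
because each keyword applies to it in some sense; the typed
descriptor selects only the package whose user interface is X11.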

> I don't like bundling keywords like "x11-app"

Nevertheless, I am suggesting that pairing type and keyword
together as a descriptor is fundamentally more useful
than using bare keywords.

> > A good solution is explicitly to _type_ the keywords.  Dividing
> > keywords up into groups of the same type is a very useful way
> > of keeping the system orderly.
> I consider this pure cosmetic and thus to be done in the Package
> Browser. There sure is need to group keywords in a useful way, but this
> can be done "runtime".

Typing keywords is no more merely cosmetic than typing
variables in a programming language.  It has similar benefits.

Perhaps you mean that the type name need not be part of
the descriptor.  True, but including the type name has
benefits.  You do it yourself in some of your examples.

> > the system but integrated into it.  (However, we must absolutely
> > not have a type called 'other' !!!)
> I believe that other should get all those packages which cannot be put
> in any of the other group most programs can be fit into "daemons",
> "clients", "web-apps", "utilities", "development" and maybe "scripts",
> but there surely will be some apps which do not fit into any of them
> really.

A scheme without types amounts to reimplementing the "Description:"
field without grammatical connectives, using an arbitrarily chosen
set of adjectives that someone considers interesting.  This
gives us nothing that a text search on the "Description:"
field doesn't already give us;  in fact it gives us less, because
it limits the adjectives that one is able to search for.  Every place
I can think of where such a classification system has been
implemented, it has died on the vine (people don't bother to
specify the keywords) because it was easier and more useful to
do a text search on the title or body or description field.

What a classification system has the opportunity to do better
than a text search is to define the meanings of the keywords
very strictly.  This is what typing does.  Making the type
name part of the descriptor is optional, but I think it is a
good idea because it makes the sense of the keyword obvious.
It also allows the same keyword to be used in different senses:
e.g., "ui:x11" vs. "server:x11"; "license:gpl" vs. "topic:gpl".
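
One way to picture this, as a rough Python sketch: split each
descriptor at the first colon and group the keywords by type.  The
function names and the decision to reject untyped descriptors are
assumptions made for this example, not part of any proposal in this
thread.

```python
from collections import defaultdict


def parse_descriptor(text):
    """Split 'type:keyword' into a lower-cased (type, keyword) pair."""
    kind, sep, keyword = text.strip().partition(":")
    if not sep or not kind or not keyword:
        raise ValueError(f"untyped descriptor: {text!r}")
    return kind.lower(), keyword.lower()


def index_by_type(descriptors):
    """Group keywords by type, so the same keyword can appear under several types."""
    index = defaultdict(set)
    for descriptor in descriptors:
        kind, keyword = parse_descriptor(descriptor)
        index[kind].add(keyword)
    return dict(index)


index = index_by_type(["license:gpl", "topic:gpl", "ui:x11"])
print("gpl" in index["license"])           # True
print("gpl" in index.get("ui", set()))     # False
```

Here "gpl" is recorded once as a license and once as a topic, and
the two senses never collide.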

> Maybe "special" is a better name instead of "other". Of course
> this "group" should be avoided.

As a description, 'special' or 'other' is too vague to be
useful.  One might as well omit it entirely.

If one is resorting to classifying a package as "special", it
means that the classification scheme needs to be enhanced.

> In fact i believe
> that freshmeat does not distinguish between different keyword "groups"
> except for user interface. That's the way i want to go here, too.

Freshmeat/SourceForge types are: Development-Status, Environment,
Intended-Audience, License, Programming-Language, Topic.

I agree that this is not a good set of types.  The "Topic"
type is too broad, for example.

> > allowing any number of keywords to be given on each line.
> Which you cannot include into the existing Package info easily
> (except you make another multiline field)

What's the problem?  Lots of other fields are lists too.
(Depends:, etc.)

> > An alternative is to do it this way:
> >     Class: Development-Status:stable, Environment:X,
> > Environment:console, Intended-Audience:user, License:GPL,
> > Programming-Language:C, Programming-Language:Python, Topic:cooking,
> > Topic:Sex
> Which is basically my proposal.
> "Keywords: client, stable, X11, console, audience-user, gpl, devel-c,
> devel-python, cooking, sex"
> Without treating any keyword special, unless the user prefers to do so.

Our proposals are different.  To search for all "License:GPL"
packages is much more specific than to search for all packages
that have something to do with the GPL.  The latter group may
include documentation packages having to do with GPL, or 
programs used to ensure GPL compliance, or whatnot.  You
suggest that this can be remedied by combining keywords,
but no conjunction of keywords will allow you to search for
all and only those packages whose license _is_ the GPL.
Unless you have a "license-gpl" keyword, of course.  In which
case it makes sense to have a "license-bsd" keyword, and ...
lo and behold, you have developed a typed keyword classification
scheme!  But ad hoc, instead of systematically.
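
As an illustration only, here is a short Python sketch that reads a
"Class:" value like the one quoted above and tests for an exact
typed descriptor.  The field value is copied from the example; the
function names and the case-insensitive comparison are assumptions
of this sketch.

```python
# Value of the example "Class:" field, folded over several lines.
CLASS_FIELD = """Development-Status:stable, Environment:X,
 Environment:console, Intended-Audience:user, License:GPL,
 Programming-Language:C, Programming-Language:Python, Topic:cooking,
 Topic:Sex"""


def parse_class_field(value):
    """Split a comma-separated list of Type:keyword descriptors into pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()  # also drops the newline left by line folding
        if not item:
            continue
        kind, sep, keyword = item.partition(":")
        if not sep:
            raise ValueError(f"descriptor without a type: {item!r}")
        pairs.append((kind.strip(), keyword.strip()))
    return pairs


def has_descriptor(value, kind, keyword):
    """True if the field contains exactly this type:keyword pair (case-insensitive)."""
    target = (kind.lower(), keyword.lower())
    return any((k.lower(), w.lower()) == target for k, w in parse_class_field(value))


print(has_descriptor(CLASS_FIELD, "License", "GPL"))  # True
print(has_descriptor(CLASS_FIELD, "Topic", "GPL"))    # False
```

A search for "License:GPL" matches this package, while a search for
"Topic:GPL" does not, which is the distinction that a flat keyword
list cannot express.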

(I notice you have already started typing your keywords in your
list above: "devel-c" and "devel-python" are keywords of the
"devel" type.  What I am saying is that this should be done
systematically and universally from the start.)

Thomas Hood
