
Re: RFC: Keywords instead of Section



Erich Schubert wrote:
> 
> > > That is correct, but with multiple keywords this works just fine, select
> > > keywords "X11" and "client", or "x11" and "server", and there you go.
> >
> > "x11" & "client" might match, e.g., client tracking programs that
> > have X11 interfaces.  The benefit of defined typing is that
> 
> incorrect. this is NOT a Full-Text-Search, and the keyword "client" is
> for "client applications in a client-server model", not for things such
> as client tracking (i'd suggest "customer management" or a keyword like
> that for this)
> Remember: the keywords are EDITED, not automatically generated!

  it does not matter, people will screw it up just like generators will.
There has to be a structure, otherwise you have nothing but mud...

> > type that may be used.  Once we adopt the "ui:" type, it is
> > natural to adopt "ui:x11", "ui:console", "ui:none" descriptors.
> 
> that's just cosmetic, too. Of course we can call the keywords any way we

  everything from machine language up is purely cosmetic - it's not for
computers but to make it manageable by people. this is basically the same
as types of variables in programming languages - you can do without them
but it's a lot easier to program in languages that have types (note that
even in sort-of typeless languages like Perl you actually have types -
numbers are treated differently than strings, when you create objects
they have a type etc.)

> want. I like the way "ui:x11", too. But i would not restrict the
> implementation to treating this specially.
> 
> > Perhaps you mean that the type name need not be part of
> > the descriptor.  True, but including the type name has
> > benefits.  You do it yourself in some of your examples.
> 
> I'm talking about implementation, not about using them. I see no need at
> all to treat this specially.

  the type needs to be treated differently from the actual value, so the
implementation has to be aware of this.
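A minimal Python sketch of keeping the type apart from the value, assuming keywords are written as `type:value` strings such as `ui:x11`. The `Keyword` and `parse_keyword` names are hypothetical and not part of any Debian tool; lower-casing both parts is also an assumption made only for this example.

```python
from typing import NamedTuple


class Keyword(NamedTuple):
    """A keyword split into its type and its value, e.g. ui:x11."""
    type: str
    value: str


def parse_keyword(text: str) -> Keyword:
    """Split a 'type:value' string, keeping the type apart from the value.

    Lower-casing both parts is an assumption made for this sketch.
    """
    type_name, sep, value = text.strip().partition(":")
    if not sep or not type_name or not value:
        raise ValueError(f"keyword {text!r} is not of the form type:value")
    return Keyword(type_name.lower(), value.lower())


# The same value under two different types gives two distinct keywords.
print(parse_keyword("ui:x11"))      # Keyword(type='ui', value='x11')
print(parse_keyword("server:x11"))  # Keyword(type='server', value='x11')
```

Because the type is a separate field, "ui:x11" and "server:x11" compare unequal even though they share the value "x11".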

> > What a classification system has the opportunity to do better
> > than a text search is to define the meanings of the keywords
> > very strictly.  This is what typing does.  Making the type
> > name part of the descriptor is optional, but I think it is a
> > good idea because it makes the sense of the keyword obvious.
> > It also allows the same keyword to be used in different senses:
> > e.g., "ui:x11" vs. "server:x11"; "license:gpl" vs. "topic:gpl";
> > etc.
> 
> All this does NOT belong in the implementation. I do not want to change a
> single line of code if someone wants to redo the whole classification and
> implement another scheme.
> All this is a decision for the "Keyword Committee", and they are - as I
> wrote before - to define the keywords and their meaning.
> So if they define "gpl" as the keyword for applications under the GPL licence,
> that is just what I was thinking of. But this is the Committee's choice
> and has to be kept out of the implementation, so it can easily be
> changed (think of a company doing a Debian-based distro that does not care
> at all about licences - they might want to leave that out completely.
> Others might want to do a completely different categorization!)

  this does not make sense. the types and keywords are both data and
both can be managed by the committee, without changing the program. you
just have two sets of data: a fairly static type structure and a slightly
more dynamic set of keywords. both can be changed without changing the
implementation.

> > As a description, 'special' or 'other' is too vague to be
> > useful.  One might as well omit it entirely.
> 
> So the packages not fitting into a category are not found by novices?

  no, see the paragraph you quoted just below this one

> > If one is resorting to classifying a package as "special", it
> > means that the classification scheme needs to be enhanced.
> 
> No. It means "there are too few packages fitting in here to add a new
> class".

  that's not a reason not to have a class. why would a class have to
have a certain number of packages? if it's distinct enough there should
be a class for it. e.g. if we suddenly get a new ui type there might be
only a handful of programs (or none!) using it, e.g. for Berlin. that
does not mean there should not be a category for it.

> > Freshmeat/SourceForge types are: Development-Status, Environment,
> > Intended-Audience, License, Programming-Language, Topic.
> 
> Which are basically all equal in the database and are assigned integer
> numbers from the same namespace. This categorization of keywords is
> purely cosmetic in the user interface (I think).

  how you implement it does not matter; the crucial part is that you
have types and you can query by type. it makes sense to maintain types
and keywords as separate data entities. both are data, and to change
either you don't need to change the implementation (well, you do need to
change the implementation we have now).

  so while you can have it like this:

  ui:x11
  ui:text
  licence:gpl
  licence:bsd

  and consider these 4 different keywords, it makes a lot more sense to
have them in two separate data sets:

  types:
    ui
    licence

  keywords:
    x11, type=ui
    text, type=ui
    gpl, type=licence
    bsd, type=licence

  it's basically normalizing data... it has a number of advantages (I
guess that's obvious). you can e.g. check the type - it has to exist,
etc.
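The normalized layout above could be sketched in Python roughly as follows. The `TYPES` set, the `KeywordDef` record and both helper functions are hypothetical illustrations, not existing dpkg or apt code; the point is only that a keyword whose type is not declared (for example a misspelled `linence`) can be detected mechanically.

```python
from dataclasses import dataclass

# Hypothetical sample data: the set of types is kept apart from the keywords.
TYPES = {"ui", "licence"}


@dataclass(frozen=True)
class KeywordDef:
    name: str
    type: str  # must name an entry in TYPES


KEYWORDS = [
    KeywordDef("x11", "ui"),
    KeywordDef("text", "ui"),
    KeywordDef("gpl", "licence"),
    KeywordDef("bsd", "licence"),
]


def undeclared_types(types, keywords):
    """Return the keywords whose type is not in the declared set of types."""
    return [kw for kw in keywords if kw.type not in types]


def keywords_of_type(keywords, type_name):
    """List the keyword names that belong to one type."""
    return sorted(kw.name for kw in keywords if kw.type == type_name)


print(undeclared_types(TYPES, KEYWORDS))  # []
print(undeclared_types(TYPES, [KeywordDef("bsd", "linence")]))
# [KeywordDef(name='bsd', type='linence')]
print(keywords_of_type(KEYWORDS, "ui"))   # ['text', 'x11']
```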

> > I agree that this is not a good set of types.  The "Topic"
> > type is too broad, for example.
> 
> That is exactly why I do not want to implement such a hierarchy in the
> package system itself, this belongs into the user interface, where
> people can have multiple, differing implementations.

  the hierarchy is just data! it's not hardcoded in package system.

> > What's the problem?  Lots of other fields are lists too.
> > (Depends:, etc.)
> 
> But "Depends:" is well defined, whereas your way requires _dozens_ of
> different additional fields, making parsing much more difficult.

  but still easy.

> E.g. package managers that do not know what your "Licence:" field is
> will not provide this way of selecting packages to their users.

  why not? that's why you have a defined set of types. the user can query
by any type, and the package system does not have to be specifically
aware of it. also, the user can see the list of all types; again, the
package system does not have to be aware of them - it's just data.
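To illustrate querying by any type without the package manager knowing the types in advance, here is a hypothetical Python sketch. It assumes a package's keywords are available as a list of `type:value` strings; the package names, the index layout and the helper functions are made up for the example.

```python
from collections import defaultdict

# Hypothetical package records; each has a list of "type:value" keywords.
PACKAGES = {
    "foo-x11": ["ui:x11", "licence:gpl"],
    "bar-console": ["ui:text", "licence:bsd"],
    "gpl-howto": ["ui:none", "topic:gpl"],
}


def index_by_type(packages):
    """Build {type: {value: set of packages}} from the data alone.

    No type name is hard-coded, so a new type (say "audience") only
    needs new data, not new code.
    """
    index = defaultdict(lambda: defaultdict(set))
    for pkg, keywords in packages.items():
        for kw in keywords:
            type_name, _, value = kw.partition(":")
            index[type_name][value].add(pkg)
    return index


def query(index, type_name, value):
    """Packages whose keyword of the given type has the given value."""
    return sorted(index.get(type_name, {}).get(value, set()))


index = index_by_type(PACKAGES)
print(sorted(index))                   # ['licence', 'topic', 'ui']
print(query(index, "licence", "gpl"))  # ['foo-x11']
print(query(index, "topic", "gpl"))    # ['gpl-howto']
```

Adding `audience:user` to some package record would make `audience` appear in the type list without touching `index_by_type` or `query`, and "licence:gpl" does not match a package that is merely about the GPL.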

> So if we decide to add another field (like Intended-Audience:) ALL
> package managers will have to be modified! That's EVIL.

  no, that's not true. see above.

> > > Which is basically my proposal.
> > > "Keywords: client, stable, X11, console, audience-user, gpl, devel-c,
> > > devel-python, cooking, sex"
> > >
> > > Without treating any keyword special, unless the user prefers to do so.
> >
> > Our proposals are different.  To search for all "License:GPL"
> > packages is much more specific than to search for all packages
> > that have something to do with the GPL.  The latter group may
> 
> You didn't understand my proposal.
> I'm NOT talking about a full-text-search (which is already provided by
> apt-cache search and most package managers) and which would indeed find
> all packages having anything to do with gpl.
> 
> The keywords are to be edited and well defined by a "Keyword Committee",
> and it's their job to make good keywords, not the software's.

  that's what he said as well. he just distinguishes between keyword
types and keywords, which is very useful.

> > include documentation packages having to do with GPL, or
> > programs used to ensure GPL compliance, or whatnot.  You
> > suggest that this can be remedied by combining keywords,
> 
> It can, of course.

  and, of course, it's a perfect way to make a mess out of keywords. see
the reasons why data in databases is normalized, or why types are used
in programming languages etc... you need a structure, otherwise your
system will collapse into a mess less useful than full-text search.

  just because it can be done does not mean that it's an acceptable way
of doing it.

> > but no conjunction of keywords will allow you to search for
> > all and only those packages whose license _is_ the GPL.
> 
> If the keyword is defined to be "gpl-licenced programs only", this is
> exactly what you want, if it's about "anything to do with the gpl"
> (which might not be too useful, use full-text-search for that!)
> blame the keyword committee.

  yes, and to explicitly say this you need types.

> > Unless you have a "license-gpl" keyword, of course.  In which
> > case it makes sense to have a "license-bsd" keyword, and ...
> 
> Correct. And a licence-nonfree keyword of course, so i can easily
> drop all non-free software from my selection.

  to do this you would have to make the type system itself more complex
- basically having derived types (licence, with derived types
licence-free and licence-non-free, so that you can query for all
licences or only free licences etc.). however, with your proposal you
couldn't do it at all (in any maintainable way).
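Derived types could be modelled with a parent pointer per type, as in this hypothetical Python sketch. The type names follow the licence / licence-free / licence-non-free example in the paragraph above; the keywords, packages and helper functions are invented for illustration. Querying for `licence` then covers both free and non-free licences, and dropping all non-free software is a single call.

```python
# Hypothetical type hierarchy: each type maps to its parent (None for a root).
PARENT = {
    "licence": None,
    "licence-free": "licence",
    "licence-non-free": "licence",
    "ui": None,
}

# Hypothetical keywords, each attached to its most specific type.
KEYWORD_TYPE = {
    "gpl": "licence-free",
    "bsd": "licence-free",
    "proprietary": "licence-non-free",
    "x11": "ui",
}

# Hypothetical packages and their keywords.
PACKAGES = {
    "foo": ["gpl", "x11"],
    "bar": ["proprietary", "x11"],
    "baz": ["bsd"],
}


def derives_from(type_name, ancestor):
    """True if type_name is ancestor or one of its derived types."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = PARENT[type_name]
    return False


def keywords_under(ancestor):
    """All keywords whose type is ancestor or derived from it."""
    return sorted(k for k, t in KEYWORD_TYPE.items() if derives_from(t, ancestor))


def packages_without(ancestor):
    """Packages with no keyword of the given type (or of a derived type)."""
    banned = set(keywords_under(ancestor))
    return sorted(p for p, kws in PACKAGES.items() if banned.isdisjoint(kws))


print(keywords_under("licence"))             # ['bsd', 'gpl', 'proprietary']
print(keywords_under("licence-free"))        # ['bsd', 'gpl']
print(packages_without("licence-non-free"))  # ['baz', 'foo']
```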

  the point here is that people creating keywords can look up types and
use that information; in your proposal there is no explicit set of types
(and therefore people are basically free to use any type, even a
non-existing one).

  when you have explicit types you can even require a certain set of
keywords (e.g. you always have to have a keyword of type ui and
licence). that helps people creating keywords and it also helps users.
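A required-types check might look like the following hypothetical Python sketch, assuming the committee publishes a keyword-to-type table and a set of mandatory types (here `ui` and `licence`, as suggested above). The table contents and the `check_package` helper are invented for the example; it also reports keywords that are not in the table at all.

```python
# Hypothetical policy: every package needs at least one keyword of these types.
REQUIRED_TYPES = {"ui", "licence"}

# Hypothetical keyword -> type table, as maintained by the keyword committee.
KEYWORD_TYPE = {
    "x11": "ui",
    "text": "ui",
    "none": "ui",
    "gpl": "licence",
    "bsd": "licence",
}


def check_package(keywords):
    """Return (missing required types, unknown keywords) for one package."""
    unknown = sorted(kw for kw in keywords if kw not in KEYWORD_TYPE)
    present = {KEYWORD_TYPE[kw] for kw in keywords if kw in KEYWORD_TYPE}
    return sorted(REQUIRED_TYPES - present), unknown


print(check_package(["x11", "gpl"]))  # ([], [])
print(check_package(["x11"]))         # (['licence'], [])
print(check_package(["gui", "gpl"]))  # (['ui'], ['gui'])
```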

	erik


