
The EDOS project: request for comments on a Debian Query Language



Dear Debian users and workers,

The EDOS project (http://www.edos-project.org/) is a research project funded by
the European Union and focusing on fundamental aspects of free software
processes such as distribution, quality assurance and dependency management.

As the WP2 (Dependencies Management) team, we have developed a set of tools for
organizing, displaying, measuring and checking metadata information.

We need your feedback and ideas for defining a query language that would be
useful to maintainers, power-users and FOSS researchers alike.  Two of our
tools, "ara" and "history", have query languages that can be used as starting
points.

I would also like to point out that the EDOS team is organizing two tracks at
the RMLL 2006 Conference, Nancy, France:
  - EDOS track on Large Software Systems Management, July 6 2006:
        http://www.rmll.info/conf_124
  - First EDOS workshop, July 7 2006:
        http://www.rmll.info/theme_60

I will now give some background information on what we have done and what we
want to do.

We have a complete, correct and efficient tool that takes a "Packages" metadata
file and finds packages that are not installable by solving the associated
satisfiability problem.  This checker is available as a command-line tool
temporarily named "debcheck", which can be checked out from the EDOS SVN
server.  (There is also an RPM version.)  This tool doesn't simply check
first-order dependencies or conflicts, but encodes the complete installability
constraints as a boolean formula, which is then solved very quickly (one minute
to check the whole of unstable).  It is in the process of being packaged; see

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=365087
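
To give a feel for what "encoding installability as a boolean formula" means,
here is a minimal Python sketch.  It is not the debcheck code: the toy
repository, the clause representation and the brute-force search are all
assumptions made for illustration (a real checker would hand the formula to a
proper SAT solver).  Note that "app" passes a first-order check, since each of
its dependencies has a candidate in the repository, yet it is not installable.

```python
from itertools import product

# Hypothetical toy repository.  "depends" is a conjunction of disjunctions,
# like "Depends: libfoo | libbar, common"; "conflicts" is a plain list.
REPO = {
    "app":    {"depends": [["libfoo", "libbar"], ["common"]], "conflicts": []},
    "libfoo": {"depends": [], "conflicts": ["common"]},
    "libbar": {"depends": [["missing"]], "conflicts": []},
    "common": {"depends": [], "conflicts": []},
}

def clauses(repo):
    """Encode the repository as CNF clauses.

    A literal (name, True) means "name is installed" and (name, False)
    means "name is not installed".
    """
    cnf = []
    for name, meta in repo.items():
        for alternatives in meta["depends"]:
            # name installed => one of the alternatives installed.
            # Alternatives absent from the repository can never be installed.
            present = [(alt, True) for alt in alternatives if alt in repo]
            cnf.append([(name, False)] + present)
        for other in meta["conflicts"]:
            if other in repo:
                # name and other cannot both be installed.
                cnf.append([(name, False), (other, False)])
    return cnf

def installable(target, repo):
    """Brute-force search for an assignment that installs target."""
    names = sorted(repo)
    cnf = clauses(repo) + [[(target, True)]]
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(any(assignment[p] == pol for p, pol in c) for c in cnf):
            return True
    return False

def uninstallable(repo):
    return sorted(p for p in repo if not installable(p, repo))

print(uninstallable(REPO))
```

Here "libbar" fails because its only dependency is missing, and "app" fails
because its one remaining alternative, "libfoo", conflicts with "common".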

Tools for handling the Debian metadata and its evolution over time are now being
developed.  More precisely, every day we download files such as
Packages.{gz,bz2} for all distributions and components and fill a database
with these data.  For every day and every distribution (testing, unstable,
stable), we also find the uninstallable packages using the algorithms of
debcheck.

This database:
  - Can be browsed on-line using a web interface, "anla".
    See http://brion.inria.fr/anla/ but please note that the database is not
    very up-to-date, due to performance issues which will be addressed by the
    upcoming rewrite.
  - Can be queried using the "history" tool in a particular query language. 
    For a brief description of that language (dubbed DQL), please see

      http://gallium.inria.fr/~durak/dql.html

    This description is taken from pages 104 to 109 of the EDOS deliverable
    2.2, available at

      http://www.edos-project.org/xwiki/bin/download/Main/Deliverables/edos-wp2d2.pdf

We want to write clean, re-engineered and integrated versions of these tools,
using a query language tailored to fit the needs of package maintainers, but
also researchers.  As researchers, we may want to do interesting things like
displaying the evolution of the number of non-installable packages, or finding
packages that satisfy some criteria, such as packages tagged as libraries that
have no packages depending on them except packages from the same source.  We think
this would also be very useful to developers and power users.  For instance, we
would like to be able to type something like

  bash$ dql 'packages(stable) \ (depends(libc6) | tagged("lib"))'

to get a list of packages in the current stable archive that neither depend
directly or indirectly on libc6 nor are tagged "lib" (reading "\" as set
difference and "|" as union).  (This is just an example to give a taste of
things we want to do.)
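
For readers who want to experiment with the semantics of such a query, here is
a small Python sketch that mimics it with ordinary sets.  It assumes that "\"
is set difference and "|" is union, and that depends() is transitive; the
package data and tags are invented, and none of this is the actual "history"
or DQL implementation.

```python
# Hypothetical snapshot of "stable": package -> direct dependencies.
DEPENDS = {
    "libc6": set(),
    "bash": {"libc6"},
    "libfoo1": {"libc6"},
    "foo-tools": {"libfoo1"},
    "fonts-demo": set(),
    "libdata-perl": set(),
}
# Hypothetical tags.
TAGS = {"libfoo1": {"lib"}, "libdata-perl": {"lib"}}
SNAPSHOTS = {"stable": DEPENDS}

def packages(distribution):
    """All package names in the given distribution."""
    return set(SNAPSHOTS[distribution])

def depends(target):
    """Packages depending directly or indirectly on target."""
    result = set()
    changed = True
    while changed:  # iterate until no new package is added
        changed = False
        for pkg, deps in DEPENDS.items():
            if pkg not in result and (target in deps or deps & result):
                result.add(pkg)
                changed = True
    return result

def tagged(tag):
    """Packages carrying the given tag."""
    return {pkg for pkg, tags in TAGS.items() if tag in tags}

# The query as written: difference with a union.
as_written = packages("stable") - (depends("libc6") | tagged("lib"))

# A variant using intersection: tagged "lib" but not depending on libc6.
libs_without_libc6 = (packages("stable") - depends("libc6")) & tagged("lib")

print(sorted(as_written))
print(sorted(libs_without_libc6))
```

The two results differ, which is one reason the precise meaning of each
operator is worth settling early in the language design.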

Please reply to this thread with your ideas, comments and suggestions.
Ideally, we would have a command-line version that would respond instantly.
Here are a few ideas to start with.

Language and environment
------------------------
1) Date manipulation functions.
2) Ability to load multiple historical data sources.
3) Multiple views and environments.

Data I/O
--------
4) Ability to load and save environments.
5) Ability to load installation status.
6) Ability to load non-historical, unparsed data sources.
7) Ability to load installation status data

Statistics
----------
9) Ability to easily do statistical measurements:
     a) Count number of things.
     b) Plot the evolution of some quantity with respect to time.
     c) Plot a graph of the number of packages added per day.
     d) Compute the average number of versions per package.
     e) Display the 10 units with most versions.
10) Find packages most depended upon
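
As an illustration of the measurements in item 9, here is a minimal Python
sketch working on a list of (day, package, version) observations.  The data
layout, the version strings and the function names are assumptions made for
this example, not the schema of our database.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical observations: one row per package seen in a daily snapshot.
OBSERVATIONS = [
    (date(2006, 5, 1), "bash", "3.1-4"),
    (date(2006, 5, 1), "ocaml", "3.09.1-3"),
    (date(2006, 5, 2), "bash", "3.1-5"),
    (date(2006, 5, 2), "ocaml", "3.09.1-3"),
    (date(2006, 5, 2), "ara", "1.0.10"),
    (date(2006, 5, 3), "bash", "3.1-5"),
    (date(2006, 5, 3), "ocaml", "3.09.2-1"),
    (date(2006, 5, 3), "ara", "1.0.10"),
]

def versions_per_package(observations):
    """Number of distinct versions seen for each package."""
    versions = defaultdict(set)
    for _, package, version in observations:
        versions[package].add(version)
    return {package: len(seen) for package, seen in versions.items()}

def average_versions(observations):
    """Item 9d: average number of versions per package."""
    counts = versions_per_package(observations)
    return sum(counts.values()) / len(counts)

def most_versions(observations, n=10):
    """Item 9e: the n packages with the most versions."""
    return Counter(versions_per_package(observations)).most_common(n)

def added_per_day(observations):
    """Item 9c: number of packages first seen on each day."""
    first_seen = {}
    for day, package, _ in sorted(observations):
        first_seen.setdefault(package, day)
    return Counter(first_seen.values())

print(average_versions(OBSERVATIONS))
print(most_versions(OBSERVATIONS, n=2))
print(sorted(added_per_day(OBSERVATIONS).items()))
```

The per-day counts returned by added_per_day() are the kind of series one
would then hand to a plotting tool.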

Complex searches
----------------
11) Ability to do complex searches:
     a) Search packages by description, regular expressions
     b) Search packages by tags
     c) Search packages by dependency
12) Search packages by contained files
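
The searches in items 11 and 12 could combine along the following lines.  This
Python sketch is only meant to show how criteria might be chained; the record
layout, the tags and the search() function are hypothetical.

```python
import re

# Hypothetical package records.
RECORDS = [
    {"name": "libfoo1", "description": "Shared library for foo parsing",
     "tags": {"lib"}, "files": {"/usr/lib/libfoo.so.1"}},
    {"name": "libfoo-dev", "description": "Development files for libfoo",
     "tags": {"lib", "devel"}, "files": {"/usr/include/foo.h"}},
    {"name": "foo-tools", "description": "Command-line tools using foo",
     "tags": set(), "files": {"/usr/bin/foo"}},
]

def search(records, description=None, tag=None, path=None):
    """Return names of packages matching every criterion that is given.

    description is a regular expression matched case-insensitively,
    tag must be carried by the package, path must be a contained file.
    """
    pattern = re.compile(description, re.IGNORECASE) if description else None
    hits = []
    for record in records:
        if pattern and not pattern.search(record["description"]):
            continue
        if tag and tag not in record["tags"]:
            continue
        if path and path not in record["files"]:
            continue
        hits.append(record["name"])
    return hits

print(search(RECORDS, description=r"\blibrary\b"))
print(search(RECORDS, tag="lib", description=r"^development"))
print(search(RECORDS, path="/usr/bin/foo"))
```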

Installability
--------------
13) Ability to "simulate" installations and upgrades
14) Find non-conflicting packages that have a common file:
   - with the same MD5
   - with different MD5s
15) Find uninstallable packages
16) Find uninstallable packages that fail because of conflicts
17) Given the assumption that a set of packages is installable (or not), find
the uninstallable packages
18) Compute an installation minimizing a given metric and satisfying some
criteria
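
To make item 14 concrete, here is a minimal Python sketch that looks for files
shared by pairs of packages that do not declare a conflict with each other, and
separates identical copies from differing ones.  The package data is invented,
the "md5" strings are placeholders rather than real digests, and only declared
pairwise conflicts are considered.

```python
from itertools import combinations

# Hypothetical data: package -> declared conflicts and {path: md5} of files.
ARCHIVE = {
    "foo": {"conflicts": {"foo-ng"},
            "files": {"/usr/bin/foo": "aaa", "/etc/foo.conf": "bbb"}},
    "foo-ng": {"conflicts": set(),
               "files": {"/usr/bin/foo": "ccc"}},
    "foo-extras": {"conflicts": set(),
                   "files": {"/etc/foo.conf": "bbb", "/usr/share/foo/x": "ddd"}},
    "bar": {"conflicts": set(),
            "files": {"/usr/share/foo/x": "eee"}},
}

def in_conflict(a, b, archive):
    """True if either package declares a conflict with the other."""
    return b in archive[a]["conflicts"] or a in archive[b]["conflicts"]

def shared_files(archive):
    """Return (same, different): lists of (pkg_a, pkg_b, path) triples."""
    same, different = [], []
    for a, b in combinations(sorted(archive), 2):
        if in_conflict(a, b, archive):
            continue  # conflicting packages may legitimately share files
        files_a, files_b = archive[a]["files"], archive[b]["files"]
        for path in sorted(files_a.keys() & files_b.keys()):
            if files_a[path] == files_b[path]:
                same.append((a, b, path))
            else:
                different.append((a, b, path))
    return same, different

same, different = shared_files(ARCHIVE)
print(same)
print(different)
```

The pair "foo" and "foo-ng" is skipped even though both ship /usr/bin/foo,
because "foo" declares a conflict with "foo-ng".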

--
Berke Durak, Ph.D. Comp. Sci
The EDOS Project, http://www.edos-project.org/


