Some thoughts about fields in UDD

To: Debian QA List <debian-qa@lists.debian.org>
Subject: Some thoughts about fields in UDD
From: Andreas Tille <andreas@an3as.eu>
Date: Thu, 8 Jul 2010 09:44:09 +0200
Message-id: <[🔎] 20100708074409.GD18151@an3as.eu>

Hi,

I did some checking investigation in UDD and stumbled about some
data I do not see any sense:

** The column 'origin' **

udd=> SELECT origin, COUNT(*) FROM packages GROUP BY origin;
 origin | count  
--------+--------
        | 817717
 debian |    188
 Debian |     11
(3 rows)

udd=> SELECT DISTINCT package, source, origin, section, component FROM packages WHERE origin  ilike 'debian';
    package     |     source     | origin |     section      |       component       
----------------+----------------+--------+------------------+-----------------------
 libdpkg-dev    | dpkg           | debian | libdevel         | main
 modutils       | modutils       | debian | admin            | main
 emifreq-applet | emifreq-applet | Debian | gnome            | main
 modutils-full  | modutils       | debian | debian-installer | main/debian-installer
 libdpkg-perl   | dpkg           | debian | perl             | main
 dpkg           | dpkg           | debian | admin            | main
 dpkg-dev       | dpkg           | debian | utils            | main
 dselect        | dpkg           | debian | admin            | main
 modutils-basic | modutils       | debian | debian-installer | main/debian-installer
(9 rows)

However

udd=> SELECT origin, COUNT(*) FROM ubuntu_packages GROUP BY origin;
 origin | count  
--------+--------
 Ubuntu | 295722
(1 row)

So in ubuntu_packages the column is consistently filled - but what is
the sense of this column?  Just telling that a package is part of a
certain table?  Shouldn't this be simply possible in a view.  For me
this is a total waste of disk space (even if I would assume the column
would be properly used in packages).


** The column 'bugs' **

udd=> SELECT bugs, COUNT(*) FROM packages GROUP BY bugs;
                 bugs                 | count  
--------------------------------------+--------
                                      | 817703
 debbugs://bugs.debian.org            |    155
 http://sourceforge.net/projects/xfe/ |     30
 mailto:formorer@debian.org           |      2
 mailto:rra@debian.org                |     15
 debbugs://bugs.debian.org/           |     11
(6 rows)

udd=> SELECT DISTINCT package, source, section, component, bugs FROM packages WHERE bugs !=   '';
         package          |          source          | section  | component |                 bugs                 
--------------------------+--------------------------+----------+-----------+--------------------------------------
 emifreq-applet           | emifreq-applet           | gnome    | main      | debbugs://bugs.debian.org/
 dpkg                     | dpkg                     | admin    | main      | debbugs://bugs.debian.org
 libdpkg-dev              | dpkg                     | libdevel | main      | debbugs://bugs.debian.org
 dselect                  | dpkg                     | admin    | main      | debbugs://bugs.debian.org
 xfe-i18n                 | xfe                      | x11      | main      | http://sourceforge.net/projects/xfe/
 libafs-perl              | libafs-perl              | perl     | main      | mailto:rra@debian.org
 xfe                      | xfe                      | x11      | main      | http://sourceforge.net/projects/xfe/
 libdpkg-perl             | dpkg                     | perl     | main      | debbugs://bugs.debian.org
 dpkg-dev                 | dpkg                     | utils    | main      | debbugs://bugs.debian.org
 debian-backports-keyring | debian-backports-keyring | misc     | main      | mailto:formorer@debian.org
 xfe-themes               | xfe                      | x11      | main      | http://sourceforge.net/projects/xfe/
(11 rows)

So the packages table is featuring two columns which are obviosely not needed
because they contain redundant information (if at all, with some questionable
exceptions in the case of the bugs field).  The situation with the bugs field
is a bit different in ubuntu_packages:

udd=> SELECT bugs, release, COUNT(*) FROM ubuntu_packages GROUP BY bugs, release;
                    bugs                    | release  | count 
--------------------------------------------+----------+-------
 https://bugs.launchpad.net/ubuntu/+filebug | jaunty   | 42275
 https://bugs.launchpad.net/ubuntu/+filebug | maverick | 50509
 https://bugs.launchpad.net/ubuntu/+filebug | karmic   | 45346
 https://bugs.launchpad.net/ubuntu/+filebug | lucid    | 47377
 mailto:ubuntu-users@lists.ubuntu.com       | dapper   | 29933
 mailto:ubuntu-users@lists.ubuntu.com       | hardy    | 39104
 mailto:ubuntu-users@lists.ubuntu.com       | intrepid | 41178
(7 rows)

However I would call the information redundant as well because it can be easily
calculated from the release field.

Finally I'm not convinced that te column 'tag' is efficiently kept in the
packages table.  We have the debtags table which contains all needed
information in a structured form.  What is the sense of having a copy of
the very same information in an aggregated form in as many instances as
a package has version*architectures in the packages table.  I did no
measures but intuitively that's simply a waste of space while I can
not imagine a use case where the performance gain is high enough to
justify this waste (but I might be wrong here).

Kind regards

     Andreas.

-- 
http://fam-tille.de

Reply to:

Follow-Ups:
- Re: Some thoughts about fields in UDD
  - From: Russ Allbery <rra@debian.org>
- Re: Some thoughts about fields in UDD
  - From: Lucas Nussbaum <lucas@lucas-nussbaum.net>

Prev by Date: Re: [Help] Need help of Python / I18N expert for getting DDTP translations into UDD (again)
Next by Date: Re: Some thoughts about fields in UDD
Previous by thread: Re: [Help] Need help of Python / I18N expert for getting DDTP translations into UDD (again)
Next by thread: Re: Some thoughts about fields in UDD
Index(es):
- Date
- Thread