Non-normalised field Provides in UDD table (Was: New feature for 0.6.103)

To: debian-blends@lists.debian.org
Subject: Non-normalised field Provides in UDD table (Was: New feature for 0.6.103)
From: Andreas Tille <andreas@an3as.eu>
Date: Wed, 11 Apr 2018 21:33:54 +0200
Message-id: <[🔎] 20180411193354.vm2rx7kanz7bu36o@an3as.eu>
In-reply-to: <[🔎] 87o9ipspza.fsf@debian.org>
References: <[🔎] 20180410152747.3jsookuedcr5qd4k@an3as.eu> <[🔎] 87604zt6gi.fsf@debian.org> <[🔎] 20180410161125.qhm4b32ycxdybe2e@an3as.eu> <[🔎] 871sfmuayw.fsf@debian.org> <[🔎] 20180410194403.rxsbtwf5vsl5yvsr@an3as.eu> <[🔎] 87tvsis01t.fsf@debian.org> <[🔎] 20180411105839.b37wvzeybkxijhjj@an3as.eu> <[🔎] ytzr2nm56jr.fsf@news.ole.ath.cx> <[🔎] 20180411151704.74ovfbdl573egeos@an3as.eu> <[🔎] 87o9ipspza.fsf@debian.org>

Hi Ole,

On Wed, Apr 11, 2018 at 05:49:13PM +0200, Ole Streicher wrote:
> Andreas Tille <andreas@an3as.eu> writes:
> > It seems the usage of GENCONTROL_OPTS in Makefile does not work yet.  I
> > did [...]
> 
> > +GENCONTROL_OPTS = --udd
> 
> IMO there is an export missing. Otherwise GENCONTROL_OPTS is only local.
> You could try
> 
> $ GENCONTROL_OPTS="--udd" /usr/share/blends-dev/blend-gen-control  -r "unstable" -S -t
> 
> to check whether it retrieves from UDD then; for me it works.

I'll check with export ...

> >   SELECT packages.package, packages.provides, packages.architecture, packages.version FROM packages WHERE  packages.provides like 'libodil%' ;
> >
> > is just empty, but you get:
> >
> > udd=# SELECT DISTINCT packages.package, packages.provides, packages.version FROM packages WHERE  packages.provides like '%libodil%' ;
> >    package    |           provides            | version 
> > --------------+-------------------------------+---------
> >  libodil0-dev | libdcmtkpp-dev, libodil-dev   | 0.7.3-1
> >  libodil-dev  | libdcmtkpp-dev, libodil0-dev  | 0.8.0-4
> >  libodil-dev  | libdcmtkpp-dev, libodil0-dev  | 0.9.1-1
> >  libodil-doc  | libdcmtkpp0-doc, libodil0-doc | 0.8.0-4
> >  libodil-doc  | libdcmtkpp0-doc, libodil0-doc | 0.9.1-1
> > (5 Zeilen)
> >
> > This means you can not expect an exact match on the package name since
> > provides can have a list of packages (which again proves my point that
> > UDD queries become always more complex than expected. :-P)
> 
> Huh! That is ugly! Who made this???

Probably the person who had the intention to add control file
information verbosely to a database table.  I had discussed things like
this years ago (debian-qa@lists.debian.org with [UDD] in subject is the
right place to discuss this).  You could even argue that someonw is
interested in the sequence of the dependencies (not that I personally
would like this argument).  I was told UDD was *never* intended to be a
normalised database and you always find arguments pro and contra
normalisation.

> At least this contradicts to
> everything I learned about databases: if there may be more than one
> provides for a package, there should be an extra table for this
> relation. Same for other dependencies (depends, recommends etc.).

That's called normalisation and for *our* application this database
layout would be optimal.  That's true and I would have prefered this.
However, at the time when UDD was invented I was not even aware that
I would once use it for application in Blends.

> How does one query for "give me all packages that provide (or depend on)
> xyz"?

See below.

> With "like", you basically need to retrieve and parse all "provides"
> fields that are in the database with no chance to index it...

Yes.

> and this
> has to be done for all O(1000) packages of Debian Science. Sounds really
> slow,

For calculating basically static content it was usually not *too* slow
but I would not base some dynamic web application on top of it.

> especially when you want to get an authorative answer (your
> example would also find f.e. a package calibodil-dev, if that would
> exist).

See below.

> IMO the clean solution would be to have a "package-dependencies" table
> with the columns package, type, dependencies (or one table for each
> dependency type), if that does not exist yet (I must admit that I may
> have not searched carefully enough yet).

These tables do not exist yet.  But UDD code is on Salsa[1] - all SQL
definitions are in dir sql/.

> But solving this on client side
> sounds like an ugly hack to me.

Yes.  Didn't I told you that server side queries tend to be complex?

> Or do I miss something here?

The good thing about PostgreSQL is that it has really nice features.
I'm using arrays to solve this.  To get the authoritative answer you
were seeking above just do:

 SELECT * FROM (
   SELECT DISTINCT packages.package,
                   regexp_split_to_table(packages.provides, E',\\s*') AS provides_normalised,
                   packages.version
     FROM packages
     WHERE  packages.provides like '%libodil%'
 ) tmp WHERE provides_normalised = 'libodil0-dev';

And now you understand why I'm using in webtools blendstasktools.py
queries where in addition to a simple list another argument is
created using List2PgArray which contains the same package list but
in PostgreSQL array syntax.  You get examples for the usage in

  CREATE FUNCTION blends_query_packages(text[], text[], text) RETURNS SETOF record

where the construct

    LEFT OUTER JOIN (
      SELECT DISTINCT regexp_replace(package_version, E'\\s*\\(.*\\)', '') AS package, array_agg(enhanced_by) AS enhanced FROM (
        SELECT DISTINCT package AS enhanced_by, regexp_split_to_table(enhances, E',\\s*') AS package_version FROM packages
         WHERE enhances LIKE ANY( $2 )
      ) AS tmpenh GROUP BY package
    ) enh ON enh.package = p.package

is used to resolve the Enhances field.  With "LIKE ANY ( array )"[2] you
can solve the issue.  If you want me to provide this as solution for
blends.py feel free to ask me.  I might find some time for this until
tomorrow evening (but I don't know whether you prefer something totally
different).

Hope this helps

     Andreas.

PS:  The main reason to use UDD was the "Architecture: any" feature.  As
far as I can see that's not implemented yet and left for a future
release.  That's perfectly fine since the current implementation is even
now better than the old Perl script.  I just want to make sure that I
have tested all new features.

[1] https://salsa.debian.org/qa/udd
[2] https://postgres.cz/wiki/PostgreSQL_SQL_Tricks#LIKE_to_list_of_patterns

-- 
http://fam-tille.de

Reply to:

Follow-Ups:
- Re: Non-normalised field Provides in UDD table
  - From: Ole Streicher <olebole@debian.org>

References:
- Re: New feature for 0.6.103
  - From: Andreas Tille <andreas@an3as.eu>
- Re: New feature for 0.6.103
  - From: Ole Streicher <olebole@debian.org>
- Re: New feature for 0.6.103
  - From: Andreas Tille <andreas@an3as.eu>
- Re: New feature for 0.6.103
  - From: Ole Streicher <olebole@debian.org>
- Re: New feature for 0.6.103
  - From: Andreas Tille <andreas@an3as.eu>
- Re: New feature for 0.6.103
  - From: Ole Streicher <olebole@debian.org>
- Re: New feature for 0.6.103
  - From: Andreas Tille <andreas@an3as.eu>
- Re: New feature for 0.6.103
  - From: Ole Streicher <olebole@debian.org>
- Re: New feature for 0.6.103
  - From: Andreas Tille <andreas@an3as.eu>
- Re: New feature for 0.6.103
  - From: Ole Streicher <olebole@debian.org>

Prev by Date: Re: New feature for 0.6.103
Next by Date: Re: Non-normalised field Provides in UDD table
Previous by thread: Re: New feature for 0.6.103
Next by thread: Re: Non-normalised field Provides in UDD table
Index(es):
- Date
- Thread