
Re: Description-less packages file



[Joerg in CC in case he might not read debian-qa,
 Lucas in CC because I was somewhat expecting an answer from him
 in this thread]

Hi Stuart,

On Mon, Feb 06, 2012 at 11:26:11PM +0000, Stuart Prescott wrote:
> >    1. Provide the missing information in the Packages.gz files
                                         ^ about MD5 sums
> >       anyway.  Joerg, I have no idea how complex this might be
> >       to implement or how likely it is to break something.
> >    2. We move English translations from Translation-en.bz2
> >       to the packages table making sure that all existing UDD
> >       applications will work immediately again.
> >    3. We drop long_description field from packages table now
> >       and *calculate* the md5 sums from long_description for those
> >       releases where it is missing and keep all long_descriptions
> >       inside the ddtp table.
> 
> My feeling is that our long term aim should be to have the long description 
> only in the ddtp table. This is a slightly-more-normalised form for the 
> database which will help reduce the size of the tables and, since the long 
> description is unused in most queries to UDD, that will help with 
> performance. It's also a data structure that, in the long term, more closely 
> reflects the data sources being included which has been a general UDD 
> principle over the years.

I fully agree here.  This rules out option 2, which would probably have
been the easiest to implement, but I'm happy that at least one other
developer dislikes this kind of quick workaround.

> So, if that's where we want to end up in the long run, I reckon that we 
> should just do it now.

Sounds good to me.

> We only need a small amount of code to get it to work 
> for squeeze (and lenny, for however long that will remain in UDD) and in ~2 
> years' time we can drop that too.

To reach a sensible decision, I tried to find out how open those who
provide the Packages files (ftpmaster / anybody else?) would be to
feeding description_md5 into older releases as well.  I noticed that
this is currently not done, but waiting for a new release is not an
option for UDD.  So we just need to find out whether getting the needed
data straight into the Packages files is much work.  If it is not
possible, we should go with the temporary approach of feeding UDD.

> I'm happy to look at the packages gatherer and have it (optionally) feed in 
> the long description and the appropriate md5 into ddtp.

I just committed the ddtp gatherer (including a changed table structure
that drops unneeded fields like distribution and version).  It also
gathers English descriptions (as well as those in any other language).

In earlier experiments with MD5 sums in the gatherer I found out that
description_md5 is calculated like this:


               SELECT md5(full_description || E'\n') AS description_md5,
                      full_description
                 FROM (SELECT DISTINCT
                              description || E'\n' || long_description AS full_description
                         FROM packages) AS descriptions;  -- PostgreSQL requires an alias here


This certainly needs to be verified against the current description_md5
values, but it might be a useful hint about what to do.
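To cross-check the formula against the Description-md5 values in current
Packages files, the same hash can be computed outside the database.  A
minimal Python sketch, assuming the columns are hashed as UTF-8 text and
that long_description is stored exactly as it is fed to md5() in the
query above (both still unverified):

```python
import hashlib


def description_md5(short_description: str, long_description: str) -> str:
    """Mirror md5(description || E'\\n' || long_description || E'\\n').

    Assumption: UTF-8 encoding and UDD's stored long_description format;
    whether this equals the Description-md5 in Packages files is unverified.
    """
    full_description = short_description + "\n" + long_description
    # The query appends one more newline before hashing.
    return hashlib.md5((full_description + "\n").encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical example values, not taken from a real package.
    print(description_md5("example tool", " An example long description."))
```

Comparing its output with the Description-md5 field of a few packages in
a recent Packages file would show quickly whether the formula holds.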

> Things that use the description from UDD will need to learn to get them from 
> ddtp rather than packages -- I'm happy to help with that but I don't know 
> what uses the description at all anyway.

I'm using the field in the Blends sentinel and can easily cope with this
change (it actually simplifies things if implemented as you proposed).
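For a consumer like the Blends sentinel, the change roughly means a join
on the MD5 instead of reading packages.long_description.  A sketch using
an in-memory SQLite database; the table and column names are simplified
and hypothetical, not the actual UDD schema:

```python
import hashlib
import sqlite3

# Hypothetical, simplified tables; NOT the real UDD schema.
SCHEMA = """
CREATE TABLE packages (package TEXT, release TEXT, description_md5 TEXT);
CREATE TABLE ddtp (description_md5 TEXT, language TEXT,
                   description TEXT, long_description TEXT,
                   PRIMARY KEY (description_md5, language));
"""


def md5_of(short: str, long: str) -> str:
    # Same formula as the SQL in this mail (still to be verified).
    return hashlib.md5((short + "\n" + long + "\n").encode("utf-8")).hexdigest()


def english_description(conn, package, release):
    """Return (short, long) from ddtp, or None if no English text exists."""
    return conn.execute(
        """SELECT d.description, d.long_description
             FROM packages p
             JOIN ddtp d ON d.description_md5 = p.description_md5
            WHERE p.package = ? AND p.release = ? AND d.language = 'en'""",
        (package, release),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    short, long_ = "example tool", " An example long description."
    md5 = md5_of(short, long_)
    conn.execute("INSERT INTO packages VALUES (?, ?, ?)", ("example", "squeeze", md5))
    conn.execute("INSERT INTO ddtp VALUES (?, 'en', ?, ?)", (md5, short, long_))
    print(english_description(conn, "example", "squeeze"))
```

Storing each text once per MD5 and language is what shrinks the packages
table: identical descriptions shared by many package versions collapse
into a single ddtp row.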

I have no idea what other applications might use this (as I suspected in
my previous mail, I'm afraid at least packages.debian.org uses it).
This raises a more general need: we need better documentation of which
services are using UDD.

Kind regards

         Andreas.

-- 
http://fam-tille.de

