
Re: Description-less packages file



On Mon, Feb 27, 2012 at 12:20:31AM +0000, Stuart Prescott wrote:
> I think your changes are necessary so that the derivatives_descriptions 
> table, which we are currently not populating, can eventually be properly 
> populated, and there is some benefit in having the same schema for each of 
> the *_descriptions tables even if the debian_descriptions table will always 
> have "debian" in that column.

No, even the (debian_)descriptions table has more than one distribution
(on blends.debian.net):

udd=# SELECT distribution, component, release, count(*) from descriptions where language = 'en' group by distribution, component, release  order by distribution, release, component;
   distribution   |       component       |         release          | count 
------------------+-----------------------+--------------------------+-------
 debian           | contrib               | experimental             |    12
 debian           | main                  | experimental             |  2271
 debian           | non-free              | experimental             |    39
 debian           | contrib               | sid                      |   237
 debian           | main                  | sid                      | 37198
 debian           | non-free              | sid                      |   518
 debian           | contrib               | squeeze                  |   189
 debian           | main                  | squeeze                  | 28662
 debian           | main/debian-installer | squeeze                  |  1043
 debian           | non-free              | squeeze                  |   427
 debian           | main                  | squeeze-proposed-updates |   248
 debian           | contrib               | squeeze-security         |     1
 debian           | main                  | squeeze-security         |  1032
 debian           | main                  | squeeze-updates          |   163
 debian           | contrib               | wheezy                   |   216
 debian           | main                  | wheezy                   | 35487
 debian           | non-free              | wheezy                   |   450
 debian           | main                  | wheezy-proposed-updates  |   115
 debian-backports | contrib               | squeeze                  |     8
 debian-backports | main                  | squeeze                  |  1247
 debian-backports | main/debian-installer | squeeze                  |     4
 debian-backports | non-free              | squeeze                  |    32
(22 rows)

so even if we ignore derivatives_descriptions we need this column (but
it certainly becomes much more obvious once we take derivatives into
consideration).
 
> That said, I don't believe this change will actually help the problem you 
> are seeking to address: "distribution" should be uniformly "debian" in the 
> descriptions generated by the packages gatherer for all Packages files 
> coming from Debian. Descriptions imported by the ddtp gatherer itself will 
> also always have "debian" at this stage.

Well, IMHO the ddtp gatherer should reflect what the packages table has.
On the official udd.debian.org it is:

udd=>  SELECT distribution, count(*) from packages group by distribution;
      distribution       | count  
-------------------------+--------
 debian                  | 942393
 debian-backports        |  26766
 lenny-volatile-proposed |    205
 debian-backports-sloppy |    408
 lenny-volatile          |    127
(5 rows)

So I'm not fully sure whether we can go with 'debian' only for
distribution in the descriptions table.  The only way to prevent "other"
distributions like debian-backports from injecting their descriptions
would be to drop the

   descriptions-table: descriptions

entry from the "debian-backports-squeeze:" section in the config file.
I only noticed this problem when I ran the gatherer on
blends.debian.net and found that there were far too few en
descriptions for squeeze.  The reason was that debian-backports-squeeze
was imported after debian-squeeze and the descriptions matching
  release='squeeze' and language='en'
were replaced by this later import because distribution was not taken
into account.
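
A minimal sketch of this effect, assuming the importer replaces all
rows that fall into the "scope" of the source it is importing.  The
function import_source, the scope tuples and the package names foo and
bar are hypothetical and only serve as illustration; this is not the
actual gatherer code.

```python
def import_source(table, new_rows, scope_columns):
    """Replace every row in `table` that shares a scope with `new_rows`.

    `scope_columns` names the columns that decide which existing rows
    an import is allowed to throw away before inserting its own.
    """
    scopes = {tuple(row[c] for c in scope_columns) for row in new_rows}
    kept = [row for row in table
            if tuple(row[c] for c in scope_columns) not in scopes]
    return kept + list(new_rows)


# Hypothetical sample data: two sources that share release='squeeze'.
debian_squeeze = [
    {"distribution": "debian", "release": "squeeze",
     "language": "en", "package": "foo"},
    {"distribution": "debian", "release": "squeeze",
     "language": "en", "package": "bar"},
]
backports_squeeze = [
    {"distribution": "debian-backports", "release": "squeeze",
     "language": "en", "package": "foo"},
]

# Scope without distribution: the later backports import wipes the
# descriptions that came from debian-squeeze.
table = import_source([], debian_squeeze, ("release", "language"))
table = import_source(table, backports_squeeze, ("release", "language"))
clobbered = len(table)

# Scope including distribution: both sources keep their rows.
scope = ("distribution", "release", "language")
table = import_source([], debian_squeeze, scope)
table = import_source(table, backports_squeeze, scope)
preserved = len(table)

print(clobbered, preserved)
```

With distribution left out of the scope only the single backports row
survives; with distribution included all three rows are kept.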

> The data from squeeze vs squeeze-backports should be differentiated in the 
> "release" column, not in the distribution column.

This is what I thought at first, but I noticed that backports are
distinguished by the distribution column in the packages / sources
tables.  I do not think we should break the logic of these tables (even
though I have some sympathy for your argument).

> Looking at config-
> org.yaml, I suspect that the real problem is that the "release" key for 
> squeeze-backports is incorrectly set:
> 
> debian-backports-squeeze:
>   [...]
>   release: squeeze
> 
> if set to "squeeze-backports" then the release column will instead 
> distinguish the translations from one-another in the (package, release, 
> component) tuple. This is probably a simple copy+paste error from the 
> squeeze release; fixing this should also fix the translation clobbering 
> problem.

I have put Lucas in CC to ask whether this is really the case.  In any
case this should be discussed here.
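
To make the suggestion quoted above concrete, here is a small sketch,
assuming descriptions are identified by the (package, release,
component) tuple mentioned there and that a later row with the same
tuple overwrites an earlier one.  The load function, the package name
foo and the description strings are made up for illustration.

```python
def load(rows):
    """Index descriptions by (package, release, component); later rows win."""
    table = {}
    for row in rows:
        key = (row["package"], row["release"], row["component"])
        table[key] = row["description"]
    return table


# Hypothetical rows for one package present in both sources.
debian = {"package": "foo", "release": "squeeze",
          "component": "main", "description": "foo from squeeze"}
backport_as_configured = {"package": "foo", "release": "squeeze",
                          "component": "main",
                          "description": "foo from backports"}
# Same row, but with the release key changed as proposed.
backport_renamed = dict(backport_as_configured,
                        release="squeeze-backports")

# release: squeeze for both sources, so the tuples collide.
same_release = load([debian, backport_as_configured])

# release: squeeze-backports, so the tuples differ and both rows survive.
distinct_release = load([debian, backport_renamed])

print(len(same_release), len(distinct_release))
```

This only shows that either column can keep the rows apart; it does not
say which of the two fits the packages / sources tables better.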

Kind regards

     Andreas.

PS: I just noticed that there is another issue left with the ddtp
    importer.  It very frequently complains about duplicated data sets.
    I need to track this down over the next couple of evenings.

-- 
http://fam-tille.de

