Bug#963887: UDD: 'duck' importer broken since 2020-05-25



On Tue, Jun 30, 2020 at 09:19:31AM +0200, Baptiste BEAUPLAT wrote:
> On 6/29/20 11:34 PM, Raphael Hertzog wrote:
> >> The duck worker has to process around 460000 urls (only counting
> >> Homepage) in less than 24h.
> > 
> > How do you get to that figure? We don't have that many source packages,
> > and even if you consider multiple URLs for each source package due to
> > changes over time (in multiple releases), that makes way too many URLs
> > per source package.
> 
> Err, sorry about that. That figure is the result of:
> 
> $ curl -s http://deb.debian.org/debian/dists/unstable/main/source/Sources.gz \
>     | zgrep -v Homepage: | sort -u | wc -l
> 458804
> 
> Which is obviously wrong. Here is the real number:
> 
> $ curl -s http://deb.debian.org/debian/dists/unstable/main/source/Sources.gz \
>     | zgrep Homepage: | sort -u | wc -l
> 26250

Just a note before you head toward implementing that: like Section, the
Homepage field can also be specified in the binary paragraphs, not just
in the source paragraph.
You can see this in the DEBIAN/control file of the .debs, where the
Homepage field is present, and its value may well differ from the
Homepage in the .dsc.

So please, look harder for Homepage, not just in the first paragraph of
d/control ;)
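
To illustrate what "look harder" could mean in practice, here is a rough
sketch that walks every paragraph of a d/control file and collects each
Homepage it finds, source and binary alike. It is only an illustration:
the names iter_paragraphs, collect_homepages and SAMPLE_CONTROL are made
up for this mail, the sample control file is invented, and the parser is
a simplified stdlib-only reading of the control file syntax (the deb822
module from python-debian would probably be the more robust choice). It
is not how the duck importer actually works.

```python
def iter_paragraphs(text):
    """Yield each paragraph of a control-style file as a dict of fields.

    Simplified: blank lines separate paragraphs, lines starting with
    whitespace continue the previous field, '#' lines are comments.
    """
    fields, last = {}, None
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        if not line.strip():
            # Blank line: the current paragraph (if any) is complete.
            if fields:
                yield fields
            fields, last = {}, None
        elif line[0] in " \t":
            # Continuation of the previous field's value.
            if last is not None:
                fields[last] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    if fields:
        yield fields


def collect_homepages(control_text):
    """Map (kind, name) to Homepage for every paragraph that sets one."""
    result = {}
    for para in iter_paragraphs(control_text):
        # Field names are case-insensitive, so normalise them first.
        lowered = {key.lower(): value for key, value in para.items()}
        homepage = lowered.get("homepage")
        if not homepage:
            continue
        if "source" in lowered:
            result[("source", lowered["source"])] = homepage
        elif "package" in lowered:
            result[("binary", lowered["package"])] = homepage
    return result


# Invented example: the -doc binary package overrides the source Homepage.
SAMPLE_CONTROL = """\
Source: example
Homepage: https://example.org/
Build-Depends: debhelper-compat (= 13)

Package: example-bin
Architecture: any
Description: example binary
 Long description.

Package: example-doc
Architecture: all
Homepage: https://docs.example.org/
Description: example docs
"""

if __name__ == "__main__":
    for (kind, name), url in collect_homepages(SAMPLE_CONTROL).items():
        print(kind, name, url)
```

With the invented sample above, both the source paragraph and the
example-doc binary paragraph contribute a URL, while example-bin sets
none of its own.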

-- 
regards,
                        Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540      .''`.
More about me:  https://mapreri.org                             : :'  :
Launchpad user: https://launchpad.net/~mapreri                  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-
