
Re: Processing every sid main Sources.xz from snapshot.d.o



On Tue, 01 Nov 2016, Ian Jackson wrote:

> I would like to do some statistical archaeology.  I think the easiest
> way would be to download every available Sources.xz.  (I'm only
> interested in sid main.)
> 
> I don't actually need the whole file.  Specifically, for each
> snapshot, I just want to know: how many entries are there in Sources
> and how many of those entries have a `Dgit' field.  I will want to
> repeat this exercise in an incremental way, to generate a pretty
> graph.
> 
> Is there a better way of doing this than downloading every
> Sources.xz ?

How about (a) finding out what their sha1sums are from the snapshot
database and (b) fetching them from the filesystem on lw08?


re a:

psql service=snapshot-guest

Maybe something like
 SELECT * FROM
   file
   JOIN node ON file.node_id = node.node_id
   JOIN mirrorrun ON node.first=mirrorrun.mirrorrun_id
   JOIN archive ON mirrorrun.archive_id = archive.archive_id
   JOIN directory ON node.parent = directory.directory_id
 WHERE
   archive.name='debian'
   AND path = '/dists/sid/main/source'
   AND file.name = 'Sources.gz'
 LIMIT 4 ;


re b:
 /srv/snapshot.debian.org/farm
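
Once the hashes are known, the counting itself could look roughly like the
Python sketch below. It is only an illustration: the helper names
(farm_path, open_sources, count_entries) are made up here, and the
two-level farm layout (farm/ab/cd/<sha1>) is an assumption that should be
checked on lw08 before relying on it. The decompressor is chosen from the
original file name, because files in the farm are named by their hash.

```python
import gzip
import lzma
import os


def farm_path(sha1_hex, farm_root="/srv/snapshot.debian.org/farm"):
    """Map a sha1 hex digest to a path in the farm.

    ASSUMPTION: the farm fans out on the first two byte pairs of the
    digest (farm/ab/cd/abcd...). Verify against the real filesystem.
    """
    return os.path.join(farm_root, sha1_hex[0:2], sha1_hex[2:4], sha1_hex)


def open_sources(path, name):
    """Open a Sources file as text, decompressing according to `name`."""
    if name.endswith(".xz"):
        return lzma.open(path, "rt", encoding="utf-8", errors="replace")
    if name.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def count_entries(lines):
    """Return (total stanzas, stanzas that have a Dgit field)."""
    total = 0
    dgit = 0
    in_stanza = False
    has_dgit = False
    for line in lines:
        if line.strip() == "":
            # A blank line ends the current stanza, if there is one.
            if in_stanza:
                total += 1
                dgit += has_dgit
            in_stanza = False
            has_dgit = False
            continue
        in_stanza = True
        # Field names start in column 0; continuation lines start with
        # whitespace and are therefore never mistaken for a field.
        if line.lower().startswith("dgit:"):
            has_dgit = True
    # The last stanza may not be followed by a blank line.
    if in_stanza:
        total += 1
        dgit += has_dgit
    return total, dgit
```

For one row of the query result, with the hash and file name taken from
that row, usage would be something like:

  with open_sources(farm_path(sha1), "Sources.gz") as f:
      total, with_dgit = count_entries(f)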

Cheers,
-- 
                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/

