Re: Processing every sid main Sources.xz from snapshot.d.o
On Tue, 01 Nov 2016, Ian Jackson wrote:
> I would like to do some statistical archaeology. I think the easiest
> way would be to download every available Sources.xz. (I'm only
> interested in sid main.)
>
> I don't actually need the whole file. Specifically, for each
> snapshot, I just want to know: how many entries are there in Sources
> and how many of those entries have a `Dgit' field. I will want to
> repeat this exercise in an incremental way, to generate a pretty
> graph.
>
> Is there a better way of doing this than downloading every
> Sources.xz ?
(a) find out what their sha1sums are and (b) fetch them directly from the
filesystem on lw08?
re a:
psql service=snapshot-guest
Maybe something like
  SELECT * FROM
    file
    JOIN node ON file.node_id = node.node_id
    JOIN mirrorrun ON node.first = mirrorrun.mirrorrun_id
    JOIN archive ON mirrorrun.archive_id = archive.archive_id
    JOIN directory ON node.parent = directory.directory_id
  WHERE
    archive.name = 'debian'
    AND path = '/dists/sid/main/source'
    AND file.name = 'Sources.gz'
  LIMIT 4;
re b:
/srv/snapshot.debian.org/farm
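
Once you have the hashes, the counting itself could look roughly like the
sketch below. It is only an illustration, not something I have run against
the farm: the helper names (farm_path, open_sources, count_entries) are made
up, and the sharded farm layout (farm/ab/cd/abcd...) is an assumption you
should check on the host first.

```python
import gzip
import lzma
import os

FARM = "/srv/snapshot.debian.org/farm"  # path given above


def farm_path(sha1, farm=FARM):
    # ASSUMPTION: files are sharded by the first two byte pairs of the
    # hash, e.g. farm/ab/cd/abcd...; verify on the host before relying on it.
    return os.path.join(farm, sha1[0:2], sha1[2:4], sha1)


def open_sources(path):
    # Files named by hash carry no extension, so sniff the magic bytes
    # to decide between gzip, xz and plain text.
    with open(path, "rb") as f:
        magic = f.read(6)
    if magic.startswith(b"\x1f\x8b"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    if magic == b"\xfd7zXZ\x00":
        return lzma.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def count_entries(lines):
    """Return (number of stanzas, number of stanzas with a Dgit field)."""
    total = with_dgit = 0
    in_stanza = has_dgit = False
    for line in lines:
        if line.strip() == "":
            # A blank line ends the current stanza, if there is one.
            if in_stanza:
                total += 1
                with_dgit += has_dgit
            in_stanza = has_dgit = False
            continue
        in_stanza = True
        # Field names start in column 0; continuation lines are indented.
        if line.lower().startswith("dgit:"):
            has_dgit = True
    if in_stanza:  # last stanza may lack a trailing blank line
        total += 1
        with_dgit += has_dgit
    return total, with_dgit
```

For each hash returned by the query you would then do something like
"with open_sources(farm_path(sha1)) as f: print(count_entries(f))" and
record the result against the mirrorrun it belongs to, so that later runs
only need to look at hashes they have not seen yet.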
Cheers,
--
                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/