Re: partial snapshot mirror amd64/bullseye/bookworm
On 01/03/21 at 22:41 +0000, Paul Wise wrote:
> On Mon, Mar 1, 2021 at 5:25 PM Holger Levsen wrote:
> > > How would the mirroring work?
> > to be discussed, but my raw idea would be to use rsync with excluding the years
> > before 2015 or 2017. or can't this work? 8-)
> That won't work, since the filesystem storing the data is hash (SHA1)
> based, so you need to look up hashes for the relevant data in the
> database and then copy only those files.
For https://trends.debian.net/, I have a local mirror of snapshot.d.o
(with sources only, and only for specific versions). The code used to
create it is available at https://salsa.debian.org/lucas/dhistory/-/blob/master/dhistory
It:
- queries the snapshot DB to identify the files and hashes for each
  source package
- fetches and analyses Sources files to identify the (source,version) pairs
  of interest, and thus the hashes to transfer
- transfers those hashes from snapshot.d.o to my own machine using rsync
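
As a rough sketch of that last step (this is not the actual dhistory code):
assuming the snapshot farm stores each file under a two-level directory
derived from its SHA1 (aa/bb/<hash>), a list of hashes can be turned into a
file list for rsync's --files-from option. The function names and the rsync
source below are made up for illustration.

```shell
# Map a SHA1 hash to its relative path in the farm.
# Assumption: files live at <first 2 hex chars>/<next 2 hex chars>/<full hash>.
hash_to_farm_path() {
    h=$1
    d1=$(printf '%s' "$h" | cut -c1-2)
    d2=$(printf '%s' "$h" | cut -c3-4)
    printf '%s/%s/%s\n' "$d1" "$d2" "$h"
}

# Read hashes (one per line) on stdin and write a sorted, de-duplicated
# list of farm paths on stdout, skipping empty lines.
hashes_to_file_list() {
    while IFS= read -r h; do
        if [ -n "$h" ]; then
            hash_to_farm_path "$h"
        fi
    done | sort -u
}

# Hypothetical usage (RSYNC_SRC is a placeholder, not a real endpoint):
#   hashes_to_file_list < hashes.txt > files.txt
#   rsync -a --files-from=files.txt "$RSYNC_SRC" /srv/mirror/farm/
```

De-duplicating matters here because the same file hash can be referenced by
several package versions.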
The query used for the first step is:
psql -At service=snapshot-guest -c "select row_to_json(t) from
(select srcpkg.name as source_name, srcpkg.version as source_version,
file.name as file_name, file.hash as file_hash, file.size as file_size,
node_with_ts.first_run as file_first_run, node_with_ts.last_run as file_last_run
from srcpkg
inner join file_srcpkg_mapping on srcpkg.srcpkg_id = file_srcpkg_mapping.srcpkg_id
inner join file on file.hash = file_srcpkg_mapping.hash
inner join node_with_ts on node_with_ts.node_id = file.node_id
inner join archive on node_with_ts.archive_id = archive.archive_id
where archive.name = 'debian') t"
That's the query that would have to be adapted for binary packages and
for a specific date range.
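
A minimal sketch of such an adaptation, untested against the real DB. It
assumes the schema has binpkg and file_binpkg_mapping tables mirroring the
source ones, with an architecture column on the mapping; those names, the
build_binpkg_query function and the example dates are assumptions to check
against the actual schema. A file is kept if its [first_run, last_run]
interval overlaps the requested range.

```shell
# Print a query for binary packages of one architecture seen in a date range.
# Assumptions: tables binpkg and file_binpkg_mapping (with an architecture
# column) exist and mirror srcpkg / file_srcpkg_mapping.
build_binpkg_query() {
    arch=$1 start=$2 end=$3
    # Only accept YYYY-MM-DD dates and simple architecture names, since the
    # values are interpolated directly into the SQL text.
    for d in "$start" "$end"; do
        case $d in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) echo "bad date: $d" >&2; return 1 ;;
        esac
    done
    case $arch in
        ''|*[!a-z0-9-]*) echo "bad architecture: $arch" >&2; return 1 ;;
    esac
    cat <<EOF
select row_to_json(t) from
(select binpkg.name as binary_name, binpkg.version as binary_version,
file.name as file_name, file.hash as file_hash, file.size as file_size,
node_with_ts.first_run as file_first_run, node_with_ts.last_run as file_last_run
from binpkg
inner join file_binpkg_mapping on binpkg.binpkg_id = file_binpkg_mapping.binpkg_id
inner join file on file.hash = file_binpkg_mapping.hash
inner join node_with_ts on node_with_ts.node_id = file.node_id
inner join archive on node_with_ts.archive_id = archive.archive_id
where archive.name = 'debian'
and file_binpkg_mapping.architecture = '$arch'
and node_with_ts.first_run <= '$end'
and node_with_ts.last_run >= '$start') t
EOF
}

# Hypothetical usage:
#   psql -At service=snapshot-guest -c "$(build_binpkg_query amd64 2017-01-01 2021-03-01)"
```

Note that filtering on a single architecture like this would leave out
"all" packages, so a real amd64 mirror would probably need both.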