
Re: Request to get Permission for Data extraction



Sorry for the late response.

Thank you for showing me the snapshot metadata. With this site, I do not need the changelogs and package lists located under /dists/<stable, testing, unstable>/. However, I still require information that I can only find in the .debian.tar.gz files (all dependencies, maintainer, and uploaders of the package).

I have estimated the amount of data I need to fetch from the Debian Snapshots. By a generous calculation, it comes to at most 40 GB. Downloading 184 TB of data just to extract 40 GB would therefore be overkill. It might be less invasive to simply scrape the necessary data.
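
To illustrate the kind of extraction described above, here is a minimal sketch, assuming each archive contains a debian/control file in the usual RFC 822-like layout. The function name read_source_fields and the FIELDS list are hypothetical and not part of any Debian tooling. The sketch reads only the first (source) stanza, so dependencies declared in the later binary stanzas would need additional handling.

```python
import io
import re
import tarfile
from email.parser import Parser

# Hypothetical choice of fields; adjust to whatever the analysis needs.
FIELDS = ("Source", "Maintainer", "Uploaders", "Build-Depends")


def read_source_fields(tar_bytes):
    """Return selected fields from the first stanza of debian/control."""
    # "r:*" lets tarfile detect the compression (gz, xz, bz2) by itself.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        control = tar.extractfile("debian/control").read().decode("utf-8")

    # Drop comment lines, then keep only the first (source) stanza.
    lines = [ln for ln in control.splitlines() if not ln.startswith("#")]
    stanza = re.split(r"\n\s*\n", "\n".join(lines).strip(), maxsplit=1)[0]

    # The stanza is RFC 822-like, so the stdlib header parser can read it.
    headers = Parser().parsestr(stanza + "\n", headersonly=True)
    result = {}
    for name in FIELDS:
        value = headers[name]
        if value is not None:
            # Collapse folded continuation lines into a single line.
            result[name] = " ".join(value.split())
    return result
```

The function takes the raw bytes of an archive that has already been fetched, so it can be combined with whatever download method is agreed on.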

On 2025-03-20 10:49, Linus Nordberg wrote:
Richard Reinick <richard.reinick@stud.hs-merseburg.de> wrote on Wed, 12 Mar 2025 10:21:28 +0100:

I need the *.debian.tar.gz and *.dsc files from pool/main/* for all
versions of the respective packages, in other words the metadata files
of the packages. Additionally, I need information on when a package
was part of the stable, testing, and unstable branches of a Debian
version. For this, I require the changelogs and the package lists
located under /dists/<stable, testing, unstable>/.

I am unsure about the total volume of data I need to download.

If possible, I would gladly request a copy of the database.

The database in the context of the snapshot service is the snapshot
metadata kept in an SQL database, currently about 100G (on disk). The
actual packages are kept in what's called the farm which currently
weighs in at 184T.
Cf. https://salsa.debian.org/snapshot-team/snapshot/-/blob/master/README

The snapshot metadata alone might help you but I suspect that you will
want more. Are you aware of https://metasnap.debian.net/?

