
Re: About Bulk Package Retrieval



Hi,

On 10/05/24 at 20:07 +0200, Christoph Biedl wrote:
> Another approach may or may not be possible. If I remember correctly,
> the snapshot service stores the files using their hashsum. So if you
> know the hashsum of the file you want to retrieve - again, the Packages
> index has it -, you can request that file directly. Theoretically.
> 
> Problem however:
> 
> * That data directory is not accessible via http/https. At least I never
>   heard about that.
> * The hash algorithm is possibly still SHA-1 while nowadays the Packages
>   index only has MD5 and SHA256.

Actually, snapshot.d.o exposes the farm ("data directory") over
http/https under /file/<sha1sum>.

For example,
curl -o - https://snapshot.debian.org/archive/debian/20240503T205304Z/pool/main/b/base-files/base-files_13.2_amd64.deb | sha1sum
=> b45ecaf7adb0bcd2a2d7d02001420cf997a707df

And
curl https://snapshot.debian.org/file/b45ecaf7adb0bcd2a2d7d02001420cf997a707df
returns the same file.

Requests to /file/ are managed directly by Apache, without going through
the WSGI app. See
https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/production/modules/roles/templates/snapshot/snapshot.debian.org.conf.erb#L82

But I think that they are still rate-limited.
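
To make the by-hash retrieval concrete, here is a minimal sketch in shell. It
assumes only the /file/<sha1sum> URL layout described above; the function
names (snapshot_file_url, verify_sha1, fetch_by_sha1) are hypothetical
helpers made up for illustration, not part of any existing tool. It makes no
attempt to handle rate limiting.

```shell
#!/bin/sh
# Sketch: fetch a file from the snapshot.debian.org farm by its SHA-1
# and verify the download. All function names here are hypothetical.

# Build the /file/<sha1sum> URL for a given SHA-1 digest.
snapshot_file_url() {
    printf 'https://snapshot.debian.org/file/%s\n' "$1"
}

# Return 0 if the file at $1 has the SHA-1 digest $2, non-zero otherwise.
verify_sha1() {
    actual=$(sha1sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Download the file with digest $1 into path $2, then check the digest.
# The downloaded file is removed if the digest does not match.
fetch_by_sha1() {
    curl -fsS -o "$2" "$(snapshot_file_url "$1")" || return 1
    verify_sha1 "$2" "$1" || { rm -f "$2"; return 1; }
}
```

With these defined, the example above would become something like:
fetch_by_sha1 b45ecaf7adb0bcd2a2d7d02001420cf997a707df base-files_13.2_amd64.deb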

Lucas

