
Re: Request to get Permission for Data extraction



Linus Nordberg <linus@glasklarteknik.se> writes:

> Simon Josefsson <simon@josefsson.org> wrote
> Wed, 12 Mar 2025 10:07:08 +0100:
>
>>> As pointed out in another response to your request, it might make sense
>>> for you to ask for (a copy of) the metadata kept in the database.
>>
>> Could the snapshot team make those public?
>>
>> It is harder than it should be to mirror snapshot locally.  You have to
>> screenscrape the web interface to get full data.  This creates
>> unnecessary load, so it would be nice if at least the list of filenames
>> (essentially SHA1 hashes) could be published.  Right now this
>> information is hidden.  As far as I understood earlier discussions on
>> this, that hiding is intentional (for reasons I couldn't understand).
>
> Hi Simon,
>
> Do you want to operate a full Snapshot mirror, contributing to the
> operations of the Snapshot service? Snapshot has a method for mirroring
> the farm described in [mirror/README][]. In addition to that you would
> set up postgresql for replication, to keep your db up to date with the
> primary.
>
> If not, have you tried accessing the Snapshot database using the
> 'snapshot-guest' user? The pgsql client would have to make its
> connection from a Debian machine allowed to connect to the db (on the
> primary or any of the replicas). I don't know how to compile the list of
> these machines but DSA surely do.

Hi!  My idea has been to announce my personal mirror of snapshot (in use
for a year or so already, hosted at Hetzner), and assuming it has been
operational for another year or so with good public availability, it
could be discussed whether it makes sense to include it as another
official Snapshot mirror.  So not a clear answer to your question, but
at least sharing my thinking.

> I don't remember the discussion about hiding information on which files
> exist in the farm. What arguments were posed for doing that?

I never understood the arguments.  The replies I got about a year ago on
IRC made me believe that the snapshot team did not want to make the
database of filename-to-SHA1 mappings public, and that you did not want
to see non-official mirrors.

Perhaps this was a misunderstanding, or things have changed?

Could you make (say, a daily) export of the database publicly available?

If not, what is the reason for not making this information public?

/Simon
