
Re: Request to get Permission for Data extraction



Linus Nordberg <linus@glasklarteknik.se> writes:

>>> I don't remember the discussion about hiding information on which files
>>> exist in the farm. What arguments were posed for doing that?
>>
>> I never understood the arguments.  The replies I got about a year ago on
>> IRC made me believe that the snapshot team did not want to make the
>> database mapping filenames to SHA1 hashes public, and that you did not
>> want to see non-official mirrors.
>>
>> Perhaps this was a misunderstanding, or things have changed?
>>
>> Could you make (say, a daily) export of the database publicly available?
>>
>> If not, what is the reason for not making this information public?
>
> I think this is a question of how to distribute the database and not
> about keeping any information hidden from the public.

Thanks for dividing the question into two parts.  I understand that
distributing the database is a technical issue, that someone needs to
work on it for it to happen, and that people's time is a scarce
resource.

However the messaging I got (perhaps incorrectly) was that the snapshot
team did not want the SHA1 hashes of its files to become public, except
presumably indirectly via screen scraping that is hard to prevent.

Can you confirm that one way or the other?

I don't understand the reason for hiding the SHA1 hashes, but I've
respected this and not announced my unofficial mirror as a consequence.

> With a PostgreSQL on-disk size of ~100G, how should a daily export be
> made available?
>
> Also, when you say "publicly available" do you mean available to the
> internet at large or to Debian DDs? Because the former would probably
> need some kind of rate limiting while the latter is supposedly already
> the case (did you try 'snapshot-guest'?).

I mean publicly available to everyone.

If technical changes are hard to come by, perhaps we can side-step that
if the snapshot team confirms that they have no problem with me making
the list of SHA1 hashes of packages from snapshot available.  Of course,
I would not claim that you endorse the list in any way, if that is not
the case.  Then the technical issue of distribution becomes my problem.

I never got the 'snapshot-guest' approach to work for bigger data
exports (it seems lw08 has some rate or resource limiting in place, and
I wouldn't want to rely on a mechanism not available to non-DDs), but it
has been a couple of months since I last tried.

/Simon


