Re: Debian snapshot.debian.org replica on the Amazon cloud?

On Fri, 2013-09-27 at 14:30 -0400, James Bromberger wrote:

> Couple of questions (while I am waiting to come home) - the underlying 
> data set - is it a massive symlink farm for the files?

The underlying data is a hash-addressed (SHA-1) filesystem overlaid on
a conventional filesystem (ext3/etc). The first two characters of the
hash are used to create two sublevels of directories to avoid filesystem
directory limitations. So an empty file would be stored here:


The filenames and other metadata are stored in the PostgreSQL database.
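
To make the hash-addressed layout described above concrete, here is a
minimal sketch in Python. It assumes one hex character per directory
level, which is a literal reading of "the first two characters"; the
real farm may well use a wider prefix per level, so the width is a
parameter. The root directory, the function name and the parameter name
are made up for illustration and are not taken from the snapshot code.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical root directory; the real location is not given in this message.
FARM_ROOT = PurePosixPath("/srv/example-farm")


def farm_path(data: bytes,
              root: PurePosixPath = FARM_ROOT,
              chars_per_level: int = 1) -> PurePosixPath:
    """Return the storage path for a blob, keyed by its SHA-1 hex digest.

    The leading hex characters of the digest name two nested directories,
    and the full digest is used as the file name.
    """
    digest = hashlib.sha1(data).hexdigest()
    level1 = digest[:chars_per_level]
    level2 = digest[chars_per_level:2 * chars_per_level]
    return root / level1 / level2 / digest


if __name__ == "__main__":
    # An empty file hashes to the well-known SHA-1 of zero bytes.
    print(farm_path(b""))
```

Because the path depends only on the content, two files with identical
bytes land in the same place and are stored once; the names they had in
the archive live only in the database.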

> Whats the current growth rate?

Some information about that is here (dsa-guest, no password):


> Are there any individual files greater than 5 TB?

I think the largest file ever in the Debian archive has probably been
only 1-2 GB; I doubt we will ever get files anywhere near 5 TB.

> Where is the data currently?

On stabile (hardware issues) and sibelius:


> Can we get it loaded onto a set of HDDs for shipping into AWS, or
> would you want to sync that all online over a period?

I am not part of the Debian sysadmin team, but I guess that could be a
possibility. I think this was done with the initial two-machine setup.

> Just want to work out what would be required. I see the costs would be 
> around US$1600/month for this (hosted in the US) - around US$20k/year, 
> so I just need to convince the company of this and get approval.

Wow, I didn't expect it would cost that much.

> Would we be able to get an AWS logo in the footer, and acknowledgment on 
> the front page, etc? I need to build a justification for something this big.

The sponsors of the current system are acknowledged on the front page of
the site and in the machines list (linked above). I expect if Amazon
were to sponsor a replica we would add them there too. I'm not sure
about a logo in the footer though; that would be up to the folks who
run it.


In terms of justification, snapshot.d.o is an essential tool for
developers (bug bisection etc) and users (testing etc). In addition, it
is essential for two Debian projects: reproducible builds and the Debian
derivatives patch generation.




