
Re: Huge data files in Debian



On Fri, Jul 17, 2015 at 09:38:06PM +0200, Jakub Wilk wrote:
> * Ole Streicher <olebole@debian.org>, 2015-07-17, 10:34:
> >But: These packages sum up to ~25 GB, with the maximal package
> >size of 3.5 GB.
> 
> Well, that's a lot. Just as data points:
> 
> * The biggest binary package currently in the archive,
> ns3-doc_3.17+dfsg-1_all.deb, is only ~1GB.
> 
> * The biggest source package, nvidia-cuda-toolkit_6.0.37-5, is only
> ~1.5GB.
> 
> 
> I'm afraid you might need to wait for the advent of data.d.o:
> https://lists.debian.org/87tzgm6yee.fsf@vorlon.ganneff.de
> (mind the typo: s/2 weeks/10 years/)
> 

My first thought was "well, can all of us science-type users
agree to host something like /afs/data.d.o/?", and then I saw
the following:

On Fri, Jul 17, 2015 at 02:03:54AM -0700, Afif Elghraoui wrote:
> Package: wnpp
> Severity: wishlist
> Owner: Afif Elghraoui <afif@ghraoui.name>
> X-Debbugs-Cc: debian-devel@lists.debian.org
>
> * Package name    : ori
>   Version         : 0.8.1
>   Upstream Author : Stanford University <orifs-devel@lists.stanford.edu>
> * URL             : http://ori.scs.stanford.edu/
> * License         : ori (MIT-like)
>   Programming Lang: C++
>   Description     : secure distributed file system
>
> Ori is a distributed file system built for offline operation and empowers
> the user with control over synchronization operations and conflict
> resolution.
> History is provided through lightweight snapshots and users can verify that
> the history has not been tampered with. Through the use of replication,
> instances can be resilient and recover damaged data from other nodes.

So is there any sort of reasonable internet-scale distributed 
filesystem in use that might actually work for this?

It seems a bit silly to keep building new systems to handle large
data sets we'd really rather have around for a long time. There is
also this nebulous 'proof of storage' blockchain concept I hear about
every once in a while from bitcoin-type circles. I'd be quite happy
to dedicate a few terabytes to store DFSG science data if there was
a micropayment based system to cover the bandwidth costs.

