Re: Huge data files in Debian
On Sat, Jul 18, 2015 at 10:41:44AM +0200, Ole Streicher wrote:
> Pedro Larroy <email@example.com> writes:
> > Wouldn't a p2p system scale better than any server based solution?
> > Also in regards to cost...
> I am a bit afraid about the long-term availability of any non-standard,
> new solution: Anything we implement now (and used for Debian 9) should
> be supported for the whole LTS time. If Debian 9 comes 2018 or so, it
> should run until at least 2023! Given the fact that many solutions
> disappear as fast as they show up, I would be quite conservative here.
> Best regards
This is why I suggested OpenAFS... It's been around since before Debian
was, is in heavy use by large science sites (CERN, Stanford, MIT, etc.),
and the debian OpenAFS packages work quite well.
The problem with distributed p2p systems is debugging them when the data
you want does not show up. AFS is a distributed hierarchical system, and
we could probably get the p2p functionality by using the existing systems
that are used to replicate the debian archive.
Really though, I'd just like to have an apt/sources.list line like:
deb /afs/data.d.o/ wheezy main contrib non-free astro neuro semiconductor
and then be able to get the 'big-data' Astronomy and Neurology files, plus
the data files needed for open-source analysis and testing of 10nm
semiconductor chips.
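As a sketch of how little new machinery that would need (the host
data.d.o and the cell layout here are hypothetical), APT's long-standing
file: transport can already read an archive sitting on any mounted
filesystem, AFS included:

```shell
# Hypothetical sources.list entry: an AFS cell mounted at /afs/data.d.o
# that contains a plain Debian-style archive. APT's file: method treats
# it exactly like a local mirror; no new transport needs packaging.
deb file:/afs/data.d.o/ wheezy main contrib non-free astro neuro semiconductor
```

The only client-side prerequisite would be the openafs-client package
providing the /afs mount before apt runs.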
The other way to look at this is different big-data end-users can agree
to manage different filesystems (say CERN decides to publish their
LHC data files from the ATLAS experiment via debian packages on
/afs/cern.ch/atlas/debian ), and then maybe some Neurology guys decide to
use git-annex, and some astro folks use tahoe-lafs.
The point for Debian, I think, is that the data 'archive' must be
accessible via a regular 'file' path, using packages in the main archive
that provide
the filesystem mount.
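To make the 'plain file path' constraint concrete, here is a minimal
consumer-side sketch (function name, mount root, and dataset layout are
all hypothetical): whatever backend a community picks - AFS, tahoe-lafs,
git-annex - the analysis code should only ever see an ordinary path.

```python
import os

def open_dataset(mount_root, dataset_path):
    """Open a data file by plain path.

    The backing filesystem (AFS, tahoe-lafs, git-annex, ...) is
    invisible here: as long as the packaged client has mounted it,
    this is just local file I/O.
    """
    full_path = os.path.join(mount_root, dataset_path)
    if not os.path.exists(full_path):
        # e.g. the fileserver for that cell is unreachable
        raise FileNotFoundError(full_path)
    return open(full_path, "rb")
```

A consumer would then write something like
open_dataset("/afs/cern.ch/atlas/debian", "run1/events.dat") without
caring which distributed filesystem sits underneath.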
So what solutions besides AFS are:
a) available as debian packages
b) have at least a 5 year history? (afs is pushing 25 years or more)