Re: Huge data files in Debian



Hi!

On Fri, 2015-07-17 at 12:03:36 -0400, Yaroslav Halchenko wrote:
> On Fri, 17 Jul 2015, Ole Streicher wrote:
> > But: These packages sum up to ~25 GB, with the maximal package size of
> > 3.5 GB. What is the best way to deal with them? Loosely following the
> > discussion about the Icedove icons, it is probably not a wise idea
> > ("privacy breach") to let them be downloaded from a third-party server; at
> > least as long as they are DFSG-free. But can (and shall) our Debian
> > servers store these files? Is 25 GB much for us or not these days?
> 
> Unfortunately it is unlikely that we (as Debian) would be able to
> afford providing generic storage and distribution of such large data
> packages.  It just wouldn't scale -- where would be a cut off? (some
> datasets we deal with in neuroimaging are already tens of TBs)

> But also it is not just about "storage" -- the conventional organization
> (.orig.tar.* + .deb), with data being duplicated in both, doesn't scale
> well either.

In addition, the .deb format has some size limits. Each tar entry is
currently limited to 8 GiB (but it should be 16 EiB, which I'll be
fixing soon), and each ar member is limited to around 9536.74 MiB
(which is inherently unfixable).
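
Both numbers fall out of the fixed-width size fields in the container
headers. A rough sketch of the arithmetic in Python, assuming the common
ar header layout (member size stored as 10 ASCII decimal digits) and the
ustar header layout (12-byte octal size field, 11 usable digits). The
helper names are made up for illustration; this is not dpkg code:

```python
# Sketch only: derive the size limits from header field widths.
# Assumptions (not taken from dpkg source): the ar size field holds
# 10 decimal digits, the ustar size field holds 11 octal digits.

MIB = 2**20
GIB = 2**30
EIB = 2**60


def max_decimal_field(digits: int) -> int:
    """Largest value that fits in an ASCII decimal field of this width."""
    return 10**digits - 1


def max_octal_field(digits: int) -> int:
    """Largest value that fits in an ASCII octal field of this width."""
    return 8**digits - 1


AR_MAX_BYTES = max_decimal_field(10)   # 9999999999 bytes
TAR_MAX_BYTES = max_octal_field(11)    # one byte short of 8 GiB

if __name__ == "__main__":
    print(f"ar member limit:  {AR_MAX_BYTES / MIB:.2f} MiB")
    print(f"tar entry limit:  {(TAR_MAX_BYTES + 1) / GIB:.0f} GiB minus 1 byte")
    # 16 EiB is what a full 64-bit size would allow.
    print(f"64-bit size:      {2**64 // EIB} EiB")
```

The ar limit cannot be raised because the field width is part of the
format itself, while the tar limit depends on how the size is encoded.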

And not all of the current code that handles .deb files supports LFS
(large file support) yet; see
<https://wiki.debian.org/Teams/Dpkg/DebSupport>.

> The "ultimate" solution we are aiming for (see http://datalad.org for more
> information) is to utilize git-annex and "ship" either mere pointers to
> git-annex sources or lean (without data) git-annex repositories which
> fetch data from original (or mirrors) data providers.
> 
> Meanwhile, we (http://neuro.debian.net) have started to use git-annex to
> at least avoid bloated .orig.tar.gz, see e.g.
> http://neuro.debian.net/pkgs/python-mvpa2-tutorialdata.html
> sources: http://git.debian.org/?p=pkg-exppsy/pymvpa2-tutorialdata.git
> 
> So, .orig is in 3.0 (git) format and is just a lean annex repository.
> When building the binary packages, the load then gets fetched and brought
> into the .deb binary packages.

Oh, then I think this settles #720598 (requesting the removal of the
«3.0 (git)» source format) for now, and knowing that it's being used
I'll just close the request. :)

Thanks,
Guillem

