
Re: Bug#833388: ITP: metaphlan2 -- Metagenomic Phylogenetic Analysis



On 08/03/2016 09:00 PM, Andreas Tille wrote:
>      2b) Do the conversion of the format in postinst at the expense
>          of users time which is acceptable since the package usually
>          unpacks on high performance machines and not so many
>          installations which means bandwidth and disk space on Debian
>          mirrors should be saved here instead of users machine
> 
>          Source tarball 256MB + binary package ~250MB (estimated)

Personally, I think that'd probably be the best solution, at least
as long as there are not too many updates to the package. I'm
thinking that if the data changes once or twice a year, that'd be
OK. If it's twice a week, then I think the only realistic solution
would be 3b).

There are some large data packages in sid already though, even
reaching the sizes you describe, but if you can avoid this,
especially for low-popcon packages, I think having the user's
computer do a little more work in postinst is a reasonable trade-
off here.

For reference, the top 5 sorted by deb size:

Package                   deb Size (GiB)      Installed Size (GiB)
---------------------------------------------------------------------
flightgear-data-base      1.06257             1.50826
freefoam-dev-doc          0.84636             1.49562
redeclipse-data           0.72715             0.832576
0ad-data                  0.540366            1.4238
libpcl1.7-dbg             0.530659            0.578442

Top 5 sorted by installed size:

Package                            deb Size (GiB) Installed Size (GiB)
----------------------------------------------------------------------
linux-image-4.6.0-1-rt-amd64-dbg   0.409186       3.09527
linux-image-4.6.0-1-amd64-dbg      0.410594       3.09287
linux-image-4.5.0-2-amd64-dbg      0.393085       2.87757
flightgear-data-base               1.06257        1.50826
freefoam-dev-doc                   0.84636        1.49562

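Shell snippet I used to get this data (Size is in bytes and
Installed-Size in KiB, hence the different divisors; as written it
sorts by installed size, sort on -k2 instead to rank by deb size):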
awk '/^Package:/ { pkg = $2; }
     /^Installed-Size:/ { is = $2; }
     /^Size:/ { print pkg, $2, is }' \
     < /var/lib/apt/lists/*_debian_dists_sid_main_binary-amd64_Packages \
  | sort -k3 -n \
  | awk '{ print $1, $2 / 1024.0 / 1024.0 / 1024.0, $3 / 1024.0 / 1024.0 }' \
  | tail -n 5 \
  | tac

Using a similar snippet, I could determine that there are 34
packages with deb size larger than 200 MiB in the archive at
the moment; 51 larger than 150 MiB and 88 larger than 100 MiB.
(This does not include -dbgsym packages in the debug section.)
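
For illustration, the counting could look roughly like the sketch
below. It is not the exact command I ran; the function name
count_larger_than is made up for this example, and it only assumes
that Size: in the Packages file is given in bytes.

```shell
# Count packages whose .deb is larger than a threshold given in MiB.
# Reads Packages-format stanzas on stdin; the Size: field is in bytes.
# (count_larger_than is a hypothetical helper name for this sketch.)
count_larger_than() {
    awk -v limit_mib="$1" '
        /^Size:/ && ($2 + 0) > limit_mib * 1024 * 1024 { n++ }
        END { print n + 0 }'
}
```

It would be fed the same Packages file as the snippet above, e.g.
count_larger_than 200 < (that file), and again with 150 and 100.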

>      (possibly be upstream can be convinced
>      to provide a *.bz2 tarball for maximum compression).

Please don't use bz2 anymore. It's really slow and doesn't do any
better than e.g. xz. (There's a reason why Debian migrated away
from it.) If you convince upstream to provide a better tarball,
please suggest a better algorithm. (The compression level with xz
does play a role when it comes to speed, but there you can choose
a reasonable trade-off. The memory requirements of xz should be
irrelevant here, because I can't see the software you're
describing being used on an extremely low-memory system.)

>      3a) Use postinst [to download stuff]

I don't think that's a good idea, simply because of the amount of
data. Downloading things in postinst is OK if it's a couple of
megabytes (see e.g. ttf-mscorefonts-installer in contrib), but
250 Megs?

Also, I could imagine that in a lab setup where this might be
useful, you'd maybe want to have an air-gapped computer and use
apt-offline to update it. Downloading in postinst would definitely
break that kind of thing.

>      3b) Inform user to call a download script manually to do not
>          block apt for a longer time dealing with potential download
>          problems.

That would be the only option if you don't bundle the data with
the package. In that case, maybe patch the binary to detect that
situation and inform the user they should still do this?
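
A rough sketch of what such a check could look like, written as a
small shell wrapper rather than a patch to the program itself. The
data directory and the name of the download script are placeholders
I made up for this example; the real package would have to fill in
its own.

```shell
# Hypothetical data location; the real package may use another path.
DATA_DIR="/var/lib/metaphlan2/db"

# Succeeds if the directory exists and is non-empty. Otherwise prints
# a hint to stderr and returns 1.
check_data() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        return 0
    fi
    echo "metaphlan2: reference data not found in $dir" >&2
    # The script name below is a placeholder, not an existing command.
    echo "metaphlan2: please run metaphlan2-fetch-data first" >&2
    return 1
}
```

A wrapper would then call check_data "$DATA_DIR" || exit 1 before
starting the actual program, so apt itself never has to wait for a
download.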

But as I said: if your package doesn't change that often, I don't
think that adding 2x 250 MiB (source and arch=all, right?) is
necessarily excessive.

Regards,
Christian

