
Bug#775163: apt pigs out in /var, particularly with multi-arch



reopen 775163
quit

On Mon, Jan 12, 2015 at 10:34:04AM +0100, David Kalnischkies wrote:
> On Mon, Jan 12, 2015 at 09:36:00AM +0100, Michael Vogt wrote:
> > On Sun, Jan 11, 2015 at 09:40:20PM -0800, Elliott Mitchell wrote:
> > > I've ended up examining how much space programs are using in /var, and
> > > APT is the top pig, using close to half of /var in /var/lib/apt/lists.
> > > One factor does appear to be exacerbating this: `dpkg` has 5 foreign
> > > architectures.
> 
> 5 architectures? That is a lot! I presume you are a heavy cross-builder?

I dunno, depends on what one considers "a lot" and how much
cross-building it takes to qualify as a heavy cross-builder.  I will note
several of the 5 are subarchitectures, not wholly separate architectures.

> If you have repositories for which you don't want to download the data
> for a specific architecture, consider the sources.list [arch-=] syntax;
> see the manpage.

Useful to know, but this doesn't have much impact.  The lists files for
pure Debian testing and stable are by far the biggest users in
/var/lib/apt.


> > > Trying a few compression methods:
> > > 
> > > 426248  lists
> > > 114580  lists.gz
> > > 90868   lists.bz2
> > > 85648   lists.lzma
> > > 86532   lists.xz
> > > 
> > > Nearly all of this space is being used for the Packages files.  Merely
> > > compressing them would be a rather major improvement.  The main Debian
> > > testing file is the biggest of these.
> > 
> > You can use the configuration option 
> > """
> > Acquire::GzipIndexes "1";
> > """
> > to keep the indexes compressed on disk. You trade speed when building
> > the mmap cache for less data on disk.
> 
> It isn't just startup; other parts of runtime will access it too, so it
> can/will be slow all around. Searching, for example, but also any action
> downloading a package (because of the Filename: field, among others).
> It is the "price" you pay for Debian having such a huge archive. You can
> freely delete the /var/lib/apt/lists directory, though, if you are done
> working with apt for the moment. This is usually done on space-constrained
> embedded systems, for example. Just remember to do an 'apt-get
> update' before you use apt the next time and apt will recreate the
> directory and its content.
> 
> (Note btw that the option mentioned above keeps the compressed files it
> downloaded, so it isn't compressing the entire directory, which means
> smaller savings; note also that some compression algorithms are more
> cpu/memory/time hungry than others while they are uncompressing.)

Given how often `apt-get update ; apt-get upgrade` *should* be run
(weekly for workstations, perhaps monthly for embedded systems), nuking
the lists files is a distinctly losing proposition.  Depending upon what
operations are most used, an index into the compressed files could
recover most of the speed.
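One way such an index could work, sketched in Python under the assumption that a Packages file is plain deb822 text (stanzas separated by blank lines): compress the stanzas in independent xz chunks and keep a small package-name to chunk map, so looking up one package decompresses a single chunk instead of the whole file. This is not how APT stores its lists; `build_chunked_index` and `lookup` are hypothetical helpers for illustration only.

```python
import lzma

def split_stanzas(text):
    """Split deb822-style text (e.g. a Packages file) into blank-line-separated stanzas."""
    return [s.strip("\n") for s in text.split("\n\n") if s.strip()]

def stanza_package(stanza):
    """Return the value of a stanza's Package: field."""
    for line in stanza.splitlines():
        if line.startswith("Package:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("stanza has no Package: field")

def build_chunked_index(text, stanzas_per_chunk=64):
    """Compress stanzas in independent xz chunks (hypothetical layout).

    Returns (chunks, index) where index maps a package name to the list of
    chunk numbers holding a stanza for it (a name can repeat, e.g. when
    several versions are listed).
    """
    stanzas = split_stanzas(text)
    chunks, index = [], {}
    for start in range(0, len(stanzas), stanzas_per_chunk):
        group = stanzas[start:start + stanzas_per_chunk]
        chunk_no = len(chunks)
        for stanza in group:
            slots = index.setdefault(stanza_package(stanza), [])
            if chunk_no not in slots:
                slots.append(chunk_no)
        chunks.append(lzma.compress("\n\n".join(group).encode("utf-8")))
    return chunks, index

def lookup(chunks, index, name):
    """Decompress only the chunks that mention `name`; return its stanzas."""
    found = []
    for chunk_no in index.get(name, []):
        data = lzma.decompress(chunks[chunk_no]).decode("utf-8")
        found.extend(s for s in split_stanzas(data) if stanza_package(s) == name)
    return found
```

The chunk size is the knob: larger chunks compress better, smaller chunks make single-package lookups (such as resolving a Filename: field before a download) cheaper. A full-text search would still have to decompress every chunk.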

I also tried the above option before submitting the bug report; the space
savings were trivial.
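For anyone wanting to repeat a comparison like the lists / lists.gz / lists.bz2 / lists.lzma / lists.xz figures quoted earlier, here is a minimal Python sketch using only the standard library. It compresses each file under a directory separately and sums the results, which is closer to what keeping individually compressed index files would cost; compressing the whole directory as one archive (presumably how the quoted figures were produced, though that is an assumption) can come out smaller. `compressed_sizes` is a hypothetical helper.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# One compressor per format in the quoted comparison; lzma.FORMAT_ALONE is the
# legacy .lzma container, lzma.FORMAT_XZ the .xz one.
COMPRESSORS = {
    "gz": lambda data: gzip.compress(data, compresslevel=9),
    "bz2": lambda data: bz2.compress(data, compresslevel=9),
    "lzma": lambda data: lzma.compress(data, format=lzma.FORMAT_ALONE),
    "xz": lambda data: lzma.compress(data, format=lzma.FORMAT_XZ),
}

def compressed_sizes(directory):
    """Return total bytes per method, compressing every regular file separately."""
    totals = {"raw": 0, **{name: 0 for name in COMPRESSORS}}
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except PermissionError:
            # Skip files the current user cannot read.
            continue
        totals["raw"] += len(data)
        for name, compress in COMPRESSORS.items():
            totals[name] += len(compress(data))
    return totals
```

Calling `compressed_sizes("/var/lib/apt/lists")` returns byte totals keyed by "raw", "gz", "bz2", "lzma" and "xz"; root-only files (such as a lock file) are skipped unless it is run as root.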

> > Note that this option works best with later apt versions (1.0.9.2 or
> > later) where this option supports all compression formats that apt supports
> > (the older versions only support gzip).
> 
> As there isn't much else we can do about it, I am closing with that
> version number.

There are other things that could be done which would result in major
reductions in APT's usage of space in /var.  Looking at
/var/lib/apt/lists:

The <repo>_dists_<dist>_<component>_binary-<arch>_Packages files have a
*lot* of redundancy among them.  While the Filename, Size, and checksum
fields will differ between architectures, nearly all other fields won't
vary between architectures.  Making a common
<repo>_dists_<dist>_<component>_common_Shared file and having the
binary-<arch> files merely contain the fields that differ from the common
file would yield large savings for people who have support for foreign
architectures present (at least half per file, likely closer to
two-thirds).  Some additional savings could be had by taking advantage of
redundancies in the source_Sources files.
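A minimal in-memory sketch of the proposed split, in Python and purely hypothetical (APT has no such Shared file): stanzas from each binary-<arch> Packages file are grouped by (Package, Version), the fields whose values are identical for every architecture go into the shared record, and each architecture keeps only what differs. Which fields end up shared is computed from the data rather than hard-coding Filename, Size and the checksums; writing the result back out as files is left out.

```python
def parse_stanza(stanza):
    """Parse one deb822 stanza into {field: value}, keeping continuation lines."""
    fields, last = {}, None
    for line in stanza.splitlines():
        if line[:1] in (" ", "\t") and last is not None:
            fields[last] += "\n" + line
        else:
            key, _, value = line.partition(":")
            last = key
            fields[key] = value.strip()
    return fields

def split_common(per_arch):
    """per_arch: {arch: [field dicts]} -> (shared, deltas), keyed by (Package, Version)."""
    grouped = {}
    for arch, stanzas in per_arch.items():
        for fields in stanzas:
            key = (fields["Package"], fields["Version"])
            grouped.setdefault(key, {})[arch] = fields
    shared, deltas = {}, {arch: {} for arch in per_arch}
    for key, by_arch in grouped.items():
        variants = list(by_arch.values())
        # A field is shared only if every architecture has the same value.
        common = {f: v for f, v in variants[0].items()
                  if all(other.get(f) == v for other in variants[1:])}
        shared[key] = common
        for arch, fields in by_arch.items():
            deltas[arch][key] = {f: v for f, v in fields.items() if f not in common}
    return shared, deltas

def merge(shared, delta):
    """Rebuild one architecture's stanzas from the shared records plus its delta."""
    return {key: {**shared[key], **fields} for key, fields in delta.items()}
```

Every package an architecture carries still gets a (possibly empty) entry in that architecture's delta, so `merge` knows which packages belong to it; fields that differ override the shared values.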

I also notice pkgcache.bin and srcpkgcache.bin in /var/cache/apt.  These
two files appear to be derived from other files; does keeping them on
persistent storage really speed up any operations?  (I didn't notice any
difference after deleting them.)


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         EHeM+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

