
Re: dpkg memory usage



Steve Dunham writes ("Re: dpkg memory usage"):
> John Goerzen <jgoerzen@complete.org> writes:
> > I was upgrading packages on my 64 meg system today and noticed:
> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
> > 24785 root      18   0 12680  12M   568 S       0  0.1 20.0   5:36 dpkg
> > Yes, that's almost 13 megs used by dpkg, and 20% of my RAM.
> > That also is 4 megs more than the TOTAL amount of RAM in some computers I
> > work with.
> 
> > So...why must dpkg use almost as much memory as XFree86 itself, and MORE
> > than Netscape does at times?

The main reason dpkg is so large is that it has loaded into core a
complete list of all files it has installed on your system and which
package(s) they came from.  This data really is that large.

When processing, dpkg does tend to grow somewhat, and particularly to
use up swap rather than real memory.  This is because it has very
malloc-intensive data structures and so internally uses a special
version of malloc which cannot free (a simple incrementing allocator
working from blocks of ordinarily malloc'd memory).  The amount of
memory used is proportional to the number of files processed, and even
for a complete reinstall in one dpkg run it won't use more than about
twice the amount for a `dpkg --search' - and most of the data will be
swapped out.
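
To make the allocator description concrete, here is a minimal sketch of
an incrementing allocator of that general kind.  It is not dpkg's actual
code: the names arena, arena_alloc, ARENA_BLOCK_SIZE and ARENA_ALIGN,
and the 8192-byte block size, are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes throughout; not taken from dpkg. */
#define ARENA_BLOCK_SIZE 8192

/* Rounding every request up to a multiple of this size keeps each
 * returned pointer suitably aligned, since blocks come from malloc. */
union arena_align { long long ll; long double ld; void *p; };
#define ARENA_ALIGN sizeof(union arena_align)

struct arena {
  char *cur;    /* next free byte in the current block */
  size_t left;  /* bytes remaining in the current block */
};

/* Hand out `size` bytes by bumping a pointer.  Nothing can be freed
 * individually; when the current block is exhausted a new one is
 * obtained from malloc and the old block's tail is abandoned.
 * Returns NULL if malloc fails or the request is absurdly large. */
void *arena_alloc(struct arena *a, size_t size)
{
  void *p;

  assert(a != NULL);
  if (size == 0) size = 1;
  if (size > (size_t)-1 - ARENA_ALIGN) return NULL;  /* overflow guard */
  size = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;

  if (size > a->left) {
    size_t blocksize = size > ARENA_BLOCK_SIZE ? size : ARENA_BLOCK_SIZE;
    char *block = malloc(blocksize);
    if (!block) return NULL;
    a->cur = block;
    a->left = blocksize;
  }
  p = a->cur;
  a->cur += size;
  a->left -= size;
  return p;
}
```

A caller would start from a zeroed struct arena and call arena_alloc
wherever it would otherwise call malloc.  Each allocation costs a few
arithmetic operations, and the price is exactly the one described above:
memory use only ever grows for the lifetime of the process.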

On `small memory' systems dpkg switches to a different data structure
which is about twice as slow for general access on a big machine, but
has a much smaller working set so is much faster for setup and access
on small machines.  dpkg uses sysinfo(2) to guess which algorithm to
use, and you can force one or the other using command line options.  I
have checked this on a 3MB system and it worked as expected.
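
The selection logic might look roughly like the following sketch.  It
is Linux-specific and is not dpkg's actual code: the 24MB threshold, the
enum names, and the choice to fall back to the large-machine layout when
the RAM size is unknown are all assumptions, and the real command line
options are deliberately not named here.

```c
#include <assert.h>
#include <sys/sysinfo.h>

/* Hypothetical names; the threshold is an assumption, not dpkg's value. */
enum db_layout { LAYOUT_LARGE, LAYOUT_SMALLMEM };
enum force_mode { FORCE_NONE, FORCE_LARGE, FORCE_SMALLMEM };

#define SMALLMEM_THRESHOLD_BYTES (24ULL * 1024 * 1024)

/* Ask the kernel for total RAM in bytes; returns 0 if that fails. */
unsigned long long total_ram_bytes(void)
{
  struct sysinfo si;
  unsigned long long unit;

  if (sysinfo(&si) != 0) return 0;
  /* Very old kernels leave mem_unit as 0, meaning "bytes". */
  unit = si.mem_unit ? si.mem_unit : 1;
  return (unsigned long long)si.totalram * unit;
}

/* Pick a layout.  An explicit override (as from a command line option)
 * wins; otherwise guess from the amount of RAM.  A RAM size of 0 means
 * "unknown" and falls back to the large-machine layout. */
enum db_layout choose_layout(unsigned long long ram_bytes,
                             enum force_mode force)
{
  assert(force == FORCE_NONE || force == FORCE_LARGE ||
         force == FORCE_SMALLMEM);
  if (force == FORCE_LARGE) return LAYOUT_LARGE;
  if (force == FORCE_SMALLMEM) return LAYOUT_SMALLMEM;
  if (ram_bytes == 0) return LAYOUT_LARGE;
  return ram_bytes < SMALLMEM_THRESHOLD_BYTES ? LAYOUT_SMALLMEM
                                              : LAYOUT_LARGE;
}
```

Typical use would be choose_layout(total_ram_bytes(), FORCE_NONE).
Keeping the decision separate from the sysinfo(2) call means the guess
can be checked without needing a machine of each size.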

> > Not only that, but it is hideously slow even on current computers.  My
> > suggestion: store the databases in a DBM format of some sort instead of
> > plain text. 

The reason dpkg is slow is _not_ mainly because of the database
format it uses.

It's mainly because the access method you're using is (I surmise)
reinvoking dpkg each time.  That involves loading the more robust data
structures in /var/lib/dpkg/info into a fast-to-access in-core format.
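
As an illustration of what that load step costs, here is a sketch that
reads a .list-style file (one filename per line) into a chained hash
table mapping each filename to a package name.  It is not dpkg's code:
filetable, load_list, find_owner, the bucket count and the hash function
are assumptions, lines longer than the buffer are not handled, and a
file shared by several packages simply reports the most recently loaded
owner.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024  /* assumed fixed table size */

struct filenode {
  struct filenode *next;
  const char *package;  /* not copied; must outlive the table */
  char name[];          /* filename, stored inline */
};

struct filetable { struct filenode *bucket[NBUCKETS]; };

static unsigned long hash_name(const char *s)
{
  unsigned long h = 5381;
  while (*s) h = h * 33 + (unsigned char)*s++;
  return h % NBUCKETS;
}

/* Read one filename per line from `in` and record each as owned by
 * `package`.  Returns the number of names loaded, or -1 on a read
 * error or allocation failure. */
long load_list(struct filetable *t, FILE *in, const char *package)
{
  char line[4096];
  long n = 0;

  assert(t != NULL && in != NULL && package != NULL);
  while (fgets(line, sizeof line, in)) {
    size_t len = strcspn(line, "\n");
    struct filenode *node;
    unsigned long h;

    line[len] = '\0';
    if (len == 0) continue;  /* skip blank lines */
    node = malloc(sizeof *node + len + 1);
    if (!node) return -1;
    memcpy(node->name, line, len + 1);
    node->package = package;
    h = hash_name(node->name);
    node->next = t->bucket[h];
    t->bucket[h] = node;
    n++;
  }
  return ferror(in) ? -1 : n;
}

/* Return the owning package of `name`, or NULL if it is not known. */
const char *find_owner(const struct filetable *t, const char *name)
{
  const struct filenode *node;

  for (node = t->bucket[hash_name(name)]; node; node = node->next)
    if (strcmp(node->name, name) == 0) return node->package;
  return NULL;
}
```

Lookups are fast once the table exists, but every fresh process has to
run load_list over every installed package's file list first, which is
why reinvoking the program for each operation is expensive.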

Unfortunately dpkg's current calling interface makes it hard not to do
this, but I'm going to fix that at some point.

I also intend to change the format of the /var/lib/dpkg/info/*.list
database to make it faster to load, and I may change
/var/lib/dpkg/status too.  (The resulting structures will still be
editable with emacs.)

> IMHO, dpkg should be using a DBM database for file -> package lookups
> and perhaps for the "status" and "available" caches too.  (I believe
> apt does something like this for "available".)
> 
> (I presume that dpkg actually does use hash tables internally, but it
> recalculates that 12MB of data every time it starts up, which, IMHO, is
> not very efficient.)

It's only inefficient if you start up dpkg a lot.

Using a dbm file or something is fine if you just want read-only
access.  However, they're no good for updating, because such systems
do not have sensible behaviour on filesystem failures like disk full -
they can't be updated atomically.  You end up having to read
everything in and rewrite the whole database after every update.

> The startup time and memory usage is just not worth any benefits
> gained from using a few thousand text files.  
> 
> And the text version is still prone to severe corruption.  Mine was
> scrambled the other day when I upgraded the modutils package running a
> 2.1.x kernel - the machine locked up, and when I rebooted and tried to
> install more packages, dpkg mixed up a bunch of scripts and .list
> files.

I think this was probably a simple kernel bug.  dpkg cannot defend
against your kernel scrambling its filesystem data structures.  It
does ask the kernel to confirm that changes have been committed to
disk before it continues.

Here is the relevant code from dpkg:

  file= fopen(newvb.buf,"w+");
  if (!file) ohshite(...);
  push_cleanup(cu_closefile,ehflag_bombout, 0,0, 1,(void*)file);
  while (list) {
    if (!(leaveout && (list->namenode->flags & fnnf_elide_other_lists))) {
      fputs(list->namenode->name,file);
      putc('\n',file);
    }
    list= list->next;
  }
  if (ferror(file)) ohshite(...);
  if (fflush(file)) ohshite(...);
  if (fsync(fileno(file))) ohshite(...);
  pop_cleanup(ehflag_normaltidy); /* file= fopen() */
  if (fclose(file)) ohshite(...);
  if (rename(newvb.buf,vb.buf)) ohshite(...);

ohshite is a nonreturning error handling function.  I've elided its
arguments for brevity.

As you can see, it is careful to flush and sync the .list file before
it uses rename(2) to atomically overwrite the destination file.
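
For anyone who wants to try the pattern outside dpkg, here is a
self-contained sketch of the same write, flush, sync, rename sequence.
The function name replace_list_file and its parameters are hypothetical,
and it returns an error code where the excerpt above calls the
nonreturning ohshite.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Write `n` lines to `tmppath`, force them to disk, then rename the
 * result over `path`.  Returns 0 on success.  On failure returns -1
 * with errno describing the first error, removes the temporary file
 * and leaves `path` as it was. */
int replace_list_file(const char *path, const char *tmppath,
                      const char *const *lines, size_t n)
{
  FILE *f;
  size_t i;
  int saved;

  assert(path != NULL && tmppath != NULL);
  f = fopen(tmppath, "w");
  if (!f) return -1;

  for (i = 0; i < n; i++)
    if (fputs(lines[i], f) == EOF || putc('\n', f) == EOF) goto fail;

  if (fflush(f) != 0) goto fail;         /* stdio buffer -> kernel */
  if (fsync(fileno(f)) != 0) goto fail;  /* kernel -> disk */
  if (fclose(f) != 0) { f = NULL; goto fail; }
  f = NULL;
  if (rename(tmppath, path) != 0) goto fail;  /* atomic replacement */
  return 0;

fail:
  saved = errno;
  if (f) fclose(f);
  remove(tmppath);
  errno = saved;
  return -1;
}
```

The ordering is the point: because the new contents are synced before
the rename, a reader of the destination name sees either the complete
old file or the complete new one, never a partly written file.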

Ian.

