
Re: Load on master.

On 4 Nov 1998, Guy Maor wrote:

> Jason Gunthorpe <jgg@gpu.srv.ualberta.ca> writes:
> > We are starting to reach the limit of the archive maintenance scripts
> Yes, they were never written for efficiency.  I'll write a caching
> mechanism for the md5sum's, dpkg-deb -f, etc this week.  Much of the
> contents run time is the final sort I believe, not the disk I/O to
> gather the data.


The contents mechanism could probably be sped up several times simply by
storing the tree in a multi-level btree. It would be implicitly sorted in
parallel with the disk I/O, would use less RAM and have a generally lower
execution time. [ Think of a directory tree with each dir being a btree of
files/dirs ] Coupled with caching of the deb contents lists, we could
probably bring this run time under control.
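
To make the directory-tree idea concrete, here is a minimal sketch in
Python (not the language of the existing scripts). It is an illustration
only: Node, add_path and walk are made-up names, and each level keeps its
names in a sorted list via bisect as an in-memory stand-in for a real
btree. The point is that entries are ordered as they are inserted, so no
final global sort is needed.

```python
import bisect


class Node:
    """One directory level; a sorted list stands in for a per-directory btree."""

    def __init__(self):
        self.names = []     # child names, kept sorted on insert
        self.children = {}  # child name -> Node
        self.packages = []  # packages shipping this exact path, kept sorted

    def child(self, name):
        """Return the child called name, creating it in sorted position if new."""
        node = self.children.get(name)
        if node is None:
            node = self.children[name] = Node()
            bisect.insort(self.names, name)
        return node


def add_path(root, path, package):
    """Record that package ships path (hypothetical helper)."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.child(part)
    if package not in node.packages:
        bisect.insort(node.packages, package)


def walk(node, prefix=""):
    """Yield (path, packages) level by level, each level in sorted order."""
    for name in node.names:
        child = node.children[name]
        path = prefix + name
        if child.packages:
            yield path, list(child.packages)
        yield from walk(child, path + "/")


root = Node()
add_path(root, "usr/bin/perl", "perl")
add_path(root, "bin/ls", "fileutils")
add_path(root, "usr/bin/apt-get", "apt")
for path, packages in walk(root):
    print(path, ",".join(packages))
```

Note that a per-directory ordering is not always identical to a flat sort
of the full path strings (characters that sort before "/" can reorder
siblings), so the output format would need to accept that or adjust for it.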

Last I checked, that one script took so long because
  1) It uses 40 MB of RAM
  2) It reads and decompresses every byte in the archive
  3) It uses some Perl construct that doesn't strike me as amazingly
     well suited for this

Guy, I'm very busy of late, but I would like to help bring these run times
down. I might be able to write some code after the next APT release.

Perhaps you could extract the md5, control file and contents list of each
package once and stick them someplace, then have the other scripts simply
use that information wherever possible?
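
As a rough sketch of what "extract once, reuse everywhere" could look
like, assuming Python and a JSON file as the store (both are my
assumptions, not how the archive scripts work): PackageCache and its
extract callback are hypothetical names. The callback would be whatever
actually pulls the md5, control fields and contents list out of a .deb;
here it only computes the md5 so the example stays self-contained. An
entry is recomputed only when the file's size or mtime changes.

```python
import hashlib
import json
import os


def file_md5(path):
    """Return the hex MD5 of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class PackageCache:
    """Hypothetical per-package cache keyed by path, validated by size and mtime."""

    def __init__(self, cache_path, extract):
        self.cache_path = cache_path
        self.extract = extract  # assumed callable: package path -> dict of metadata
        try:
            with open(cache_path) as fh:
                self.entries = json.load(fh)
        except FileNotFoundError:
            self.entries = {}

    def lookup(self, deb_path):
        """Return cached metadata, calling extract only if the file changed."""
        st = os.stat(deb_path)
        stamp = [st.st_size, st.st_mtime_ns]
        entry = self.entries.get(deb_path)
        if entry is None or entry["stamp"] != stamp:
            entry = {"stamp": stamp, "data": self.extract(deb_path)}
            self.entries[deb_path] = entry
        return entry["data"]

    def save(self):
        """Write the cache to a temporary file, then rename it into place."""
        tmp = self.cache_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self.entries, fh)
        os.replace(tmp, self.cache_path)
```

Each script would then call lookup() instead of re-reading the package,
and one of them would call save() at the end of the run.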

