
Bug#810898: apt: "apt-get update" (1.2) very slow with compressed indices and debtags



Control: reassign -1 python-apt
Control: clone -1 -2
Control: reassign -2 debtags

(forget my last email's part about tagfile buffer stuff)

On Fri, Sep 08, 2017 at 02:46:10PM +0200, Enrico Rossi wrote:
> Hi,
> 
> Enrico Zini made this python3 code to test the problem outside debtags:
> 
> #!/usr/bin/python3
> import apt
> 
> def main():
>     cache = apt.Cache()
>     for pkg in cache:

This iterates in alphabetical order, but not necessarily in file order. Using

     sorted(cache, key=lambda p: p.candidate._cand.file_list[0][1] if p.candidate else 9999999999)

or:

     sorted(cache, key=lambda p: (p.candidate._cand.file_list[0][0].id, p.candidate._cand.file_list[0][1]) if p.candidate else (9999999999,9999999999))

(sorting first by file, then by offset in file)

instead of 'cache' speeds it up:

before:

180.80user 18.78system 3:20.36elapsed 99%CPU (0avgtext+0avgdata 240428maxresident)k
0inputs+0outputs (0major+51060minor)pagefaults 0swaps

after:
8.04user 0.51system 0:08.61elapsed 99%CPU (0avgtext+0avgdata 240376maxresident)k
0inputs+0outputs (0major+52042minor)pagefaults 0swaps

This essentially sorts the accesses by offset in the cache, which corresponds
to the offset in the files, and avoids costly (O(n)) backwards seeks.
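
To illustrate why the ordering matters, here is a toy cost model. It is only a
sketch: it assumes a forward-only (e.g. compressed) stream where a backward
seek restarts from offset 0, and read_cost() is a made-up helper, not anything
in apt or python-apt.

```python
def read_cost(offsets):
    """Count bytes skipped over when visiting offsets on a forward-only stream.

    Assumption: seeking backwards means restarting from the beginning of the
    stream and reading forward again, which is what makes it O(n).
    """
    position = 0
    cost = 0
    for target in offsets:
        if target < position:
            # Backward seek: rewind to the start of the stream.
            position = 0
        cost += target - position
        position = target
    return cost


# Hypothetical record offsets, visited in "alphabetical" (arbitrary) order.
offsets = [300, 100, 200, 0]

unsorted_cost = read_cost(offsets)
sorted_cost = read_cost(sorted(offsets))

print("unsorted:", unsorted_cost)
print("sorted:  ", sorted_cost)
```

With sorted offsets the stream is only ever read forward once, so the total
cost is bounded by the largest offset.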

You can hack that into debtags *now*, possibly using getattr() in case
the internal fields change in the future. We could also provide a special
attribute on apt_pkg.Version or something similar that allows versions to be
ordered by file offset.

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
                  |  Ubuntu Core Developer |
When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to ('inline').  Thank you.

