[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#810898: apt: "apt-get update" (1.2) very slow with compressed indices and debtags



Control: reassign -1 apt

On Fri, Sep 08, 2017 at 02:46:10PM +0200, Enrico Rossi wrote:
> Hi,
> 
> Enrico Zini made this python3 code to test the problem outside debtags:
> 
> #!/usr/bin/python3
> import apt
> 
> def main():
>     cache = apt.Cache()
>     for pkg in cache:
>         cand = pkg.candidate
>         if not cand: continue
>         tags = cand.record.get("Tag", None)
>         if not tags: continue
>         print(pkg.name, tags)
> 
> if __name__ == "__main__":
>     main()
> 
> Running it without compressed index in apt:
> 
> # time python3 cattags >/dev/null 
> real	0m1.671s
> user	0m1.539s
> sys	0m0.132s
> 
> with compressed index:
> 
> # time python3 cattags >/dev/null 
> real	1m44.238s
> user	1m33.536s
> sys	0m10.698s
> 
> I'm going to re-assign the bug to python-apt, maybe they have more info
> on the issue.

So the problem really is that random access to lz4 compressed files
does not really exist, instead, when seeking backwards in it, we seek
to the beginning and then re-read until we reached our target position.
When seeking forward, we simply read and discard the bytes from the
current position to the target.

I'm not sure how much backward seeks are involved in that, but if
the slowdown is substantial, probably a lot. I fear that because the
tag file uses a buffer and reads further than needed in the compressed
stream that essentially *every* jump is a backward read, thus causing
a quadratic amount of bytes sum(k * average_section_size, k=1...n) to 
be decompressed - yes, you read that right, O(n^2).

That's of course inefficient. One way to work around this is to make
TagFile use FileFd's native buffering support somehow, instead of maintaining
its own buffer. Then FileFd could just seek forward in the buffer and
then (if needed) in the file  instead of discarding the buffer and
starting from the beginning again.

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
                  |  Ubuntu Core Developer |
When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to ('inline').  Thank you.


Reply to: