
Bug#1067440: Compression makes searching packages very slow



On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote:
> Package: apt
> Version: 2.7.12
> 
> I noticed that searching for packages is very slow if the package lists are compressed. To reproduce, remove `/var/lib/apt/lists`, enable
> 
>     Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";
> 
> , run `apt update`. This enables LZ4 compression on my systems, but I don't think the exact method matters. You can then run `apt search librust`, which takes about 19 seconds in a Debian 12 container (docker.io/debian:12 has compression already set up), compared to 0.4 seconds without compression.
> 
> Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't matter too much.
> 
> I tried to look into it, and `strace -e trace=openat apt-cache search librust` shows it reopen and re-read one of the package lists:
> 
> openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
> librust-addr2line+default-dev - Cross-platform symbolication library - feature "default"
> openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
> librust-addr2line+object-dev - Cross-platform symbolication library - feature "object"
> openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
> librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - feature "rustc-demangle"
> openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
> librust-addr2line+std-dev - Cross-platform symbolication library - feature "std"
> 
> (you can use -e trace=openat,read to confirm that it's actually reading the file)
> 
> I believe it's quadratic in the number of search results, and this is related to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` in apt-pkg). Each lookup call will have to decompress the file in order to seek to the destination.
> 
> Unfortunately, I suspect this isn't exactly an easy fix, given the current design.
> 

I'm responding to this here and also including responses to your follow-up
email, which has a broken Subject:


Searching works by ordering the packages based on (file, offset)
and then iterating over them and looking them up. Seeking forward
to a higher offset does not involve a reopen; we just skip the content
in between.
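
To illustrate the idea (this is not APT's actual code, just a minimal
Python sketch under assumptions): the hypothetical ForwardOnlyReader below
stands in for a compressed index file that can only skip forward cheaply
and has to reopen and decompress from the start to go backward. The
hypothetical lookup_all() either looks records up in the order given or
sorts them by (file, offset) first. gzip is used only because it ships
with Python; the names and the record layout are made up for the example.

```python
import gzip
import io


class ForwardOnlyReader:
    """Hypothetical stand-in for a compressed package list.

    Moving forward just reads and discards data; moving backward
    requires reopening the stream and decompressing from the start.
    """

    def __init__(self, compressed: bytes):
        self._compressed = compressed
        self.opens = 0  # how many times the stream was (re)opened
        self._open()

    def _open(self) -> None:
        self._stream = gzip.GzipFile(fileobj=io.BytesIO(self._compressed))
        self._pos = 0
        self.opens += 1

    def read_at(self, offset: int, length: int) -> bytes:
        if offset < self._pos:
            # Target lies behind us: start over from the beginning.
            self._open()
        # Skip the content in between the current position and the target.
        skipped = self._stream.read(offset - self._pos)
        self._pos += len(skipped)
        data = self._stream.read(length)
        self._pos += len(data)
        return data


def lookup_all(readers, records, sort_first):
    """Look up records given as (file, offset, length) tuples.

    With sort_first=True the records are visited in (file, offset)
    order, so each file is only ever read front to back.
    """
    order = sorted(records) if sort_first else list(records)
    return [readers[name].read_at(off, length) for name, off, length in order]
```

With five records requested in descending offset order, the unsorted
variant reopens the stream for every lookup after the first, while the
sorted variant decompresses the file once from front to back.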

Full-text search looks inside the description, which is in the section
parsed for each package.

It's not clear why this fails on bookworm - I can reproduce that -
it certainly is fine in git main on my Ubuntu 24.04 system.


-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en

