
Bug#832326: very slow processing of Contents files



On Sun, Jul 24, 2016 at 01:35:13PM +0200, Eduard Bloch wrote:
> Hello,
> * Julian Andres Klode [Sun, Jul 24 2016, 01:24:12PM]:
> > Control: tag -1 moreinfo
> > 
> > On Sun, Jul 24, 2016 at 12:43:23PM +0200, Eduard Bloch wrote:
> > > Package: apt
> > > Version: 1.3~pre2
> > > Severity: minor
> > > 
> > > Hello,
> > > 
> > > since Contents file handling was added recently, processing them seems
> > > to be very slow. It takes about two minutes (guessed, not measured),
> > > whereas everything else is done within the first ~10 seconds.
> > > 
> > > <first analysis>
> > > I think the basic problem here is the massive size of the data in the
> > > index files - they are already big and the compression ratio is very high.
> > > Uncompressed, the amd64 and i386 versions add up to about one gigabyte!
> > > OTOH, when I zcat them both, it takes only about 5 seconds!
> > > So I guess the problem is the amount of data that needs to be shuffled
> > > around while patching the files.
> > > I measured a bit how ed performs: it takes about 11 seconds for
> > > Contents-amd64.gz (with about 166k patch lines in a combined patch).
> > > The patch was assembled beforehand from the series of related pdiff
> > > files, of course.
> > 
> > Not sure what is happening on your side, but APT should normally store
> > Contents files using LZ4 compression, not gzip, unless you force it to
> > do otherwise.
> 
> Hm? This is the first time I've come across LZ4, and I have no idea what
> you mean.

We introduced LZ4 support six months ago, on January 15. Maybe the dynamic
recompression code fails to recompress your gzip files when applying pdiffs
to them (it should read a .gz and write out a patched .lz4; from then on it
reads .lz4 and writes .lz4, see the end of this email).

What you can try: make a backup of lists/, delete the Contents.gz files in
there, and run update again -> you should then get Contents.lz4 files in the
lists dir.
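
Roughly something like this (a sketch only; adjust paths to your setup and
run the removal/update as root, the standard location is /var/lib/apt/lists):

  cp -a /var/lib/apt/lists /var/lib/apt/lists.bak   # backup first
  rm /var/lib/apt/lists/*Contents*.gz               # drop the gzip'ed Contents files
  apt update                                        # refetch; should write .lz4 now
  ls /var/lib/apt/lists/*Contents*                  # check that .lz4 files appeared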

> 
> > We specifically switched to LZ4 to solve this issue.
> > 
> > Does your system not use .lz4 compressed Contents files?
> > 
> > > APT::Compressor::lz4::Binary "false";
> > 
> > My system says:
> > 
> > APT::Compressor::lz4::Binary "lz4";
> 
> Shall I change it and report back in a couple of days?

That should not make a difference, as we use the lz4 library anyway (which
we depend on).
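
If you want to double-check which compressors your apt knows about,
apt-config dump prints the full compressor configuration, e.g.:

  apt-config dump | grep -i 'APT::Compressor::lz4'

which should include the ::Binary line you quoted above.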

> 
> But anyhow, I am wondering... the obvious guess is that the problem is
> the complexity (CPU time or memory) and not I/O; how is extra compression
> supposed to fix that? IMHO it would rather make it worse.
> 

On initial download, we decompress the gzip file and recompress it with
lz4. This obviously is a bit slower than just writing out the gzip-compressed
file as is.
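
Outside of apt, that one-off conversion is roughly equivalent to (just an
illustration of the extra step, not apt's actual code path):

  zcat Contents-amd64.gz | lz4 > Contents-amd64.lz4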

The speed-up comes with pdiff: before 1.2, we would read the .gz file, apply
any patches, and write the result to a gzip-compressing output stream. The
output stream now uses LZ4 all the time (and the input uses lz4 too, once the
lz4-recompressed file exists). Compressing with LZ4 instead of gzip results in
a significant speedup (10x-100x or something, I'm not really that sure).
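
If you want a feel for the output-side difference on your own data, something
like this shows it (times will obviously vary):

  zcat Contents-amd64.gz > Contents-amd64
  time gzip -c Contents-amd64 > /dev/null   # roughly what the pre-1.2 write side cost
  time lz4 -c Contents-amd64 > /dev/null    # what the LZ4 output stream costs instead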

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev

When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to (`inline'). Thank you.

