
RFC: improve dpkg-scanpackages performance with cached md5sums



Problem:

Creating Packages.gz with dpkg-scanpackages takes a lot of time for large repositories.

The main reason it is so slow is that the checksums of all packages, even those that have not changed since the previous run, are recalculated every time.

Solution:

I've extended dpkg-scanpackages to accept a "--md5cache" (short form "-5") command line option that enables caching and reuse of md5sums.

When the option is not used, one gets the stock dpkg-scanpackages behavior, where all checksums are recalculated every time. When it is used, the md5sums of scanned packages are cached on the first run and reused on subsequent runs.
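
The caching idea can be sketched roughly as follows. This is not the actual patch (dpkg-scanpackages itself is written in Perl), only a minimal Python illustration. The JSON cache file, the use of file size and modification time to decide whether a package has changed, and all function names are assumptions made for this example.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(cache_path):
    """Return the cache dict, or an empty one if it is missing or unreadable."""
    try:
        with open(cache_path) as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def cached_md5(path, cache):
    """Reuse the cached md5sum unless size or mtime changed (assumed policy)."""
    st = os.stat(path)
    entry = cache.get(path)
    if (entry is not None
            and entry["size"] == st.st_size
            and entry["mtime_ns"] == st.st_mtime_ns):
        return entry["md5"]  # cache hit: file contents are not read
    md5 = file_md5(path)  # cache miss: recalculate and remember
    cache[path] = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "md5": md5}
    return md5


def scan(pool_dir, cache_path):
    """Return {path: md5sum} for every .deb under pool_dir, updating the cache."""
    cache = load_cache(cache_path)
    sums = {}
    for root, _dirs, files in os.walk(pool_dir):
        for name in sorted(files):
            if name.endswith(".deb"):
                path = os.path.join(root, name)
                sums[path] = cached_md5(path, cache)
    # Drop cache entries for packages that no longer exist.
    cache = {p: e for p, e in cache.items() if p in sums}
    with open(cache_path, "w") as fh:
        json.dump(cache, fh)
    return sums
```

In this sketch, a run over an unchanged repository costs one stat() call per package instead of a full read of every file, which is where the time saving comes from. Checking size and mtime is a trade-off: it is cheap, but it would miss a file replaced with different contents of identical size and timestamp.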

With cached md5sums, the time to create Packages.gz for my private repository (~600MB) dropped from over 1 minute to about 7 seconds on a PII/400MHz.

Would it make sense to include such a feature in the official
dpkg-scanpackages?


