RFC: improve dpkg-scanpackages performance with cached md5sums
Problem:
Creating Packages.gz with dpkg-scanpackages takes a lot of time for
large repositories.
The main reason it is so slow is that the checksums of all
packages, even those that did not change since the previous run, are
recalculated every time.
Solution:
I've extended dpkg-scanpackages to accept a "--md5cache" | "-5" command
line option that enables caching and reusing of md5sums.
When the option is not used, one gets stock dpkg-scanpackages behavior,
where all checksums are recalculated every time. Otherwise, the md5sums
of scanned packages are cached on the first run and reused on
successive runs.
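
To illustrate the caching idea, here is a minimal sketch in Python. It is
not the actual patch (dpkg-scanpackages itself is a Perl script), and the
details are assumptions made for illustration: the cache is a JSON file,
entries are keyed by package path, and an entry is considered valid only
while the file's size and modification time are unchanged. The function
names (file_md5, cached_md5, load_cache, save_cache) are hypothetical.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(cache_path):
    """Load the cache from disk; a missing or corrupt file yields an empty cache."""
    try:
        with open(cache_path) as fh:
            return json.load(fh)
    except (FileNotFoundError, ValueError):
        return {}


def save_cache(cache_path, cache):
    """Write the cache back to disk for the next run."""
    with open(cache_path, "w") as fh:
        json.dump(cache, fh)


def cached_md5(path, cache):
    """Return the md5sum of path, reusing the cached value when possible.

    Assumption: a package is treated as unchanged if its size and
    modification time match the cached entry.
    """
    st = os.stat(path)
    entry = cache.get(path)
    if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime_ns:
        return entry["md5"]  # cache hit: the file is not read at all
    md5 = file_md5(path)     # cache miss: hash the file and remember the result
    cache[path] = {"size": st.st_size, "mtime": st.st_mtime_ns, "md5": md5}
    return md5
```

A scan would call load_cache() once, cached_md5() for each .deb, and
save_cache() at the end. On a repeat run over an unchanged repository
only stat() calls are needed, which is where the time saving comes from.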
With cached md5sums, the time to create Packages.gz for my private
repository (~600MB) dropped from over 1 minute to about 7 seconds on a
PII/400MHz.
Would it make sense to include such a feature in the official
dpkg-scanpackages?