Bug#139838: dpkg: performance problem - package database hash size too small
Package: dpkg
Version: 1.9.20
Severity: normal
A profile of 'dpkg-query --status man-db' from CVS (1.9.20 behaves
similarly) shows this:
46.43 0.39 0.39 2 195.00 420.00 parsedb
27.38 0.62 0.23 91788 0.00 0.00 findpackage
4.76 0.66 0.04 72859 0.00 0.00 illegal_packagename
3.57 0.69 0.03 47854 0.00 0.00 convert_string
real 0m1.848s
user 0m1.770s
sys 0m0.060s
Investigation shows that the hash used to store available packages
(lib/database.c) has 128 bins, and was last tuned with a sample set of
141 packages in total. Since there are now over 9000 binary packages in
unstable, this could do with being tuned again.
Increasing the number of hash bins to 16384 shows a marked improvement:
46.27 0.31 0.31 2 155.00 335.00 parsedb
10.45 0.38 0.07 27891 0.00 0.00 f_filecharf
8.96 0.44 0.06 91788 0.00 0.00 findpackage
8.96 0.50 0.06 19503 0.00 0.01 f_dependency
real 0m1.538s
user 0m1.410s
sys 0m0.110s
A patch follows. We haven't tried to see what effect tuning the hash
function itself will have yet; it'll probably improve things a little
further, but not as much as reducing the enormous number of hash
collisions that take place at the moment.
--- lib/database.c.orig Mon Mar 25 14:29:00 2002
+++ lib/database.c Mon Mar 25 14:17:02 2002
@@ -26,7 +26,7 @@
#include <dpkg.h>
#include <dpkg-db.h>
-#define BINS (1 << 7)
+#define BINS (1 << 14)
/* This must always be a power of two. If you change it
* consider changing the per-character hashing factor (currently 5) too.
*/
Thanks,
--
Colin Watson [cjwatson@flatline.org.uk]
--
To UNSUBSCRIBE, email to debian-dpkg-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Reply to: