Bug#384182: AlphaHash suboptimal and slightly buggy -- caseinsensitive key retrieval broken
Package: apt
Version: 0.5.28.6
Severity: normal
Tags: patch
in apt-pkg/tagfile.cc, there is the function AlphaHash to make a 8-bit
hash out of tags from tagfiles (such as the keys in packagefiles).
However, this hash function is not case-insensitive -- because of this,
case-insensitive querying doesn't work if the last 2 characters differ
in case from the original. tagfileparser.Section.get("VERSION") doesn't
give any result, for example, if only "Version" is present, and URL and
Url will only be gotten correctly if you have the correct case. But
because of nature of the hash function, only the last two character's
case matters, which can be very confusing.
Additionally, the hashfunction has quite a number of collissions, and
can be trivially improved to have much less collissions, improving
performance. My suggestion is:
- Res = (unsigned long)(*Text) ^ (Res << 2);
+ Res = ((unsigned long)(*Text) & 0xDF) ^ (Res << 1);
And update the comment (the current one is wrong anyway, the hash is
currently going over the last 4 (would become 8) characters, not the
first 3).
The "& 0xDF" fixes the only real bug -- the case-sensitivity.
Changing the 2 to a 1 results in significantly less collissions. Taking
all tags from a current Packages and Sources file:
old new Packages.gz
x x Build-Essential
x x Essential
x Size
x x Installed-Size
x Replaces
x Filename
x Python-Runtime
x Description
x Section
x Depends
x Pre-Depends
x Recommends
x Python-Version
x SHA256
x Version
x Conflicts
x Suggests
7 vs 2 collission groups.
old new Sources.gz tag
x x Python-Standards-Version
x x Python-Version
x x Standards-Version
x Version
x x Build-Conflicts-Indep
x x Build-Depends-Indep
x Origin
x X-Vcs-Bzr
3 vs 2 collission groups, with 'Version' becoming unique
--Jeroen
--
Jeroen van Wolffelaar
Jeroen@wolffelaar.nl (also for Jabber & MSN; ICQ: 33944357)
http://Jeroen.A-Eskwadraat.nl
Reply to: