dpkg flex-based status file parser, for 35% speedup
Over the weekend I wrote a replacement for lib/fields.c and most of
lib/parse.c. It uses flex (and flex start conditions) to generate a
table-driven scanner-cum-parser. I haven't fully tested it for
correctness yet, but I have done basic functionality tests and some
performance tests.
* It runs 35% faster than the existing hand-coded *p++-style parser.
* It's about 25% smaller in source code, and I think much clearer.
* It should also fix any top-bit-set character handling bugs which
remain in the current codebase.
* It will make it easier in the future to decouple the parsing
functionality from dpkg's internal data structures so that the
parser can be made available through a library interface.
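
For anyone unfamiliar with the approach, here is a minimal
hand-written sketch in plain C of what "table-driven scanner with
start conditions" means. It is not the code from the branch and not
flex output. The state names, the transition table, scan_fields() and
the lookup_cb() callback are all made up for illustration. It also
ignores continuation lines and blank-line stanza separators, which a
real status file parser has to handle. Each state plays the role of a
flex start condition, and every byte is classified as unsigned char,
so top-bit-set characters take the same path as any other value byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical character classes and states, for illustration only. */
enum cclass { C_COLON, C_SPACE, C_NL, C_OTHER, N_CLASSES };
enum state  { S_BOL, S_NAME, S_SKIPWS, S_VALUE, S_ERROR };
enum action { A_NONE, A_NAME, A_VALUE, A_EMIT };

struct trans { enum state next; enum action act; };

/* One row per state; columns are ':', space/tab, newline, other. */
static const struct trans table[S_ERROR][N_CLASSES] = {
  [S_BOL]    = { {S_ERROR, A_NONE},  {S_ERROR, A_NONE},
                 {S_ERROR, A_NONE},  {S_NAME, A_NAME} },
  [S_NAME]   = { {S_SKIPWS, A_NONE}, {S_ERROR, A_NONE},
                 {S_ERROR, A_NONE},  {S_NAME, A_NAME} },
  [S_SKIPWS] = { {S_VALUE, A_VALUE}, {S_SKIPWS, A_NONE},
                 {S_BOL, A_EMIT},    {S_VALUE, A_VALUE} },
  [S_VALUE]  = { {S_VALUE, A_VALUE}, {S_VALUE, A_VALUE},
                 {S_BOL, A_EMIT},    {S_VALUE, A_VALUE} },
};

static enum cclass classify(unsigned char c)
{
  switch (c) {
  case ':':  return C_COLON;
  case ' ':
  case '\t': return C_SPACE;
  case '\n': return C_NL;
  default:   return C_OTHER;  /* includes bytes >= 0x80 */
  }
}

typedef void field_cb(const char *name, const char *value, void *ctx);

/* Scan "Name: value\n" lines, calling cb once per field.
 * Returns 0 on success, -1 on a syntax error or oversized field. */
int scan_fields(const char *input, field_cb *cb, void *ctx)
{
  char name[64], value[256];
  size_t nlen = 0, vlen = 0;
  enum state st = S_BOL;

  for (const char *p = input; *p; p++) {
    const struct trans *t = &table[st][classify((unsigned char)*p)];

    switch (t->act) {
    case A_NAME:
      if (nlen + 1 >= sizeof name) return -1;
      name[nlen++] = *p;
      break;
    case A_VALUE:
      if (vlen + 1 >= sizeof value) return -1;
      value[vlen++] = *p;
      break;
    case A_EMIT:
      name[nlen] = '\0';
      value[vlen] = '\0';
      cb(name, value, ctx);
      nlen = vlen = 0;
      break;
    case A_NONE:
      break;
    }
    st = t->next;
    if (st == S_ERROR) return -1;
  }
  /* Every line must have been terminated by a newline. */
  return st == S_BOL ? 0 : -1;
}

/* Example callback: count fields and remember one wanted value. */
struct lookup { const char *want; char value[256]; int seen; };

static void lookup_cb(const char *name, const char *value, void *ctx)
{
  struct lookup *lk = ctx;

  lk->seen++;
  if (strcmp(name, lk->want) == 0) {
    strncpy(lk->value, value, sizeof lk->value - 1);
    lk->value[sizeof lk->value - 1] = '\0';
  }
}
```

To use it, fill in a struct lookup with the field name you want and
call scan_fields(text, lookup_cb, &lk). The inner loop does one table
lookup per input byte and never re-reads a byte, which is the property
a backtracking-free generated scanner also has.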
The downside is that it's 100K larger in code size. This is a
particular problem because copies of it are distributed amongst
several tools. There are three copies of it in dpkg.deb (in Ubuntu
gutsy), for example. It's tempting to say that we should merge dpkg
and dpkg-query and dpkg-trigger back into a single executable. This
would probably save 150K or so in the current Ubuntu dpkg, and make
the flex-based triggers-supporting dpkg package 50-70K smaller than
the current sid dpkg. So I ask: why does dpkg-query exist as a
separate executable?
Anyway, I think we should deploy the flex-based scanner in sid (after
I've tested it a bit more) and then think at our leisure about how to
improve the shared code situation.
The branch with this code on can be found at
but note that that's a fork from my Ubuntu triggers branch so it
probably ought not to go into sid just yet, at least until the
triggers are stabilised in Ubuntu (which I think will be quite soon).
(People who were at the Debian UK BBQ may have heard me say that
my new parser was no faster. That was before I spent this afternoon
removing all of the backtracking from it.)
(FAOD, I am writing this wearing my personal DD hat, not my Ubuntu hat.)