Re: dpkg flex-based status file parser, for 35% speedup
Ian Jackson <email@example.com> writes:
> I have written over the weekend a replacement for lib/fields.c and
> most of lib/parse.c, which uses flex (and flex start conditions) to
> generate a table-driven scanner-cum-parser. I haven't tested this
> fully for correctness yet, but I have done basic functionality tests
> and some performance tests.
> * It runs 35% faster than the existing hand-coded *p++-style parser.
A 35% speedup in 1% of the total time spent in dpkg? Is this the right
place to optimize? I would have thought optimizing the *.list files
would be more important. :)
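To put a rough number on that: the sketch below applies Amdahl's law. It assumes the parser really accounts for about 1% of dpkg's run time (my guess above, not a measurement) and reads "35% faster" as a 1.35x speedup of that part. The function name is made up for illustration.

```c
#include <assert.h>

/* Overall speedup when a fraction p of the total run time is made
 * s times faster (Amdahl's law). Both inputs are assumptions here:
 * p = 0.01 is a guess, s = 1.35 is one reading of "35% faster". */
static double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.01 and s = 1.35 this gives an overall factor of roughly 1.0026, i.e. about a quarter of a percent off the total run time.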
> * It's about 25% smaller in source code, and I think much clearer.
Thank you. That would be most welcome. On the other hand, the
multiarch patches will now have to be rewritten.
> * It should also fix any top-bit-set character handling bugs which
> remain in the current codebase.
> * It will make it easier in the future to decouple the parsing
> functionality from dpkg's internal data structures so that the
> parser can be made available through a library interface.
> The downside is that it's 100K longer in code size. This is a
> particular problem because copies of it are distributed amongst
> several tools. There are three copies of it in dpkg.deb (in Ubuntu
> gutsy), for example. It's tempting to say that we should merge dpkg
> and dpkg-query and dpkg-trigger back into a single executable. This
> would probably save 150K or so in the current Ubuntu dpkg, and make
> the flex-based triggers-supporting dpkg package 50-70K smaller than
> the current sid dpkg. So I ask: why does dpkg-query exist as a
> separate executable ?
Before you do that, maybe it would be better to finally create a libdpkg
and put the common parts in there. A lot of dpkg functionality is also
duplicated in apt and friends, which is a real shame.
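As a very rough idea of what a parser decoupled from dpkg's internal data structures could look like, here is a minimal callback-based sketch in plain C. It is hand-written rather than flex-generated, and every name in it (field_cb, parse_stanza, count_field) is hypothetical, not an existing dpkg or libdpkg interface. It assumes the usual "Name: value" stanza layout with continuation lines starting with a space or tab and a blank line between stanzas.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: invoked once per "Name: value" field.
 * The pointers refer into the caller's buffer; nothing is copied. */
typedef void (*field_cb)(const char *name, size_t name_len,
                         const char *value, size_t value_len,
                         void *user);

/* Parse one stanza from the NUL-terminated buffer buf.
 * Returns a pointer just past the stanza (and its blank separator
 * line, if present), or NULL on a syntax error. */
static const char *parse_stanza(const char *buf, field_cb cb, void *user)
{
    const char *p = buf;

    assert(buf != NULL && cb != NULL);
    while (*p != '\0' && *p != '\n') {
        const char *name = p;
        const char *colon = p;

        /* A field may not begin with a continuation marker. */
        if (*p == ' ' || *p == '\t')
            return NULL;
        while (*colon != '\0' && *colon != ':' && *colon != '\n')
            colon++;
        if (*colon != ':' || colon == name)
            return NULL;

        const char *value = colon + 1;
        while (*value == ' ' || *value == '\t')
            value++;

        /* The value runs to the end of the line, plus any following
         * lines that start with a space or tab (kept verbatim). */
        const char *end = value;
        for (;;) {
            while (*end != '\0' && *end != '\n')
                end++;
            if (*end == '\n' && (end[1] == ' ' || end[1] == '\t'))
                end++;
            else
                break;
        }

        cb(name, (size_t)(colon - name), value, (size_t)(end - value), user);
        p = (*end == '\n') ? end + 1 : end;
    }
    if (*p == '\n')
        p++; /* skip the blank line that ends the stanza */
    return p;
}

/* Example callback: counts fields; user must point to an int. */
static void count_field(const char *name, size_t name_len,
                        const char *value, size_t value_len, void *user)
{
    (void)name; (void)name_len; (void)value; (void)value_len;
    ++*(int *)user;
}
```

A caller such as apt or dpkg-query would supply its own callback and build whatever structures it needs, so the parser itself never has to know about dpkg's package database types.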