
Re: dpkg flex-based status file parser, for 35% speedup

Ian Jackson <ian@davenant.greenend.org.uk> writes:

> I have written over the weekend a replacement for lib/fields.c and
> most of lib/parse.c, which uses flex (and flex start conditions) to
> generate a table-driven scanner-cum-parser.  I haven't tested this
> fully for correctness yet, but I have done basic functionality tests
> and some performance tests.
> Upsides:
>  * It runs 35% faster than the existing hand-coded *p++-style parser.

35% speedup in 1% of the total time spent in dpkg? Is this the right
place to optimize? I would have thought optimizing the *.list files
would be more important. :)
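
To put a rough number on that: the sketch below applies Amdahl's law,
assuming the parser really accounts for about 1% of dpkg's run time (my
guess, not a measurement) and reading "35% faster" as 1.35x throughput.
The function name overall_speedup is made up for illustration.

```c
/* Amdahl's law: overall speedup when only a fraction of the run time
 * is accelerated.
 *   fraction      - share of total run time spent in the optimized part
 *                   (0.01 here is an assumed figure, not a measurement)
 *   local_speedup - speedup of that part alone (1.35 for "35% faster")
 */
double overall_speedup(double fraction, double local_speedup)
{
    /* New run time, relative to an old run time of 1.0. */
    double new_time = (1.0 - fraction) + fraction / local_speedup;
    return 1.0 / new_time;
}
```

Under those assumptions overall_speedup(0.01, 1.35) is roughly 1.0026,
i.e. about a quarter of a percent for dpkg as a whole.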

>  * It's about 25% smaller in source code, and I think much clearer.

Thank you. That would be most welcome. On the other hand, the
multiarch patches will now have to be rewritten.

>  * It should also fix any top-bit-set character handling bugs which
>    remain in the current codebase.
>  * It will make it easier in the future to decouple the parsing
>    functionality from dpkg's internal data structures so that the
>    parser can be made available through a library interface.
> The downside is that it's 100K longer in code size.  This is a
> particular problem because copies of it are distributed amongst
> several tools.  There are three copies of it in dpkg.deb (in Ubuntu
> gutsy), for example.  It's tempting to say that we should merge dpkg
> and dpkg-query and dpkg-trigger back into a single executable.  This
> would probably save 150K or so in the current Ubuntu dpkg, and make
> the flex-based triggers-supporting dpkg package 50-70K smaller than
> the current sid dpkg.  So I ask: why does dpkg-query exist as a
> separate executable ?

Before you do that, maybe it would be better to finally create a libdpkg
and put the common parts in there. A lot of dpkg functionality is also
duplicated in apt and friends, which is a real shame.

