Re: dpkg flex-based status file parser, for 35% speedup

To: Ian Jackson <ian@davenant.greenend.org.uk>
Cc: debian-dpkg@lists.debian.org
Subject: Re: dpkg flex-based status file parser, for 35% speedup
From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
Date: Thu, 30 Aug 2007 10:11:36 +0200
Message-id: <87myw9bfwn.fsf@informatik.uni-tuebingen.de>
In-reply-to: <18130.58341.210815.470424@davenant.relativity.greenend.org.uk> (Ian Jackson's message of "Mon, 27 Aug 2007 15:47:01 +0100")
References: <18130.58341.210815.470424@davenant.relativity.greenend.org.uk>

Ian Jackson <ian@davenant.greenend.org.uk> writes:

> I have written over the weekend a replacement for lib/fields.c and
> most of lib/parse.c, which uses flex (and flex start conditions) to
> generate a table-driven scanner-cum-parser.  I haven't tested this
> fully for correctness yet, but I have done basic functionality tests
> and some performance tests.
>
> Upsides:
>  * It runs 35% faster than the existing hand-coded *p++-style parser.

A 35% speedup in 1% of the total time spent in dpkg? Is this the right
place to optimize? I would have thought optimizing the *.list files
would be more important. :)
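
As a back-of-the-envelope check of that point, here is a small sketch
using Amdahl's law. Both inputs are assumptions: the 1% share is the
estimate given above rather than a measurement, and "35% faster" is read
as the parser running at 1.35x its old throughput.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: speedup of the whole run when only `fraction`
    of the runtime becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Assumption: status-file parsing is 1% of a dpkg run (an estimate, not measured).
parse_fraction = 0.01
# Assumption: "35% faster" means 1.35x the old parsing throughput.
parser_speedup = 1.35

total = overall_speedup(parse_fraction, parser_speedup)
saved_percent = (1.0 - 1.0 / total) * 100.0

print(f"overall speedup: {total:.4f}x, runtime saved: {saved_percent:.2f}%")
```

Under those assumptions the whole run gets only about a quarter of a
percent shorter, so the result depends almost entirely on how large the
parsing share really is.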

>  * It's about 25% smaller in source code, and I think much clearer.

Thank you. That would be most welcome. On the other hand, the
multiarch patches will now have to be rewritten.

>  * It should also fix any top-bit-set character handling bugs which
>    remain in the current codebase.
>  * It will make it easier in the future to decouple the parsing
>    functionality from dpkg's internal data structures so that the
>    parser can be made available through a library interface.
>
> The downside is that it's 100K longer in code size.  This is a
> particular problem because copies of it are distributed amongst
> several tools.  There are three copies of it in dpkg.deb (in Ubuntu
> gutsy), for example.  It's tempting to say that we should merge dpkg
> and dpkg-query and dpkg-trigger back into a single executable.  This
> would probably save 150K or so in the current Ubuntu dpkg, and make
> the flex-based triggers-supporting dpkg package 50-70K smaller than
> the current sid dpkg.  So I ask: why does dpkg-query exist as a
> separate executable ?

Before you do that, maybe it would be better to finally create a libdpkg
and put the common parts in there. A lot of dpkg functionality is also
duplicated in apt and friends, which is a real shame.

MfG
        Goswin
