
Re: dpkg flex-based status file parser, for 35% speedup



Ian Jackson <ian@davenant.greenend.org.uk> writes:

> Goswin von Brederlow writes ("Re: dpkg flex-based status file parser, for 35% speedup"):
>> 35% speedup in 1% of the total time spent in dpkg? Is this the right
>> place to optimize? I would have thought optimizing the *.list files
>> would be more important. :)
>
> It makes `dpkg --status <package>', `dpkg -l ...', etc. 35% faster.
> It's true that it doesn't help much when you're doing a big
> installation run but speeding up general queries is very worthwhile I
> think.

Worthwhile, no argument. Note the smiley.

>> Ian Jackson <ian@davenant.greenend.org.uk> writes:
>> >  * It's about 25% smaller in source code, and I think much clearer.
>> 
>> Thank you. That would be most welcome. On the other hand now the
>> multiarch patches have to be rewritten.
>
> I thought the most recent multiarch proposals didn't involve
> substantial changes to the status and control file format ?  I had
> various conversations with people at Debconf about multiarch and found
> it difficult to get hard information.  Everyone I spoke to had
> different ideas about what "the current plan" was and no-one was able
> to point me to any definitive design documentation.

The status and control file formats don't change. The only change is
that each package carries either "Multi-Arch: yes", "Multi-Arch: no",
or no such entry at all.

But the internal database has to change to allow packages with the
same name but different arch to be present. The old parser reads in
the name, allocates (sort of) a spot in the package hash table for the
package and then fills in the remaining fields as it parses them. With
multiarch it has to parse name and architecture before it can do any
such thing.
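
To illustrate the ordering problem, here is a rough Python sketch (not
dpkg's C code; the function names and the simplified stanza handling,
which ignores continuation lines, are made up for this example). All
fields of a stanza are collected first, and only then is the database
slot chosen, keyed on both name and architecture:

```python
def parse_stanzas(text):
    """Yield one dict of fields per blank-line-separated stanza."""
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Simplification: every line is "Field: value"; no continuation lines.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        yield fields


def build_db(text):
    """Build a package table keyed on (name, architecture)."""
    db = {}
    for fields in parse_stanzas(text):
        # The slot can only be chosen once both fields have been read.
        key = (fields["Package"], fields["Architecture"])
        db[key] = fields
    return db


SAMPLE = """\
Package: bar
Architecture: amd64
Multi-Arch: yes

Package: bar
Architecture: i386
Multi-Arch: yes
"""

db = build_db(SAMPLE)
print(sorted(db))
```

With a table keyed on the name alone, the second "bar" stanza would
overwrite the first; with the pair as key both survive.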

This also affects dependency creation in the parser. With multiarch
a dependency can be on a single package or on one of several:

foo depends bar [Multi-Arch: yes or missing] -> same architecture or all
foo depends bar [Multi-Arch: no] -> any architecture

The old parser converts the textual form into the internal form, with
pointers into the package database, during parsing. At that point a
multiarch-aware parser doesn't yet know what the dependency must look like.

Fixing those two issues was quite intrusive to the parser.
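
The dependency side can be sketched the same way (again hypothetical
Python, not the actual patch). The assumption here is that the
bracketed Multi-Arch value in the two rules above belongs to the
package being depended on, and that dependencies stay textual until
the whole database has been read. Resolution is then a second pass:

```python
# Hypothetical, already-parsed database keyed on (name, architecture).
db = {
    ("bar", "amd64"): {"Multi-Arch": "yes"},
    ("bar", "i386"): {"Multi-Arch": "yes"},
    ("baz", "i386"): {"Multi-Arch": "no"},
    ("doc", "all"): {},  # no Multi-Arch entry
}


def resolve(db, depender_arch, dep_name):
    """Return the (name, arch) entries that may satisfy a dependency.

    Rule as read from the mail (an interpretation, not a specification):
      Multi-Arch: yes or missing -> same architecture as the depender, or "all"
      Multi-Arch: no             -> any architecture
    """
    matches = []
    for (name, arch), fields in db.items():
        if name != dep_name:
            continue
        if fields.get("Multi-Arch", "yes") == "no":
            matches.append((name, arch))
        elif arch in (depender_arch, "all"):
            matches.append((name, arch))
    return sorted(matches)


print(resolve(db, "amd64", "bar"))
print(resolve(db, "amd64", "baz"))
```

None of this can run while "foo" is still being parsed, because the
entries for "bar" may not have been read yet.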


If you need a more detailed description of dependencies for multiarch
and resolving them I can make you some pictures and text.

>> > [...]   It's tempting to say that we should merge dpkg
>> > and dpkg-query and dpkg-trigger back into a single executable.
>> 
>> Before you do that maybe it would be better to finally create a libdpkg
>> and put the common parts in there. A lot of dpkg functionality is also
>> duplicated in apt and friends which is a real shame.
>
> Making the common parts have a shape that's useful for apt et al is
> nontrivial, unfortunately.  Otherwise we would have had a useable
> libdpkg already.  This new parser is a step in that direction because
> it makes it easier to reuse the meat of the parser separately from the
> dpkg internal data structures (which have no stable ABI and are not
> reentrant, to give the two clearest reasons why they're not suitable
> for straightforward reuse).
>
> I asked Wichert why he invented dpkg-query and apparently it was part
> of an intended programme of modularisation.  I think that's all well
> and good but until we have effective code sharing arrangements we
> should bundle it up back into one executable.  So I will do that.
>
> Ian.

Regards,
        Goswin
 


