
Re: dpkg and sqlite... ...?



Hello,
I have been using Debian for several years. I'm not a developer, but I try to
contribute when possible. For about a year I have been experiencing how slow
dpkg is at reading its database.

It's been a long time since this thread died (April 2007).
I think the number of packages and libraries installed on a desktop has grown
lately (I count >1700 packages). There are >200000 files installed. This
means that dpkg is going to open 1700 list files and read 200000 file names
from them.
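
To check these numbers on a given machine, here is a minimal Python sketch.
It is only an illustration, not how dpkg itself works: the function name
count_list_entries is made up, and it assumes the per-package file lists are
the *.list files under /var/lib/dpkg/info, one installed path per line.

```python
import os


def count_list_entries(info_dir):
    """Return (number of .list files, total non-empty lines in them).

    Assumption: each *.list file in info_dir belongs to one package and
    holds one installed path per line.
    """
    n_files = 0
    n_entries = 0
    for name in sorted(os.listdir(info_dir)):
        if not name.endswith(".list"):
            continue  # skip maintainer scripts, md5sums, etc.
        n_files += 1
        path = os.path.join(info_dir, name)
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            n_entries += sum(1 for line in fh if line.strip())
    return n_files, n_entries
```

Calling count_list_entries("/var/lib/dpkg/info") returns the two figures
discussed above: how many files have to be opened, and how many entries have
to be read from them.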

It takes a long time, and please don't ask me to profile it... I think
everybody knows how long it takes to get through (Reading database... 5% -
10% - 15%...). This may not matter if you do your upgrades while you are not
using the computer, but for installing/removing _just one package_ it is
very tedious.
Also, from the development perspective, reading the list files, together
with generating the man page database, is one of the reasons why it takes
forever to build packages in a chroot.

> Quote from Ian Jackson:
> dpkg needs to be
> very reliable; its databases must not get corrupted even under
> situations of stress.

This is desirable in all situations. But dpkg can still rely on the
filesystem, while the sqlite backend could be a cache on top of the
filesystem that gets regenerated when it is corrupted or missing (as APT
does with its cache).
I think everybody who has ever tried an rpm-based distribution has feared
database corruption, but that is not the case here. The backend is still the
filesystem, only sped up by a cache.
There could also be a configuration option to avoid using the cache, so this
feature could even be disabled by default.
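
To make the idea concrete, here is a minimal Python sketch of such a cache.
It is an assumption-laden illustration, not a proposal for dpkg's actual
code: it uses a JSON file instead of sqlite for brevity, all function names
and the cache layout are made up, and it assumes per-package *.list files
with one path per line. The list files remain the source of truth. The cache
is used only when its recorded fingerprint (name, size and mtime of each list
file) still matches, and it is rebuilt whenever it is missing, corrupted or
stale.

```python
import json
import os


def fingerprint(info_dir):
    """Summarise the list files using stat() only, without reading them."""
    entries = []
    for name in sorted(os.listdir(info_dir)):
        if name.endswith(".list"):
            st = os.stat(os.path.join(info_dir, name))
            entries.append([name, st.st_size, st.st_mtime_ns])
    return entries


def read_lists(info_dir):
    """Slow path: parse every list file into {package: [paths]}."""
    files = {}
    for name in sorted(os.listdir(info_dir)):
        if name.endswith(".list"):
            path = os.path.join(info_dir, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                pkg = name[:-len(".list")]
                files[pkg] = [l.rstrip("\n") for l in fh if l.strip()]
    return files


def load_file_db(info_dir, cache_path, use_cache=True):
    """Return ({package: [paths]}, came_from_cache)."""
    current = fingerprint(info_dir)
    if use_cache:
        try:
            with open(cache_path, encoding="utf-8") as fh:
                cached = json.load(fh)
            if cached["fingerprint"] == current:
                return cached["files"], True
        except (OSError, ValueError, KeyError, TypeError):
            pass  # missing or corrupted cache: fall through and rebuild
    files = read_lists(info_dir)
    if use_cache:
        tmp = cache_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump({"fingerprint": current, "files": files}, fh)
        os.replace(tmp, cache_path)  # rename so a partial cache is never seen
    return files, False
```

With use_cache=False the function behaves exactly like the slow path, which
corresponds to the configuration option mentioned above. Note that the
fingerprint check still has to stat() every list file. The sketch assumes
that this is much cheaper than opening and parsing them, which would need
measuring.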

> dpkg is very close to the bottom of the
> application stack; making it depend on a big and complex library like
> a SQL engine is a bad idea.

From this perspective it might be a bad idea; in that case one could consider
creating a dedicated format for keeping a cache (as APT does).
Being at the bottom of the application stack gives dpkg one more reason
to use a cache: software uses caches to speed up operations that take a long
time to compute. Here the slow operation is reading the database, and its
result can be cached.

> But there are some significant performance problems elsewhere:
> * The status and available file parser is too slow.  I think this
>   needs some optimisation work.
> * We still need the `smallmem' in-memory model for the file list
>   data (removed by Adam Heath while he was ripping out my nicely
>   simple counting allocator and replacing it with use of glibc
>   obstacks).

I think you will agree that these performance problems are far less
significant than reading the database, which is not to say they are less
important.

I hope my message has not come across as offensive. I'd like to
thank you all for the work you do maintaining dpkg.

Best regards,

-- 
http://www.debian.org - The Universal Operating System


