[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: dpkg upgrade/downgrade dependency problem



On Mon, 27 Apr 1998, Tom Lees wrote:

> Better, we should recode the dpkg "backend" to REALLY not care about dependencies
> or anything of the like. Make JUST a program which will unpack, configure, remove,
> etc. This can supply RPC, etc.

Well, it should check the critical ones, just as dpkg does now. Dpkg is a
almost a perfect backend, but it needs a better interface big time.

> I'll code using Berkeley libdb - SQL is too big, dbm is too old and bad.

I don't think libdb will cut it. I am thinking a scheme like what APT uses
for it's caches (mmap'd data structure) is the way to go. A hash database
is not very nice because it requires storing the whole path and is only
single direction. 

Two lookup operations are needed, 
  -  Give me a bit of info about a file
  -  Give me a list of all files owned by a package

I think I could compact the whole mess into under a meg and provide
extremely fast lookup and extremely fast insert/remove. 

A disk based system imposes a high a penalty on users with lots of memory
and if it does compact into under a meg then users will little ram will
not feel it so much, also users with little ram would tend to have smaller
numbers of files installed, so it balances out.

Incidently, the caching system for the package files compacts 3 meg worth
of package files into a 500k cache file :>

Basically, I would compose it out of two elements, a string table and a
directroy tree, nodes like

struct Directory
{
   Directory *Parent;
   Directory *Next;
   Directory *ChildDir;
   File *ChildFile;
   Package *Owner;
   Package *NextOwner;
   string *Name;
   unsigned long Hash;
};

struct File
{
   Directory *Parent;
   File *Next;
   Package *Owner;
   File *NextPackage;
   string *Name;
};

I can be more clever with optimizing space, this is just a general
layout.

My system has 13000 *.list entries and 1100 directories and about 700k of
list files.

That's 12000*4*5 + 1100*8*4 + 6*13000 = 353200

I'm estimating it will take 6 bytes on average for each pathname
component. That's 50% savings from the straight text files.

It averages about 26 bytes per file so lets quickly estimate a system with
50000 packages (master has 34, 50 is the upper bound I think) We are
looking at about 1.3meg max size.

I don't think you can compete with that using a generic dbm.. Insertion
times are lookup + constant, removal is lookup + constant and lookup's
worst time is likely to be around 50 hash compares (due to the
directory/file separation).

Furthermore, IO will tend to stay within the tree structure and not
venture into the string table much because of the stored hash values. 

Size could be reduced even more by placing a 64K item limit on the size of
the list files - this would half the required size by converting the
'pointers' to shorts.

This scheme could be used as a non-disk storage, purely in memory, the
requirement is that your use memory pools for all allocation so as not to
incurre the dreaded malloc overhead. I have a feeling it is a better idea
to mmap it in though.

Anyhow, this whole discussion is moot, such a major change to dpkg is not
trivial and isn't going to be done by us :>

Jason


--
To UNSUBSCRIBE, email to deity-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org


Reply to: