
Packages file (long email) [WAS: Splitting Packages]

Hi all,
       I have done some analysis of the Packages file and tried to
split and optimize it to make it more usable, always keeping very old
systems in mind as the primary target (which does not mean that it
doesn't also help newer ones).

I'm pretty sure that some of the conclusions were already discussed, but
I will raise them again. Maybe another discussion/proposal can bring
out new ideas. btw these are only my own ideas ;)

As a first step, let's take a look at the structure of the Packages file.
The structure of each package entry can be optimized. These are just "micro"
optimizations, but repeated *numpkgs times they can improve overall performance.
I have noticed that there are rather too many exceptions inside
the structure; look at 2 packages for example (ax25-xtools and bash):

Package: ax25-xtools
Priority: optional
Section: hamradio
Installed-Size: 176
Maintainer: Patrick Ouellette <pouelle@debian.org>
Architecture: i386
Source: ax25-tools
Version: 0.0.8-2
Depends: libc6 (>= 2.2.4-4), libfltk1, libgl1, libstdc++2.10-glibc2.2 (>= 1:2.95.4-0.010810), xlibs (>> 4.1.0)
Suggests: talkd, ax25-apps, ax25-tools
Conflicts: ax25-utils
Filename: pool/main/a/ax25-tools/ax25-xtools_0.0.8-2_i386.deb
Size: 38234
MD5sum: 91e9467284bd0a546bd4660426311f39
Description: AX-25 Tools (X versions)

Package: bash
Essential: yes
Priority: required
Section: base
Installed-Size: 1788
Maintainer: Matthias Klose <doko@debian.org>
Architecture: i386
Version: 2.05a-9
Replaces: bash-doc (<= 2.05-1), bash-completion
Depends: base-files (>= 2.1.12)
Pre-Depends: libc6 (>= 2.2.4-4), libncurses5 (>= 5.2.20020112a-1)
Conflicts: bash-completion
Filename: pool/main/b/bash/bash_2.05a-9_i386.deb
Size: 738080
MD5sum: 840298aac0c78a730fc2852deade7cad
Description: The GNU Bourne Again SHell


The first thing that jumped out at me was the entry "Essential: yes" in bash.
Since there's already a priority structure, why not use that? Having a priority
"essential" makes more sense to me than having an exception that needs to be
analyzed and that gives exactly the same result anyway.
The same concept can apply to many other things:
"Source: ax25-tools", for example.
Why is there a Source field in an optional pkg and not in bash, for instance??
Now, that's just the general idea that pushed me to think of a sort of "new"
structure. Ex:

Priority: 1
Depends: 2

1) where Priority should include "essential"

2) Instead of using the different keywords such as Conflicts:, Pre-Depends:,
  Replaces:, etc., I would rather suggest one single line that includes
  all of them, where each pkg listed is prefixed with some info, ex:
  + depends
  * conflicts
  = replaces
  ! pre-depends
  etc. etc. (just an example!)
  so that it looks like
  Depends: +base-files (>= 2.1.12), !libc6 (>= 2.2.4-4), *bash-completion
  and so on...
  In this way it is possible to save some lines in the file, and the parser can
  benefit from that.
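
A minimal sketch of how a parser could handle such a combined line, assuming
the prefix characters from the example above. The mapping, the function name
parse_relations and the handling of version constraints are only illustrative
assumptions, and alternatives ("a | b") are not handled:

```python
import re

# Hypothetical prefix characters, taken from the example in this mail.
PREFIXES = {
    "+": "Depends",
    "*": "Conflicts",
    "=": "Replaces",
    "!": "Pre-Depends",
}

# One entry: prefix character, package name, optional "(constraint)".
ENTRY_RE = re.compile(r"^([+*=!])\s*([^\s(]+)\s*(?:\(([^)]*)\))?$")


def parse_relations(line):
    """Split a combined 'Depends:' line into one list per relationship type."""
    # Only the first ':' separates the keyword; versions may contain epochs.
    _, _, value = line.partition(":")
    result = {name: [] for name in PREFIXES.values()}
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue
        match = ENTRY_RE.match(entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        prefix, package, constraint = match.groups()
        # constraint is None when no "(...)" part is present.
        result[PREFIXES[prefix]].append((package, constraint))
    return result


relations = parse_relations(
    "Depends: +base-files (>= 2.1.12), !libc6 (>= 2.2.4-4), *bash-completion"
)
```

With this layout the parser looks at one keyword per package and dispatches on
a single character, instead of matching several different field names.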

My idea in having this small and static structure is to remove as much
redundant info as possible, like Source, Arch, Maintainer, and to optimize the
file parser by removing exceptions.
This will:

1) reduce the general size of the file
2) reduce time to parse the file
3) even reduce the flexibility a bit (yeah I know... but why lie???)

For keywords like Section/Priority/Version I made some notes at the end of
the mail that might be interesting.


The next step in my idea is to split the Packages file into several files
according to Section and Priority.
I have figured out 2 possible scenarios.

First scenario (keywords: Section/Priority)
one entry in the sources.list will look like:

deb http://<mirror>/debian unstable main/base/* main/net/important contrib/*

where * means all the priorities that belong to main/base/ and contrib/

Second scenario (keywords: Priority/Section)
one entry in the sources.list will look like:

deb http://<mirror>/debian unstable main/essential/* contrib/optional/web/*

where * means all the sections that exist in that priority.

Deciding which one is better than the other is very difficult, but don't forget
that they can really easily be implemented in parallel, because
the Packages file that resides in main/net/optional is exactly the same
as the one in main/optional/net/, so a symlink is more than enough, giving
people the freedom to choose the way they prefer. apt can easily take care
of avoiding duplicate downloads of Packages files.
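
A rough sketch of how duplicate downloads could be avoided, assuming fully
specified three-part paths (no * wildcards) and assuming the client knows
which layout each entry uses. The layout labels and both function names are
made up for the illustration:

```python
def canonical_key(component, layout):
    """Return (area, section, priority) for a three-part component path.

    layout is "section/priority" (first scenario) or
    "priority/section" (second scenario); both labels are hypothetical.
    """
    area, first, second = component.split("/")
    if layout == "section/priority":
        return (area, first, second)
    if layout == "priority/section":
        return (area, second, first)
    raise ValueError(f"unknown layout: {layout!r}")


def unique_downloads(entries):
    """Collapse entries that point at the same Packages file, keeping order."""
    seen = set()
    todo = []
    for component, layout in entries:
        key = canonical_key(component, layout)
        if key not in seen:
            seen.add(key)
            todo.append(key)
    return todo


# Both entries name the same Packages file, so only one download remains.
downloads = unique_downloads([
    ("main/net/optional", "section/priority"),
    ("main/optional/net", "priority/section"),
])
```

Normalising both layouts to one key is what makes the symlink approach safe:
whichever path the admin writes, the same file is fetched only once.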

Now Ben Armstrong pointed out 1 problem: What if I need only ONE package that
is not in a section mentioned in the sources.list???

Well, I thought about 2 possible solutions that can coexist, but
honestly I don't find them "the best" (I would really appreciate ideas here!!!).
One could be an external file in /etc/apt where people can specify single pkgs
that apt should take care of.
Two could be the possibility to specify single pkgs directly in the
sources.list, extending the entry to (ex)

Now, before yelling, since I'm 100% sure that all of you will ask:
what about the version control/dependencies of that pkg??? ;)
please continue reading :)

The general approach of dividing the Packages file raises one issue:
what about dependencies???
Keeping the current way of handling dependencies will break for one simple
reason: if a pkg depends on another package that is in a section not listed
in the sources.list, then there's a problem.

The solution is to have in /main, /contrib and /non-free a file called Available.
That file should contain one entry per line, and every line should look like:


According to some stats I made over unstable/main, it does not get bigger than
50KB (.gz), so I think it's an acceptable size for over 8300 pkgs.

This file can really be used for many things.

1) it can keep version control for single pkgs (here we go; and since
  dependencies are also declared inside the .deb, a small check will avoid
  downloading the Packages file for that section/priority)
2) it can be used to correct broken dependencies (read above)
3) fast search over the entire archive, even if a specific section/priority is
  not present in the sources.list
4) diffed against the old one, it can be used to decide which Packages files
  should be downloaded according to the sources.list (this can be an
  interesting option to analyze in order to reduce network load, probably even
  for mirroring)
5) it permits removing the Section / Priority / Version lines from the
  Packages files
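
Point 4 can be sketched roughly as follows. The sketch works on already
parsed records and deliberately says nothing about the on-disk line format.
It assumes each record carries (version, section, priority), which is only
inferred from point 5. The function name is hypothetical, and the older bash
version and the talkd record are made-up sample data:

```python
def packages_files_to_refresh(old, new, wanted):
    """Return the (section, priority) pairs whose Packages file changed.

    old, new: dict mapping pkg name -> (version, section, priority),
              as assumed for the parsed Available file.
    wanted:   set of (section, priority) pairs selected in sources.list.
    """
    changed = set()
    for name in old.keys() | new.keys():
        before = old.get(name)
        after = new.get(name)
        if before == after:
            continue
        # A pkg that was added, removed, upgraded or moved touches the
        # Packages file of both its old and its new location.
        for record in (before, after):
            if record is not None:
                _, section, priority = record
                changed.add((section, priority))
    return changed & wanted


old = {
    "bash": ("2.05a-8", "base", "required"),
    "ax25-xtools": ("0.0.8-2", "hamradio", "optional"),
}
new = {
    "bash": ("2.05a-9", "base", "required"),
    "ax25-xtools": ("0.0.8-2", "hamradio", "optional"),
    "talkd": ("0.17-1", "net", "optional"),
}
wanted = {("base", "required"), ("hamradio", "optional")}
refresh = packages_files_to_refresh(old, new, wanted)
```

Here only base/required needs a new download: hamradio/optional did not
change, and net/optional changed but is not selected in the sources.list.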

Now in conclusion. Using this approach will:

1) increase flexibility in handling pkgs/archive etc.
2) reduce network load in general
3) increase performance on small systems (ALWAYS according to
  what the admin expects from a small system!)
4) require a rewrite of code in many pkgs for its implementation
  (but this is also true if any other implementation takes place)
5) the transition to this system is smooth, since it's a reimplementation of what
  is already in place

Well, I hope I didn't waste my time. I'm sorry that I didn't manage to produce a
fake archive to show how it looks, but all these conclusions/ideas came out
while trying to build a script to generate it. The script is far from being
nice, complete and fast, so it doesn't make sense for me to show it
(IT'S SO UGLYYYY!!! ;) probably like my idea ehehhe)

