
Re: Translating apt lists directly



On Sat, Dec 08, 2007 at 10:23:31AM -0500, "Lord of, St. Luke Valor" <lord.of.valor@gmail.com> was heard to say:
> I do not know if the apt database is held in a map, a list or a vector. If
> there is a way your program can place the contents of the APT database into
> a map, the solution to your problem is near.

  I'll try to explain my needs in simple English.

  The apt database (i.e., the binary cache) is not what I need; I need
the package lists and the dpkg status file.  These are currently text
files in an RFC822-style format; that is,

Field1: value1
Field2: value2
Field3: value3
 with continuation

  However, this format is not described in any programming documentation
for dpkg and apt.  In the case of dpkg, it's the exact opposite:
/var/lib/dpkg/status is *explicitly* an internal file that external
programs should not manipulate.
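
  For illustration, here is a minimal Python sketch of reading this
layout.  It assumes that records are separated by blank lines and that
continuation lines start with a space or a tab; parse_stanzas is a
made-up name, not part of dpkg or apt.

```python
def parse_stanzas(text):
    """Split RFC822-style text into a list of stanzas.

    Each stanza is a list of (field, value) pairs.  A continuation line
    (one starting with a space or tab) is appended to the previous value.
    """
    stanzas, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = []
        elif line[0] in " \t" and current:
            # Continuation: extend the value of the most recent field.
            field, value = current[-1]
            current[-1] = (field, value + "\n" + line)
        else:
            # Ordinary "Field: value" line.
            field, _, value = line.partition(":")
            current.append((field, value.strip()))
    if current:
        stanzas.append(current)
    return stanzas
```

  Keeping the fields as an ordered list of pairs rather than a dict
preserves the field order of the input, which matters if the records
are written back out.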

> Let me describe the algorithm in simple English. First, we'll create a map,
> which is represented as an associative container in C++. Containers are made
> possible with an inclusion of the <map> library. We will assume that the map
> contains a separate package for each element. The first cycle--or,
> iteration--through the database is output to the screen so that the user may
> verify which packages to keep. Perhaps you have compiled the kernel. Perhaps
> you have tinkered with the kernel. Perhaps you have taken apart the
> transmission of a motor vehicle. This painstaking process takes place at the
> user end after compilation of the program.

  Actually, what my code already did by the time I sent this message
was this:

  (1) build the set of packages visited by the dependency resolver.
      (this is why I want to truncate the cache; it lets me generate
      test inputs for the resolver that are less than 10MB)
  (2) concatenate and copy the Packages files from /var/lib/apt to a
      single output file for testing purposes.
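
  Step (2) amounts to something like the following Python sketch.  The
function name and the glob pattern in the usage note are assumptions
made for illustration, not the code the program actually uses.

```python
import glob
import shutil


def concatenate_lists(pattern, output_path):
    """Append every file matching `pattern` to `output_path`.

    Returns the list of input files that were copied, in sorted order.
    """
    paths = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
            # Write a blank line so that the last record of one file
            # cannot run into the first record of the next.
            out.write(b"\n")
    return paths
```

  A call might look like
concatenate_lists("/var/lib/apt/lists/*_Packages", "/tmp/Packages"),
where both paths are only examples.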

  Since sending the mail, I've concluded that probably no one is going
to write a proper interface to dpkg just for me (this has been missing
for over 8 years already), so I decided to bite the bullet and
manipulate the apt/dpkg files directly.  This is not a core feature of
the program anyway, and it's OK if it breaks when the dpkg guys change
the format of the status file (if they ever do).

  So now the procedure is:

  (1) build the set of packages visited by the dependency resolver
  (2) for each file in /var/lib/dpkg/status, /var/lib/apt/lists/*, and
      /etc/apt:
      (2.a) if the file is not an RFC822-style list of records relating
            to packages (knowledge of file formats is hardcoded), then
            just copy it literally to a file under the destination
            directory; e.g., /path/to/dest/var/lib/dpkg/status.
      (2.b) otherwise, run a lightweight parser that drops sections
            whose Package: field names a package not in the set built by
            step (1) and rewrites dependency lines to throw out packages
            not contained in that set.
  (3) done!
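
  Step (2.b) can be sketched in Python roughly as follows.  This is an
illustration under stated assumptions, not the program's real parser:
the set of fields treated as dependency lines (DEP_FIELDS) is a guess,
each dependency field is assumed to fit on a single line, and all of the
function names are hypothetical.

```python
import re

# Assumption: the fields whose values are lists of package relationships.
DEP_FIELDS = {"Depends", "Pre-Depends", "Recommends", "Suggests",
              "Conflicts", "Replaces", "Breaks", "Enhances"}


def dep_name(alternative):
    """Return the package name from an entry such as 'libc6 (>= 2.7)'."""
    m = re.match(r"[A-Za-z0-9][A-Za-z0-9+.\-]*", alternative)
    return m.group(0) if m else ""


def prune_deps(value, keep):
    """Drop every alternative that names a package outside `keep`."""
    groups = []
    for group in value.split(","):
        alts = [a.strip() for a in group.split("|")]
        alts = [a for a in alts if dep_name(a) in keep]
        if alts:
            groups.append(" | ".join(alts))
    return ", ".join(groups)


def filter_records(text, keep):
    """Keep only records whose Package: field is in `keep`."""
    out = []
    for record in re.split(r"\n\s*\n", text.strip()):
        lines = record.split("\n")
        name = None
        for line in lines:
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
        if name not in keep:
            continue  # drop the whole section
        new_lines = []
        for line in lines:
            field, sep, value = line.partition(":")
            if sep and not line[0].isspace() and field in DEP_FIELDS:
                pruned = prune_deps(value, keep)
                # A dependency field with nothing left is omitted.
                if pruned:
                    new_lines.append(f"{field}: {pruned}")
            else:
                new_lines.append(line)
        out.append("\n".join(new_lines))
    return "\n\n".join(out) + "\n" if out else ""
```

  With keep = {"a", "b", "d"}, a line such as
"Depends: b (>= 1.0), c | d, e" in the record for package a becomes
"Depends: b (>= 1.0), d", and records for any other package disappear
entirely.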

  Next time I get a few minutes to hack on this, I'll probably add some
code to roll the whole thing up into a single archive so it can be
easily submitted to me by users.
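
  One possible shape for that step, sketched in Python with the
standard tarfile module.  The function name bundle and the choice of a
gzip-compressed tarball are assumptions for illustration; nothing here
is implemented in the program yet.

```python
import os
import tarfile


def bundle(dest_dir, archive_path):
    """Pack the directory tree under `dest_dir` into a .tar.gz archive.

    Paths inside the archive start with the last component of
    `dest_dir`, so unpacking creates a single top-level directory.
    """
    top = os.path.basename(os.path.normpath(dest_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # add() recurses into directories by default.
        tar.add(dest_dir, arcname=top)
    return archive_path
```

  The archive should be written outside dest_dir so that it does not
end up trying to include itself.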

  The problem here, which I described in my previous mail, is that this
breaks abstraction layers, badly.  The dpkg status file is an
undocumented implementation detail of dpkg that just happens to have
remained the same for the last 10 years or so.  However, just because I
don't know of a proper way of transforming the status file and copying
it to another location doesn't mean that one doesn't exist.  The same
goes for the apt package lists, although I think those have a more
stable format.

  But all in all, I'm happy with having a fairly simple implementation
that might need to be rewritten once in a while.

> The <dpkg> library is a dummy library that will represent the proper include
> file. An object of the package class is used as a predicate for the
> associative container.

  Good thing it's a dummy library, 'cos as far as I know there is no
such thing (libdpkg is an internal library whose interface is not meant
to be used by "outsiders").

  Daniel

  PS: the format of the apt cache is not a map, a list, or a vector; it
      is a large binary file on disk that is mmap(2)'d directly into
      memory.  You can find the definitions of the structures involved
      in /usr/include/apt-pkg/pkgcache.h, although most code uses the
      convenience "iterators" in /usr/include/apt-pkg/cacheiterators.h
      because dealing with the database directly is annoying.  Note
      that these are not STL-style iterators; read their documentation
      before you try to use them.


