
Re: RFC interaction with external dependency solver: "APT" state



On Wed, 2010-06-02 at 15:12 +0200, Stefano Zacchiroli wrote:
> [ sorry for the delay ]
> 
> On Wed, May 19, 2010 at 07:57:08PM +0200, David Kalnischkies wrote:
> > 2010/5/19 Stefano Zacchiroli <zack@debian.org>:
> > > I see a solution that still preserves the "one pass" approach, not
> > > adding any need of ping-pong between package manager and solver. The
> > > idea is that in the answer from the solver, you get package stanzas as
> > > follows:
> > >
> > >  package: foo
> > >  version: 1.2.3-4
> > >  id: adgf31452135hkashdfa
> > >  installed: yes
> > 
> > So let us assume the user has installed <awesome,1,Packages-1>.
> > In the next request, which id will <awesome,1,Packages-1> have
> > compared to <awesome,1,status-file>?
> > Are these versions merged or not?
> 
> Right, this is the question :) The underlying problem related to this is
> that AFAICT dpkg does not preserve any kind of information on the
> *origin* of packages that get installed. Is that correct? It seems to me
> that the proper solution of this would be to have an ID in the dpkg
> metadata database which can be cross-referenced with APT lists. Not that
> I think we should/can fix that for the purposes being discussed here,
> but I dwell a bit more on the analysis just to be sure that I hit the
> nail.
dpkg simply copies the control file (DEBIAN/control in the package build
tree, shipped in the .deb's control.tar.gz) and adds a bit of state
information, if I am not mistaken.
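A minimal sketch of what this means in practice, in Python, assuming two hypothetical records for the same `awesome` version (all field values are made up for illustration): the archive's Packages entry carries fields such as MD5sum and Filename that describe the .deb itself, while the dpkg status record keeps the control fields plus a Status line, so nothing in it points back to the list the package came from.

```python
# Hypothetical records for one version of "awesome"; all values are
# illustrative, not taken from a real archive.
PACKAGES_ENTRY = """\
Package: awesome
Version: 1
Architecture: amd64
Installed-Size: 120
Depends: libc6 (>= 2.7), libfoo1
Filename: pool/main/a/awesome/awesome_1_amd64.deb
MD5sum: 0123456789abcdef0123456789abcdef
"""

STATUS_ENTRY = """\
Package: awesome
Status: install ok installed
Version: 1
Architecture: amd64
Installed-Size: 120
Depends: libc6 (>= 2.7), libfoo1
"""


def parse_stanza(text: str) -> dict[str, str]:
    """Parse one deb822-style stanza into a {field: value} dict.

    Lines starting with whitespace continue the previous field's value.
    """
    fields: dict[str, str] = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and last is not None:
            fields[last] += "\n" + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip()
        fields[last] = value.strip()
    return fields


packages = parse_stanza(PACKAGES_ENTRY)
status = parse_stanza(STATUS_ENTRY)

# Fields present in the archive index but absent from the dpkg status record.
print(sorted(set(packages) - set(status)))  # ['Filename', 'MD5sum']
```

The parser is deliberately simplified (no case-insensitive field names, no multi-stanza files); it only serves to show which fields survive installation.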

> 
> I guess that for packages coming from APT, apt can in principle invoke
> dpkg with an extra cmdline argument specifying, for each package, its
> MD5 sum (or equivalent), telling dpkg to store it. That would solve 99%
> of the occurrences of this problem, I guess. Packages without an
> assigned checksum would be, at worst, as they are now, i.e. not possibly
> cross-referenceable to their external origin. Are you aware of any bug
> report on dpkg about that? (I've skimmed through the list w/o finding
> anything apparently relevant thus far.)
Not all packages need to have hash sums in the Packages files; only
packages in a secure-apt repository that do not want to appear as
untrusted do. Passing parameters to dpkg is a bit complicated, since dpkg
takes one command and a list of objects (package names, package files,
etc.) to operate on.

Otherwise I would have proposed to merge installation ordering into dpkg
and have higher-level package managers call dpkg only once, instead of
multiple times as is currently the case. But maybe it is possible
to extend dpkg to support per-object options? This paragraph is getting
slightly off-topic now, though.

> 
> 
> If this is the case, the only solution is that ids are completely
> determined by the available package metadata (and in particular by those
> metadata that will land into /var/lib/dpkg/status); that is the only way
> we can later on recognize an installed package as coming from a given
> package list. A hash, as I understand is currently implemented by APT
> (no wonder :)), is a particular case of that.
As far as I can tell, it is part of every version in the cache as
pkgCache::Version::Hash (which is an unsigned short, i.e. typically only
16 bits wide).
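To put the 16-bit width into perspective, here is a rough birthday-bound estimate, assuming (as an idealization) that the hash spreads versions uniformly over its 2**16 possible values. The chance that at least two of n versions share a value crosses one half at roughly 300 versions, far fewer than an archive contains, so such a hash can only be one component of a version's identity, never the whole of it.

```python
def collision_probability(n: int, bits: int = 16) -> float:
    """Probability that at least two of n values drawn uniformly at random
    from 2**bits possible hash values coincide (the birthday problem)."""
    space = 2 ** bits
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th value must avoid the i values already drawn.
        p_all_distinct *= (space - i) / space
        if p_all_distinct == 0.0:
            break  # more values than buckets: a collision is certain
    return 1.0 - p_all_distinct


for n in (2, 100, 302, 1000):
    print(n, round(collision_probability(n), 4))
```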

> 
> > > The idea is that "id" is an optional property (defaulting to "") and
> > > that the triple <package, version, id> uniquely identifies a package for
> > > APT; i.e.: it will be able to discriminate among multi-arch,
> > > locally-rebuilt packages, and packages coming from different APT lists.
> > 
> > Just to be sure, we are still talking about all package managers in this
> > thread here - or is it really about APT alone? I tried to be relatively
> > generic until now… - you later say "friends", this could mean
> > rev-depends like aptitude and co. - but also smart/apt2/cupt/…
> > I at least hope we are not deadly enemies, but friends… ;)
> 
> Yes, absolutely, I'm still talking about all such friends, even though
> I'm a bit worried too by the lack of reaction of others :-)
I don't have a working package manager yet, so I am giving preference to
those who already have a complete one. I only need to be able to convert
to something like the cache API specified at
http://people.debian.org/~jak/apt2-doc/apt-AptCache.html. That said, this
API is just a 1k-line prototype written in C89 (Vala became boring).


> 
> > Also, I don't see why multi-arch is in the same list as locally-rebuilt
> > packages. They are completely different problems:
> 
> Yes, they are, but from the point of view of how to uniquely identify
> packages, the ID scheme I've been proposing addresses both scenarios. I
> didn't mean to imply anything more than that.
And the architecture is provided anyway.

> 
> > While the Packages files have checksums of the deb files, the status file
> > does not. As I said already, APT tries to "fix" this by hashing installsize and
> > the list of dependencies, so a version in APT is <numberstring, hashvalue>
> > full info e.g. in #574956 and #574072, in which you can also see what
> > happens if two version mergers disagree (in this case human brain vs. APT)
> > and that these situations are not completely academic…
> 
> So, it seems to me that a sane solution would be to:
> 
> 1) accept that the world is evil, and use a hash-based solution as the
>    sole trustworthy package unique id (still in "my" sense though,
>    i.e. package ids will be triple <name, version, id>; then, in
>    addition, you'll have the side effect that id is "your" hash which
>    incidentally is enough to uniquely identify the package)

For me, it's a quadruple <name, version, architecture, id>, where 'id'
is a hash of the installed size and the dependency information (that is
how APT does it).
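A sketch of such a shared computation, assuming a hypothetical field selection (Installed-Size plus Pre-Depends and Depends) and zlib's CRC-32 truncated to 16 bits; APT's actual hash function and field list may differ. Because it only reads fields that end up both in the Packages entry and in /var/lib/dpkg/status, <awesome,1,Packages-1> and <awesome,1,status-file> get the same quadruple and would therefore be merged.

```python
import zlib

# Hypothetical field selection; APT's real hash uses its own function and
# may cover a different set of fields.
DEP_FIELDS = ("Pre-Depends", "Depends")


def normalize_deps(value: str) -> str:
    """Collapse whitespace in a dependency field so that purely cosmetic
    formatting differences do not change the id."""
    return ",".join(" ".join(part.split()) for part in value.split(","))


def characteristic_id(fields: dict[str, str]) -> int:
    """16-bit id computed only from metadata that appears both in a
    Packages entry and in the dpkg status record."""
    parts = ["Installed-Size:" + fields.get("Installed-Size", "").strip()]
    for name in DEP_FIELDS:
        parts.append(name + ":" + normalize_deps(fields.get(name, "")))
    crc = zlib.crc32("\n".join(parts).encode("utf-8"))
    return crc & 0xFFFF  # keep 16 bits, the width of an unsigned short


def package_key(fields: dict[str, str]) -> tuple[str, str, str, int]:
    """The quadruple <name, version, architecture, id>."""
    return (fields["Package"], fields["Version"],
            fields.get("Architecture", ""), characteristic_id(fields))


from_packages = {"Package": "awesome", "Version": "1", "Architecture": "amd64",
                 "Installed-Size": "120", "Depends": "libc6 (>= 2.7), libfoo1",
                 "MD5sum": "0123456789abcdef0123456789abcdef"}
from_status = {"Package": "awesome", "Version": "1", "Architecture": "amd64",
               "Installed-Size": "120", "Depends": "libc6 (>= 2.7),libfoo1",
               "Status": "install ok installed"}

# Same version seen through the Packages list and through the status file:
print(package_key(from_packages) == package_key(from_status))  # True
```

Fields that exist only on one side (MD5sum, Status) are ignored on purpose; including them would make the two views of the same version disagree.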

> 
> 2) fix a way, which should be shared by solvers interested in using the
>    external solver API, to compute that hash out of the metadata
>    available in APT lists (which AFAICT are anyhow a common ground for
>    all concerned package managers)
> 
> Would you consider that acceptable and not too constraining wrt the
> actual implementation of the hashing?
Yes.


> 
> If yes, when receiving the output of the solver, you will just project
> it to a list of package hashes that the solver tells you must be
> installed in a satisfactory solution.  Out of that, I presume the solver
> will already have a lookup function from hashes to actual packages.
Well, not solely a hash lookup, as there may be hash collisions as well.
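A minimal sketch of what that lookup could look like, using a hypothetical `VersionIndex` class: the hash only selects a bucket, and the remaining fields of the quadruple pick the right version out of it, so two unrelated versions that happen to share a hash are still told apart.

```python
from collections import defaultdict
from typing import NamedTuple


class Version(NamedTuple):
    name: str
    version: str
    arch: str
    hash_id: int  # 16-bit characteristic hash, may collide


class VersionIndex:
    """Hypothetical index from solver answers back to cache versions.

    The hash only selects a bucket; name, version and architecture are then
    compared to pick the right entry, so colliding hashes are harmless.
    """

    def __init__(self, versions):
        self._by_hash = defaultdict(set)
        for v in versions:
            self._by_hash[v.hash_id].add(v)

    def bucket(self, hash_id):
        """All versions sharing hash_id (more than one means a collision)."""
        return sorted(self._by_hash.get(hash_id, ()))

    def resolve(self, name, version, arch, hash_id):
        """Return the version matching the full quadruple, or None."""
        for v in self._by_hash.get(hash_id, ()):
            if (v.name, v.version, v.arch) == (name, version, arch):
                return v
        return None


index = VersionIndex([
    Version("awesome", "1", "amd64", 0x1234),
    Version("other", "2.0", "amd64", 0x1234),  # same hash, unrelated package
])
print(len(index.bucket(0x1234)))  # 2: the hash alone is ambiguous
print(index.resolve("awesome", "1", "amd64", 0x1234))
```

An answer whose quadruple is not in the cache at all simply yields None here; a real package manager would presumably report that back as an error from the solver.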

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.


