Re: RFC: The Future of Solving Dependency Problems in APT (SAT, CUDF)

To: APT Development Team <deity@lists.debian.org>
Cc: Daniel Burrows <dburrows@debian.org>, Michael Tautschnig <mt@debian.org>, Pietro Abate <pietro.abate@pps.jussieu.fr>
Subject: Re: RFC: The Future of Solving Dependency Problems in APT (SAT, CUDF)
From: David Kalnischkies <kalnischkies+debian@gmail.com>
Date: Wed, 5 Jan 2011 12:21:44 +0100
Message-id: <[🔎] AANLkTimzXOvnUg16sBzzWo5z6cNjxD74uh-XTu9mmc+p@mail.gmail.com>
In-reply-to: <1293113995.2353.77.camel@jak-thinkpad>
References: <1293039140.16550.36.camel@jak-thinkpad> <AANLkTimjs6BJSN229eVXULb=xWM5Tu_psXu8Do6C__db@mail.gmail.com> <1293113995.2353.77.camel@jak-thinkpad>

On Thu, Dec 23, 2010 at 15:19, Julian Andres Klode <jak@debian.org> wrote:
> On Do, 2010-12-23 at 13:47 +0100, David Kalnischkies wrote:
>> On Wed, Dec 22, 2010 at 18:32, Julian Andres Klode <jak@debian.org> wrote:
>> Further more, i would like to have a Hold(VersionSet) to tell the resolver
>> e.g. about dpkg holds and maybe even a Keep(PackageSet) to tell the
>> resolver to never remove this set of packages (but upgrade is fine) which
>> could be handy for essentials and co.
> We do not really need hold, that's completely identical to marking the
> package for installation, although it might be useful just for the
> semantics.

Install request can be unversioned, but yes, holds could be mapped to
versioned install requests, but i can think of solvers who would like to
have this separation available - aptitude for example can propose solutions
in which (one) package is requested but not installed. It would be bad
if it would ignore holds this way (maybe it also wants to do this, but
give it a different score, who knows).

> Does Keep(PackageSet) work with multi-arch where an install
> pkg:all becomes architecture-dependent? I merged the changes into the
> header.

I think i don't understand what you mean. "My" APT implementation "just"
creates a bunch of pseudo-versions which are architecture-dependent
and depend themselves on pkg:all which caries mostly the installation
state of the package. If i want to keep such an all package, fine, i can
even say that i want to keep an pkg:any package - it just doesn't make
a lot of sense without marking the "real" pkg:all, too, as a remove will
effect both… This way a change from :all to :any and vise versa is just a
matter of a new version with different dependencies -- but this is a trick
to avoid modifying great heaps of the resolver. A resolver working on his
own datastructure (as e.g. a cudf based one) doesn't need to know about
these pseudo-versions: the solver needs to implement his own tricks to
cope with MultiArch…

I think of keep() mostly useful for situation i know beforehand that a
solution including the remove of these packages is unacceptable,
e.g. is the remove of the xserver on a desktop machine not the best
thing a package manager could do on dist-upgrade even if this means to
remove even more (but less important) packages - i imagine the API
breaking X12 upload breaking every possible x application as an
exaggerated example (maybe even more clear if you add 'unattended'
to the situation).

>>
>> In return i would expect a set of versions of packages to install and to
>> remove.
> That's what Solve() gives you.

Ah, my mistake, i overlooked the & and thought the sets are the request…
SolveAndSet() gave the impression that the result is a modified depCache.

>> Solver::Rules is a mystery for me. Most complains about the resolver in
>> APT evolve from the fact that it doesn't switch candidate versions if it
>> has to (the popular car v2 with wheels v3 and v2 example comes to mind).
>> How to transport the meaning of candidate after all?
>> For me, candidate is an (important) implementation detail of the APT resolver.
>> Another resolver should be able to choose freely if he wants to put on wheel
>> v3 or v2 and therefore doesn't need a concept of candidate version.
> I removed all options except for the candidate one (which I switched,
> it's IGNORE_CANDIDATE_STATUS now), and the one for ignoring conflicts;
> as the latter one is really useful for building Debian images - for
> example, debimg has it's own solver mainly in order to ignore conflicts.
>
>>
>> I would say that every version i tell the solver about should be a candidate
>> for him to choose from -- why should i tell him about versions i don't want
>> in the first place after all? So in the end pkgCache is maybe the wrong
>> choice  and we should give the solver a (big) VersionSet as problem space.
> No. VersionSet costs too much space from my experience. That's one
> reason why I use mostly sets of pointers in my sat-solver prototype
> (made multiple MB difference or something like that) - possibly because
> it's a binary tree and allocates empty iterator objects for unused
> nodes.

Yeah, i thought more of concept wise, not implementation wise.
And VersionSet implementation is by far not optimal yet - i wonder what
would happen if it would use something like c++0x std::unordered_set instead
of the ordered std::set for example. The sets were mostly created to pass
around packages and versions while parsing the commandline and these requests
are much smaller than a package universe can be and therefore not optimized.

>> The problem with these multiple candidates would be that while an experimental
>> package can be a candidate, if at all possible stable should be preferred
>> which leads us to think about how we exposure pkgPolicy to the resolver world.
>> Wasn't in an old thread yet-another-description language mentioned for
>> exactly this task?
> The current prototype prefers candidate versions of packages over
> non-candidate versions. That is a sane choice in my opinion. I chose to
> use pkgDepCache because it brings a cache and a GetCandidateVer(), but
> we don't really need it and can use:

What i really want to transport with this sentence (and a few more) was that
there is no general concept of a 'candidate'. A resolver can choose whatever
version he wants as a candidate for his own.
Fine if your resolver wants to use the same as the APT resolver
(at least as a hint) but after all many resolvers will not care for a
concept of a single candidate -- they may work in a "all versions i know
about are candidates" fashion and just want to know which version they
should choose if they have an option:

Thats why i thought about modifying the universe in APT and only pass
possibly a subset to the solver: If the user for example has a negative pin
for a version - why should it be exposed to the solver? There are other
possible hard constraints that APT could do directly for all solvers so
these can focus on the soft constraints there we have a choice…
(user doesn't want gnome packages installed (APT) vs. user prefers
KDE packages if at all possible (solver))

>> > It shall be the base for two new classes: SatSolver and CudfSolver. The
>> > first class implements a solver based on the boolean satisfiability
>> > problem, the second one an interface to an external solver with data
>> > exchange happening via the CUDF format[1]. We'd also be willing to merge
>> > aptitude's solver into this new infrastructure if useful.
>>
>> I would divide Solvers into Internal (i can pass pkgCache/VersionSet directly
>> into it) and External (i need to reformat pkgCache/VersionSet) and subclass
>> it for {D,C}UDF and maybe other formats of other resolvers.
>> After all, how a solver solves isn't really interesting for the implementation
>> of an interface to it - and hard to define or in what category belongs
>> APT/aptitude?
> All solvers need to reformat the problem. The CUDF solver transfers it
> to CUDF, the SAT solvers translate everything to CNF. Both takes more
> time than solving the problem itself (200 ms for the picosat solver). I
> don't think we need to differentiate between Internal and External
> solvers within the class hierarchy.

Not all. The APT resolver works directly on the pkgCache - as well as
aptitude does. The main point was more that there is no general SatSolver
as i doubt picosat works on the same datastructure as another solver -
in this case it maybe is CNF rather than a Sat classification through -, but
pkgCache, depCache and {C,D}UDF are datastructures, some are build
anyway (pkgCache), some not (CUDF), while APT starts up.

Regarding speed, we can cache CUDF transformation results too at both ends
of the process and after all:
a) if a cudf solver performs good it shouldn't be to hard to make it work
   directly on pkgCache instead/too
b) is speed not the biggest concern. I can wait a while for the calculation
   as long as the result is a good one…

>> Integrating the proof into apt-get directly should be relatively simple but i
>> would like to see the abstraction to be in first so that APT doesn't need to
>> be linked against picosat (and all the others which will come maybe).
>> Some sort of plugin system maybe…
>> The version system (and a bit of other stuff) in APT has some sort of
>> built-in plugin system, if we could make that more dynamic…
>> (yes, i am dreaming a bit now)
> Dynamically loading solvers is a bit more complicated and it's easier to
> link it in. We still have external solvers using CUDF for other cases.

Sure is linking in easier, but i doubt there exists a "one size fits all"
solver for all usecases. APT for example has proven again to work better in
a non-interactive fashion as required in a dist-upgrade while aptitude is the
king of interactive package management, but failing big time in converting
a lenny into a squeeze installation if we trust the drafted release notes.
They all currently are not perfect for sbuild usage…

While the user interface shouldn't change too much between different
usecases i am pretty sure we will need to provide different solvers for
different actions - at least install, (dist-)upgrade, build-dep and
autoremove… beside that different usergroups may have different
solvergroups satisfying there needs…

Thats also why i would like to keep the current APT solver as the fallback
as it performs okay for many years now and would like to have all the new
and shine resolvers as optional (but recommend) alternative.

Best regards

David Kalnischkies

Reply to:

Prev by Date: Bug#607135: GzipIndexes not working anymore with the default compression order
Next by Date: Re: RFC: The Future of Solving Dependency Problems in APT (SAT, CUDF)
Previous by thread: Bug#607135: GzipIndexes not working anymore with the default compression order
Next by thread: Re: RFC: The Future of Solving Dependency Problems in APT (SAT, CUDF)
Index(es):
- Date
- Thread