
Bug#703366: RFH: apt-file -- search for files within Debian packages (command-line interface)



On Thu, Mar 21, 2013 at 6:53 PM, Stefan Fritsch <sf@sfritsch.de> wrote:
> On Thursday 21 March 2013, Niels Thykier wrote:
>> On 2013-03-20 18:30, Stefan Fritsch wrote:
>> Allegedly, reprepro can merge pdiffs so not all of them need to be
>> applied and (understandably) the APT maintainers do not want that
>> to break.
>
> This seems very broken to me. Merging the diffs on the server side has
> little benefit. You still need exactly the same number of diffs on the
> server, but each diff gets larger and the diffs change more between
> updates, so the efficiency of caching proxies goes down. With keep-alive
> connections and pipelining, downloading a few dozen files is not
> that big a problem.

We recently had to disable pipelining because we failed to "force" broken
proxies and servers into supporting it properly. Think e.g. Squid and Amazon.
Maybe the big web browsers will be able to get them to behave now that
they are all starting to use pipelining …

Still, even assuming a perfect world, we download a lot of files, which means
a lot of gzip overhead per file. There is also the theory that a package that was
touched is soon touched again (e.g. to fix a bug), meaning we download a lot of
"useless" data. Add to that slow systems and those behind a self-controlled
mirror (where you could merge the diffs).
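To put a rough number on the per-file gzip overhead, here is a minimal Python sketch that compresses a series of small diff-like chunks once as separate files and once as a single merged file. The chunks are invented stand-ins, not real pdiffs, so only the direction of the comparison (not the byte counts) should be read from it.

```python
import gzip


def gzip_size(data: bytes) -> int:
    """Size in bytes of `data` after gzip compression (default level)."""
    return len(gzip.compress(data))


def compare_overhead(chunks):
    """Return (total size when gzipped one by one, size when gzipped as one file)."""
    separate = sum(gzip_size(chunk) for chunk in chunks)
    combined = gzip_size(b"".join(chunks))
    return separate, combined


# Invented stand-ins for 30 small diffs, each changing five "Version:" lines
# of a Packages-like file; real pdiffs look different.
chunks = [
    "".join(f"{n}c\nVersion: 1.{i}-{n}\n.\n" for n in range(5, 0, -1)).encode()
    for i in range(30)
]

separate, combined = compare_overhead(chunks)
print(f"30 separate files: {separate} bytes, one merged file: {combined} bytes")
```

Each separately compressed file pays for its own gzip header and trailer and starts with an empty compression dictionary, which is where the per-file overhead comes from.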

So in a perfect world we would support both.


> And there are some implementations (at least apt-file's and the
> security tracker's) that depend on the pdiffs being incremental in
> order to be faster than apt by at least one order of magnitude. So if
> the archive ever used diff merging, those implementations
> would break.
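For readers unfamiliar with the format: pdiffs are ed-style scripts (as produced by `diff --ed`), and with incremental diffs each one turns one version of the index into the next, so a client can keep the file in memory and apply the patches back to back. The Python sketch below illustrates that idea under simplifying assumptions (only the `a`, `c` and `d` commands, commands in descending line order, no literal "." text lines); it is not the code used by APT, apt-file or the security tracker.

```python
def apply_ed_script(lines, script):
    """Apply a `diff --ed` style script to a list of lines (newlines stripped).

    Simplifying assumptions: only 'Na', 'N[,M]c' and 'N[,M]d' commands, given in
    descending line order (as `diff --ed` emits them), and no text line that is
    a lone "." (which plain ed scripts cannot express).
    """
    out = list(lines)
    cmds = script.splitlines()
    i = 0
    while i < len(cmds):
        cmd = cmds[i]
        i += 1
        if not cmd:
            continue  # tolerate blank lines between commands
        op, addr = cmd[-1], cmd[:-1]
        if "," in addr:
            start, end = (int(x) for x in addr.split(","))
        else:
            start = end = int(addr)
        new = []
        if op in ("a", "c"):
            # Text for append/change runs until a line containing only "."
            while cmds[i] != ".":
                new.append(cmds[i])
                i += 1
            i += 1  # skip the terminating "."
        if op == "a":
            out[start:start] = new        # insert after line `start` (1-based)
        elif op == "c":
            out[start - 1:end] = new      # replace lines start..end
        elif op == "d":
            del out[start - 1:end]        # delete lines start..end
        else:
            raise ValueError(f"unsupported ed command: {cmd!r}")
    return out


def apply_incremental(lines, patches):
    """Apply patches in order; with incremental diffs patch k maps version k to k+1."""
    for patch in patches:
        lines = apply_ed_script(lines, patch)
    return lines


# Usage: two incremental patches applied back to back in memory.
v1 = ["Package: a", "Version: 1", "Package: b", "Version: 1"]
patch1 = "4c\nVersion: 2\n.\n"
patch2 = "4d\n2c\nVersion: 3\n.\n"
print(apply_incremental(v1, [patch1, patch2]))
# → ['Package: a', 'Version: 3', 'Package: b']
```

If the archive merged diffs instead (each patch going straight from an old version to the current one), running `apply_incremental` over the whole list would produce a wrong result, which is the breakage described above.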

I wonder if that is the reason why the announced pdiff change in dak has
still not been merged:
https://lists.debian.org/debian-devel-announce/2012/09/msg00012.html


>> The solution is probably to extend the pdiff format
>> (e.g. as suggested in [1]), so the client side can see
>> exactly which patches are needed (instead of having to apply them one
>> at a time).
>>   To this end, I have been making a bit of noise in #d-ftp;
>> hopefully I will have news here soon.
>
> I think apt should still be changed to assume incremental diffs unless
> the Index file is of a new format. That would bring the benefit even
> for old-style archives. Merging diffs on the server does not give
> comparable benefit.
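The two interpretations differ in which patches a client picks from the pdiff Index. A minimal Python sketch, assuming the SHA1-Current/SHA1-History fields of the Index file: with `incremental=True` every patch from the client's current version onwards is applied; `incremental=False` models a hypothetical merged-diff archive where a single patch leads straight to the current file. The function names are made up for illustration.

```python
def parse_history(index_text):
    """Return (sha1, patch_name) pairs from the SHA1-History field, oldest first.

    Assumes continuation lines of the form ' <sha1> <size> <patch-name>'.
    """
    history = []
    in_history = False
    for line in index_text.splitlines():
        if line.startswith("SHA1-History:"):
            in_history = True
        elif in_history and line.startswith(" "):
            sha1, _size, name = line.split()
            history.append((sha1, name))
        else:
            in_history = False  # any other field ends the history block
    return history


def current_sha1(index_text):
    """Return the hash from the SHA1-Current field."""
    for line in index_text.splitlines():
        if line.startswith("SHA1-Current:"):
            return line.split()[1]
    raise ValueError("no SHA1-Current field")


def patches_needed(index_text, local_sha1, incremental=True):
    """Patch names to apply, in order; None means the local file is too old.

    incremental=True:  each patch turns one version into the next, so every
                       patch from the local version onwards is needed.
    incremental=False: hypothetical merged layout, one patch per history entry
                       that leads straight to the current file.
    """
    if local_sha1 == current_sha1(index_text):
        return []
    history = parse_history(index_text)
    names = [name for _, name in history]
    for i, (sha1, _) in enumerate(history):
        if sha1 == local_sha1:
            return names[i:] if incremental else [names[i]]
    return None  # not in the history: fall back to a full download
```

The `incremental` flag corresponds to the choice discussed above: assume incremental diffs by default, or only when the Index explicitly says so.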

As said, it depends. Anyway, APT is usually extremely conservative about
breaking workflows, even if only a few users rely on them, so I highly doubt
we would switch to assuming incremental diffs by default.


>> David reminded me that the APT side of things already had a GSoC
>> project last year [2].  The code has not been merged yet, but at least a
>> proof-of-concept branch is there.  Assuming that can be used, we
>> are probably very close to making apt-file's update/purge commands
>> obsolete.

Unfortunately I had less time than I hoped, but I will try to write a proper
follow-up on this soon. Until then, some loose ends:

The GSoC code bundles another big change to sources.list handling, which
needs work before we can merge it (the new code is incompatible with the
old). On top of this, the acquire system is extended to deal with more
complex requirements on the file front, which is interesting but independent:
most files we download do not need complicated handling (like fallbacks
and conditionals; think: (In)Release(.gpg)), so we need code for "simple"
files anyway, and there is therefore no problem doing this independently.
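To illustrate the difference between a "simple" file and one with fallbacks and conditionals, here is a hedged Python sketch of the (In)Release(.gpg) chain: try the inline-signed InRelease first and only fall back to Release plus Release.gpg if it is missing. `fetch` is a hypothetical stand-in for the acquire system that returns the file content or `None`; signature verification is left out entirely.

```python
def fetch_release(fetch):
    """Sketch of the (In)Release(.gpg) fallback chain.

    `fetch` is a hypothetical callable: fetch(name) -> bytes, or None if the
    mirror does not have the file. Returns the (name, content) pairs obtained.
    """
    inrelease = fetch("InRelease")
    if inrelease is not None:
        return [("InRelease", inrelease)]  # inline-signed: nothing else needed
    release = fetch("Release")             # fallback
    if release is None:
        raise FileNotFoundError("neither InRelease nor Release is available")
    files = [("Release", release)]
    signature = fetch("Release.gpg")       # conditional: only tried after Release
    if signature is not None:
        files.append(("Release.gpg", signature))
    return files


def fetch_simple(fetch, name):
    """A 'simple' file: one request, no fallback, no conditionals."""
    data = fetch(name)
    if data is None:
        raise FileNotFoundError(name)
    return data


# Usage with an in-memory "mirror" that only offers the detached-signature pair.
mirror = {"Release": b"release", "Release.gpg": b"sig", "Packages": b"pkgs"}
print(fetch_release(mirror.get), fetch_simple(mirror.get, "Packages"))
```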

Rewriting debReleaseIndex::ComputeIndexTargets in apt-pkg/deb/debmetaindex.cc
to query files based on configuration rather than hardcoding them should be key
here (besides then moving this code up in the class hierarchy).
Something along the lines of Acquire::Files::<Type>::<Identifier>::<Data>,
where <Type> is "Base", "Flat" or "Tree", to allow different settings for
"Flat" and "Tree" style archives; <Identifier> is an arbitrary name like
"Packages", "Contents", …; and finally <Data> sets "URI", "Description", …
(I wonder if we need Acquire::Files::http://example.org/:: … too)
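A minimal Python sketch of how ComputeIndexTargets could read such a namespace, using a flat key/value view of the configuration. The concrete option names and values are illustrative only, and treating "Base" as the fallback for "Flat" and "Tree" is an assumption about the intended semantics.

```python
# Hypothetical flat view of the proposed Acquire::Files::<Type>::<Identifier>::<Data>
# namespace; option names and values are illustrative, not real APT settings.
CONFIG = {
    "Acquire::Files::Base::Packages::Description": "Packages",
    "Acquire::Files::Tree::Packages::URI": "$(BaseURI)main/binary-$(Architectures)/Packages",
    "Acquire::Files::Flat::Packages::URI": "$(BaseURI)Packages",
    "Acquire::Files::Base::Contents::Description": "Contents",
    "Acquire::Files::Tree::Contents::URI": "$(BaseURI)Contents-$(Architectures)",
}


def lookup(config, archive_type, identifier, data):
    """Setting for <Identifier>::<Data>: the archive-specific <Type> ("Flat" or
    "Tree") wins, "Base" is the (assumed) shared fallback; None if unset."""
    for type_name in (archive_type, "Base"):
        key = f"Acquire::Files::{type_name}::{identifier}::{data}"
        if key in config:
            return config[key]
    return None


def identifiers(config, archive_type):
    """All <Identifier> names configured for `archive_type` or Base, in order."""
    found = []
    for key in config:
        parts = key.split("::")
        if (len(parts) == 5 and parts[:2] == ["Acquire", "Files"]
                and parts[2] in (archive_type, "Base") and parts[3] not in found):
            found.append(parts[3])
    return found


for ident in identifiers(CONFIG, "Tree"):
    print(ident, lookup(CONFIG, "Tree", ident, "URI"), lookup(CONFIG, "Tree", ident, "Description"))
```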

The URI should be built with placeholders like BaseURI, Architectures,
NativeArchitecture, Languages. Many of these should be available in the
other <Data> elements as well (think: Description for Translation-*).
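Placeholder expansion could work roughly like this Python sketch: single-valued placeholders (BaseURI, NativeArchitecture) are substituted directly, while multi-valued ones (Architectures, Languages) produce one result per value. The `$(Name)` syntax and the `expand` helper are assumptions made for illustration, not an existing APT interface.

```python
import itertools
import re

# Hypothetical placeholder syntax: $(Name)
PLACEHOLDER = re.compile(r"\$\((\w+)\)")


def expand(template, scalars, lists):
    """Expand $(Name) placeholders in `template`.

    scalars: single-valued placeholders (e.g. BaseURI, NativeArchitecture).
    lists:   multi-valued placeholders (e.g. Architectures, Languages); one
             result is produced per combination of the values actually used.
    Unknown placeholders raise KeyError.
    """
    # Multi-valued placeholders present in the template, duplicates removed.
    used = [n for n in dict.fromkeys(PLACEHOLDER.findall(template)) if n in lists]
    results = []
    for combo in itertools.product(*(lists[n] for n in used)):
        values = dict(scalars)
        values.update(zip(used, combo))
        results.append(PLACEHOLDER.sub(lambda m: values[m.group(1)], template))
    return results


# Illustrative values; example.org is a placeholder host.
SCALARS = {"BaseURI": "http://example.org/debian/dists/sid/", "NativeArchitecture": "amd64"}
LISTS = {"Architectures": ["amd64", "i386"], "Languages": ["en", "de"]}

print(expand("$(BaseURI)main/binary-$(Architectures)/Packages", SCALARS, LISTS))
print(expand("Translation-$(Languages)", SCALARS, LISTS))
```

The same helper serves both the URI and other <Data> elements such as a Description for Translation-*, which is why the placeholders are kept independent of any particular field.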

While we have IndexTargets and OptionalIndexTargets, the latter aren't really
optional (they are hardcoded-optional, as we couldn't break the ABI at that point);
fixing this now would be good [aka: needed].
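One way to make "optional" real is to carry it as per-target data instead of a separate hardcoded list. The Python sketch below uses a hypothetical `IndexTarget` record with an `optional` flag: a missing optional file is silently skipped, a missing required one is an error. This is a design illustration only, not APT's actual classes.

```python
from dataclasses import dataclass


@dataclass
class IndexTarget:
    """Hypothetical index target: optionality is per-target data rather than
    membership in a separate hardcoded list."""
    identifier: str
    uri: str
    optional: bool = False


def select_available(targets, available):
    """URIs of targets present in `available`; a missing optional target is
    skipped, a missing required target raises FileNotFoundError."""
    selected = []
    for target in targets:
        if target.uri in available:
            selected.append(target.uri)
        elif not target.optional:
            raise FileNotFoundError(f"required index {target.identifier} missing: {target.uri}")
    return selected


targets = [
    IndexTarget("Packages", "main/binary-amd64/Packages"),
    IndexTarget("Translation-de", "main/i18n/Translation-de", optional=True),
]
print(select_available(targets, {"main/binary-amd64/Packages"}))
```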


So long,
Best regards

David Kalnischkies

