
Bug#703366: RFH: apt-file -- search for files within Debian packages (command-line interface)



On Thursday 21 March 2013, Niels Thykier wrote:
> On 2013-03-20 18:30, Stefan Fritsch wrote:
> > That would be the perfect solution. Unfortunately, it would also
> > mean that apt's pdiff implementation would need to be rewritten
> > because it is so inefficient. [...]
> 
> I spoke with David Kalnischkies (DonKult) and he told me that (part
> of) the reason why it is slow is that it makes no assumption about
> pdiffs. It is my understanding (of the code) that apt-file just
> blindly downloads all ("new") patches and applies them in one go.

I was under the impression that the Index file tells you exactly which
patches are necessary. But given the lack of any formal specification
(at least at the time I wrote diffindex-* in apt-file), I may have been
wrong.
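
To make that reading of the Index concrete, here is a minimal sketch of
how a client could pick the needed patches. It assumes the layout seen
in Debian archives' *.diff/Index files: an SHA1-Current line, then an
SHA1-History section where each line gives the checksum and size of the
file that the named patch applies to. The function names and the sample
data are made up for illustration; this is not apt-file's actual code.

```python
def parse_index(index_text):
    """Return (current_sha1, [(sha1, size, patch_name), ...]).

    Assumption: each SHA1-History line names the patch that applies to
    the file with that checksum, oldest first.
    """
    current = None
    history = []
    in_history = False
    for line in index_text.splitlines():
        if line.startswith("SHA1-Current:"):
            current = line.split(":", 1)[1].split()[0]
            in_history = False
        elif line.startswith("SHA1-History:"):
            in_history = True
        elif in_history and line.startswith(" "):
            sha1, size, name = line.split()
            history.append((sha1, int(size), name))
        else:
            in_history = False  # some other field begins
    return current, history


def patches_needed(index_text, local_sha1):
    """List the patches to apply, in order, or None if a full download is needed."""
    current, history = parse_index(index_text)
    if local_sha1 == current:
        return []  # already up to date
    for i, (sha1, _size, _name) in enumerate(history):
        if sha1 == local_sha1:
            return [name for _sha1, _size, name in history[i:]]
    return None  # local file not in the history


# Hypothetical sample Index with shortened checksums.
SAMPLE_INDEX = "\n".join([
    "SHA1-Current: cccc 300",
    "SHA1-History:",
    " aaaa 100 2013-03-18-1400.00",
    " bbbb 200 2013-03-19-1400.00",
    "SHA1-Patches:",
    " 1111 10 2013-03-18-1400.00",
    " 2222 20 2013-03-19-1400.00",
])
```

In practice local_sha1 would be the hashlib.sha1 hex digest of the
uncompressed local file. A None result means the local copy is too old
(or unknown) and the full file has to be fetched instead.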

> Allegedly, reprepro can merge pdiffs so not all of them need to be
> applied and (understandably) the APT maintainers do not want that
> to break.

This seems very broken to me. Merging the diffs on the server side has
little benefit. You still need exactly the same number of diffs on the
server, but each diff gets larger and the diffs themselves change more
often, so the efficiency of caching proxies goes down. With keep-alive
connections and pipelining, downloading a few dozen files is not that
big a problem.

And there are some implementations (at least apt-file's and the
security tracker's) that depend on the pdiffs being incremental in
order to be faster than apt by at least one order of magnitude. So if
the archive ever started merging diffs, those implementations would
break.
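
To illustrate what "incremental" buys you: if each patch transforms
exactly the output of the previous one, a client can apply the whole
chain in memory and write the result once. Below is a minimal sketch,
assuming the patches are ed-style scripts as produced by "diff --ed"
and restricted to the a/c/d commands. The function names and sample
data are made up, and this is not the code apt-file or the security
tracker actually use.

```python
import re

# Matches ed commands such as "5a", "3,4c" or "2d".
CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply one ed-style script (a/c/d commands only) to a list of lines.

    diff --ed emits commands from the bottom of the file upwards, so the
    line numbers stay valid when the commands are applied in order.
    """
    lines = list(lines)
    it = iter(script.splitlines())
    for cmd in it:
        m = CMD.match(cmd)
        if m is None:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        new = []
        if op in "ac":
            # Text block follows, terminated by a line containing only "."
            for text in it:
                if text == ".":
                    break
                new.append(text)
        if op == "a":
            lines[start:start] = new      # insert after line `start`
        elif op == "c":
            lines[start - 1:end] = new    # replace lines start..end
        else:
            lines[start - 1:end] = []     # delete lines start..end
    return lines


def apply_chain(lines, scripts):
    """Apply incremental patches oldest first, keeping everything in memory."""
    for script in scripts:
        lines = apply_ed_script(lines, script)
    return lines


# Hypothetical Contents-style data and two incremental patches.
BASE = ["bin/ls coreutils", "bin/cat coreutils", "usr/bin/foo foo"]
PATCH_1 = "3d\n1a\nbin/cp coreutils\n."
PATCH_2 = "2c\nbin/cp coreutils-new\n."
RESULT = apply_chain(BASE, [PATCH_1, PATCH_2])
```

The second patch only makes sense against the output of the first,
which is exactly the assumption that server-side merging would
invalidate.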

> The solution is probably to extend the pdiff format
> (e.g. like the suggestion in [1]), so the client side can see
> exactly which patches are needed (instead of having to do them one
> at a time).
>   To this end, I have been making a bit of noise in #d-ftp;
> hopefully I will have news here soon.

I think apt should still be changed to assume incremental diffs unless
the Index file uses a new format. That would bring the speed-up to
old-style archives as well. Merging diffs on the server does not give a
comparable benefit.

> David reminded me that the APT side of things already had a GSoC
> last year[2].  The code has not been merged yet but at least a
> proof-of-concept branch is there.  Assuming that can be used, we
> are probably very close to making apt-file's update/purge commands
> obsolete.

Nice. But the pdiff problem still needs to be solved. You don't want 
to slow down apt-file update by a factor of 10 or more.

> As I understood Nick, he was not interested in maintaining
> the current Perl variant of apt-file, but he would be interested
> in rewriting (and maintaining said rewrite of) apt-file.  He was
> certain he could improve the search speed of apt-file while doing
> so.  Given the results of his apt-show-versions rewrite I am
> looking forward to that rewrite with great anticipation.  :)
> 
> What I propose we do is that I take over the maintenance of the
> current apt-file.  I will focus on making apt-file update/purge
> obsolete.

Sure. It's in collab-maint. Just commit away. But don't remove Thijs
or Enrico; they still want to stay co-maintainers.

Cheers,
Stefan

