
Bug#884914: apt: does not deal well with high request latency web servers



Source: apt
Severity: wishlist
X-Debbugs-Cc: marga@debian.org

At work our packages and package lists are served by a web service that
has relatively high time-to-first-byte latency. Once it starts
transmitting, the bytes arrive quickly, but getting the first byte takes
a "long" time. While we can live with that latency on "apt update", it
imposes a very high penalty on systems that need to fetch a lot of small
packages (e.g. build dependencies).

The only solution apt offers today is pipelining. While that allows a
fast start for congestion-control purposes, pipelining always requires
the answers to be sent down the pipe in order. Unless you set a depth
equal to the number of packages to fetch, the queue is only replenished
one by one as packages complete, costing a full RTT to insert each new
item. Furthermore, it impacts the server negatively if the first hop is
a load balancer that fans out to other backends: the load balancer has
to buffer all answers so it can return them in order.
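
For completeness, the knob that exists today is the pipeline depth;
raising it in apt.conf only helps if the mirror actually honours
pipelining, and the value below is purely illustrative:

    // e.g. /etc/apt/apt.conf.d/99-pipeline (illustrative value)
    Acquire::http::Pipeline-Depth "200";

Even then the responses have to come back in request order, so a single
slow object stalls everything queued behind it.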

An easy workaround (a few changed lines) would be to simply spawn
multiple transports for a given host target, to make use of multiple
connections. Load balancing the requests onto them then speeds the
transaction up essentially to line speed. There is still the drawback
that naive load balancing (essentially adding n queues for a host)
happens at the beginning of the transaction rather than throughout. This
is not a concern in our particular case, though, as the main issue is
keeping enough requests enqueued on the server side.
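
To make the effect concrete (as a sketch outside of apt, not a patch):
fetching many small objects from a high-TTFB server over k independent
connections overlaps the per-request latency, so the total time
approaches one TTFB plus transfer time instead of n times TTFB. The URLs
and worker count below are made up for illustration:

    # Sketch only: fetch many small objects over independent connections.
    # Not apt code; it just shows why parallel connections help when the
    # time to first byte dominates. URLs and worker count are placeholders.
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    urls = ["https://example.org/pool/pkg%d.deb" % i for i in range(100)]

    def fetch(url):
        # Each call opens its own connection, so one slow response does
        # not block the others (no head-of-line blocking).
        with urlopen(url) as response:
            return url, len(response.read())

    with ThreadPoolExecutor(max_workers=8) as pool:
        for url, size in pool.map(fetch, urls):
            print(url, size)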

It has been pointed out that this would violate assumptions made by
mirror operators, though, in case they approximate per-client limits
with per-connection rate limiting (because per-client bucketing is
hard). I'd argue that an optional configuration setting that is not
enabled by default would still be fair to offer.

Another solution to this problem would be to implement HTTP/2 support,
which allows requests to be answered out of order. In that case a single
connection would very likely be enough, as the server can simply answer
whatever is available and the pipeline is replenished asynchronously. In
our case the load balancer would also offer HTTP/2 server-side[1].
However, I'd argue that such an implementation should not be hand-rolled
like the current HTTP(S) transport and would require depending on
another library such as nghttp2, so it would likely need to live in its
own apt transport.
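
To illustrate the HTTP/2 behaviour (again outside of apt): with a
multiplexing client all requests can be in flight on one connection and
the responses complete in whatever order the server produces them. The
sketch below uses Python's httpx with its optional HTTP/2 support purely
as a stand-in; an apt transport would presumably sit on top of something
like nghttp2 instead, and the URLs are placeholders:

    # Sketch only: many requests multiplexed over a single HTTP/2
    # connection; one slow object does not stall the rest. Requires the
    # optional HTTP/2 extra of httpx (pip install 'httpx[http2]').
    import asyncio
    import httpx

    async def main():
        urls = ["https://example.org/pool/pkg%d.deb" % i
                for i in range(100)]
        async with httpx.AsyncClient(http2=True) as client:
            responses = await asyncio.gather(*(client.get(u) for u in urls))
            for r in responses:
                print(r.request.url, len(r.content))

    asyncio.run(main())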

Happy to hear your thoughts on how to solve this. And please keep up the
great work. :)

Kind regards and thanks
Philipp Kern

[1] Note that, in practice, HTTP/2 makes encryption mandatory.
