Source: apt
Severity: wishlist
X-Debbugs-Cc: marga@debian.org

At work our packages and package lists are served by a web service that has a relatively high time to first byte (TTFB). Once it starts transmitting, the bytes are pushed quickly, but getting the first byte takes a "long" time. While the latency is somewhat tolerable for "apt update", it imposes a very high penalty on systems that need to fetch a lot of small packages (e.g. build dependencies).

The only mitigation apt offers today is pipelining. While pipelining allows a fast start for congestion control purposes, it always requires the answers to be sent down the pipe in order. Unless you set a depth that equals the number of packages to fetch, the queue is only replenished one by one as packages complete, requiring a full RTT to insert each new item. Furthermore, it impacts the server negatively if the first hop is a load balancer that fans out to other backends: the load balancer needs to buffer all answers so that it can return them in order.

An easy workaround (a few lines changed) would be to spawn multiple transports for a given target host, making use of multiple connections. Load balancing the requests onto them speeds the transaction up to essentially line speed. There is still the drawback that naive load balancing (essentially adding n queues per host) happens at the beginning of the transaction rather than throughout. That is not a concern in our particular case, though, as the main issue is enqueueing enough requests on the server side.

It has been raised that this would violate assumptions made by mirror operators, though, in case they approximate per-client limits with per-connection rate limiting (because per-client bucketing is hard). I'd argue that it would still be fair to offer this as an optional configuration setting that is not enabled by default.
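For illustration, a back-of-the-envelope model (my own toy numbers, not measurements from our setup): assume every request spends TTFB seconds waiting on the server, transfer time is negligible, and a connection with pipeline depth d therefore completes at most d requests per TTFB interval. Spreading requests over several connections multiplies the achievable concurrency:

```python
import math

# Back-of-the-envelope model (an assumption for illustration, not
# apt's actual scheduler): every request waits TTFB seconds on the
# server, transfer time is negligible, and a connection with
# pipeline depth d keeps at most d requests in flight, i.e. it
# completes at most d requests per TTFB interval.
def fetch_time(n_requests, connections, depth, ttfb):
    in_flight = connections * depth            # total concurrency
    rounds = math.ceil(n_requests / in_flight)
    return rounds * ttfb

# 500 small build dependencies at 200 ms time-to-first-byte:
print(fetch_time(500, 1, 10, 0.2))  # one connection, depth 10: ~10 s
print(fetch_time(500, 4, 10, 0.2))  # four connections: ~2.6 s
```

The absolute numbers are invented; the point is that with high TTFB and many small files, wall clock time is dominated by how many requests can be outstanding at once.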
Another solution would be to implement HTTP/2 support, which allows requests to be answered out of order. In that case a single connection would very likely be enough, as the server can simply answer whatever is available and the pipeline is replenished asynchronously. In our case the load balancer would also offer HTTP/2 server-side[1]. However, I'd argue that such an implementation should not be hand-rolled like the current HTTP(S) transport, but should depend on a library like nghttp2. It would therefore likely need to live in its own apt transport.

Happy to hear your thoughts on how to solve this. And please keep up the great work. :)

Kind regards and thanks
Philipp Kern

[1] Note that HTTP/2 makes encryption mandatory.
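P.S.: To make the in-order versus out-of-order delivery point concrete, here is a toy comparison (my construction, not apt code): five responses with different server-side latencies, delivered either strictly in request order, as with HTTP/1.1 pipelining, or as soon as each one is ready, as with HTTP/2 multiplexing.

```python
# Toy comparison of delivery orders; the latencies are invented.
LATENCIES = [0.5, 0.1, 0.1, 0.1, 0.1]  # first response is the slow one

def pipelined_completion(latencies):
    """In-order delivery: every response waits for its predecessors."""
    done, clock = [], 0.0
    for lat in latencies:
        clock = max(clock, lat)  # head-of-line blocking
        done.append(clock)
    return done

def multiplexed_completion(latencies):
    """Out-of-order delivery: each response lands as soon as it is ready."""
    return sorted(latencies)

print(pipelined_completion(LATENCIES))    # [0.5, 0.5, 0.5, 0.5, 0.5]
print(multiplexed_completion(LATENCIES))  # [0.1, 0.1, 0.1, 0.1, 0.5]
```

One slow object stalls everything queued behind it on a pipelined connection, while a multiplexed connection hands over the fast objects immediately; that is why a single HTTP/2 connection can plausibly replace several HTTP/1.1 ones.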