
Bug#970624: marked as done (Apt software repository metainfo as a git repo)



Your message dated Sun, 20 Sep 2020 15:05:12 +0200
with message-id <20200920130512.xg2nq5tdw7pkj3cm@crossbow>
and subject line Re: Bug#970624: Apt software repository metainfo as a git repo
has caused the Debian Bug report #970624,
regarding Apt software repository metainfo as a git repo
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
970624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970624
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: apt
Version: 2.1.10

With Continuous Integration systems more popular than ever, the `apt-get update` operation is being executed more and more often. When run frequently, the response will most of the time indicate that nothing new is available. And when something did change, often only a subset of packages from a bigger package list was updated.

The current architecture, as far as I understand it, however, is not suited for minimizing the response time and traffic minimization for these scenarios.

What I'd like to ask is, were there considerations to use a more suitable architecture, like using a git repo as an underlying structure to optimise retrieval of updated repository state? Not only is it designed for exactly these use-cases, but it will also add features like being able to see the history of repository's metainfo changes using familiar git tools. In my opinion, it might have a great positive impact.

Thank you!

--- End Message ---
--- Begin Message ---
Hi,

On Sun, Sep 20, 2020 at 11:01:34AM +0300, Konstantine Rybnikov wrote:
> The current architecture, as far as I understand it, however, is not suited
> for minimizing the response time and traffic minimization for these
> scenarios.

If you have the latest version already downloaded and update again, apt
will make a single GET request which the HTTP server can reply to with
"304 Not Modified", but even if it does reply with "200 OK", apt will
figure out that the repository didn't change [and that it does not need
additional files]. The update process itself will still take some time,
though, as we will check e.g. whether the signatures are still valid and
cleanly rebuild caches (sort of "git gc", but all the time). So the
situation in terms of traffic cannot be improved much, as you can't
really get below one request without trickery, while the "response" time
is negatively affected by all the things git does not do but apt is
expected to.
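
As a rough illustration of that decision (a simplified sketch, not apt's
actual code): after the single request, the client either gets a
not-modified reply (HTTP status 304) or compares the hashes listed in the
freshly fetched Release data with the files it already has. The function
name and both mappings below are hypothetical and only serve the example.

```python
import hashlib

NOT_MODIFIED = 304  # HTTP status for "Not Modified"


def indexes_to_fetch(status, release_hashes, local_files):
    """Return the names of index files that still have to be downloaded.

    status         -- HTTP status of the single request for the Release file
    release_hashes -- hypothetical mapping: index name -> expected SHA256 hex digest
    local_files    -- hypothetical mapping: index name -> bytes already on disk
    """
    if status == NOT_MODIFIED:
        # The server confirmed that our copy is current: nothing to fetch.
        return []
    stale = []
    for name, expected in release_hashes.items():
        data = local_files.get(name)
        # Fetch only what is missing or does not match the announced hash.
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            stale.append(name)
    return stale
```

With a "200 OK" reply and matching hashes the result is still an empty
list, which is the "repository didn't change" case described above.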


> What I'd like to ask is, were there considerations to use a more suitable
> architecture, like using a git repo as an underlying structure to optimise
> retrieval of updated repository state?

Not every repository provides them, but apt supports pdiff patches for
the indexes, so it can update old indexes in much the same way git does.
The "history" is potentially endless, but Debian keeps roughly 14 days,
as at some point it's more efficient to download the new file
instead of patching the old one (runtime- as well as traffic-wise).
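
The trade-off between patching and re-downloading can be sketched like
this. It is a simplified model and not how apt implements it: the
function name, the (hash, size) history layout and the pure byte-count
comparison are assumptions made for the example, and runtime cost is
ignored.

```python
def plan_update(local_hash, current_hash, history, full_size):
    """Decide how to bring a local index up to date.

    local_hash   -- hash of the index the client has
    current_hash -- hash of the index the repository offers now
    history      -- hypothetical list of (from_hash, patch_size), oldest first;
                    each patch turns the index with from_hash into the next one
    full_size    -- download size of the complete current index
    """
    if local_hash == current_hash:
        return ("nothing", 0)
    known = [from_hash for from_hash, _ in history]
    if local_hash not in known:
        # Our copy is older than the retained history: patches cannot help.
        return ("full", full_size)
    start = known.index(local_hash)
    patch_bytes = sum(size for _, size in history[start:])
    if patch_bytes >= full_size:
        # Too many or too large patches: the new file is cheaper.
        return ("full", full_size)
    return ("patch", patch_bytes)
```

A client whose copy predates the kept history, or whose patch chain has
grown past the size of the index itself, simply falls back to the full
download.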


> Not only is it designed for exactly
> these use-cases, but it will also add features like being able to see the
> history of repository's metainfo changes using familiar git tools. In

We do not keep history¹ as that is pretty much a waste of space,
especially if you have it "endless" as git would do. The repository
also contains lots of data the user will not download (information for
other architectures, for example), and the indexes are compressed (for
maximum traffic efficiency) in the repository while they will be stored
uncompressed (or compressed in a way optimized for usage) on the client.
git can't really do that. The sizes of the files apt works with also
tend to be above the file size git is comfortable with, requiring the
use of git-lfs/-annex, which mostly defeats the history and patch part.
So I don't see how "git was made for this", as there is no collaboration
to be done: everyone follows the same branch (so to speak) without ever
diverging or merging anything.

Or, if you want: apt performs a shallow filtered clone while using
different storage methods for client and server AND is able to
update that later on – good luck trying that with git.


As such, this is not a bug report, but a misunderstanding of what apt
already does/supports, which is relatively similar to git without the
massive negative effects a naive switch would have.

I am therefore closing this report as it seems unactionable.
If you have specific suggestions, feel free to report them.


Best regards

David Kalnischkies

¹ history exists in the form of snapshot.debian.org for example. git
isn't used there either for much the same and a few more reasons though.



--- End Message ---
