Your message dated Sun, 20 Sep 2020 15:05:12 +0200
with message-id <20200920130512.xg2nq5tdw7pkj3cm@crossbow>
and subject line Re: Bug#970624: Apt software repository metainfo as a git repo
has caused the Debian Bug report #970624,
regarding Apt software repository metainfo as a git repo
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)

--
970624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970624
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: submit@bugs.debian.org
- Subject: Apt software repository metainfo as a git repo
- From: Konstantine Rybnikov <k-bx@k-bx.com>
- Date: Sun, 20 Sep 2020 11:01:34 +0300
- Message-id: <CAAbahfR7=r6FeQ3LktPjvpewDc1VgVRNASKaPa=p2psbb68YfQ@mail.gmail.com>
Package: apt
Version: 2.1.10

With Continuous Integration systems more popular than ever, the `apt-get update` operation is run more and more often. When run frequently, the response will, most of the time, indicate that nothing new is available. And when something has changed, it is often only a subset of the packages in a bigger package list that has been updated.

The current architecture, as far as I understand it, however, is not suited for minimizing the response time and traffic minimization for these scenarios.

What I'd like to ask is, were there considerations to use a more suitable architecture, like using a git repo as an underlying structure to optimise retrieval of updated repository state? Not only is it designed for exactly these use-cases, but it will also add features like being able to see the history of repository's metainfo changes using familiar git tools. In my opinion, it might have a great positive impact.

Thank you!
--- End Message ---
--- Begin Message ---
- To: Konstantine Rybnikov <k-bx@k-bx.com>, 970624-done@bugs.debian.org
- Subject: Re: Bug#970624: Apt software repository metainfo as a git repo
- From: David Kalnischkies <david@kalnischkies.de>
- Date: Sun, 20 Sep 2020 15:05:12 +0200
- Message-id: <20200920130512.xg2nq5tdw7pkj3cm@crossbow>
- In-reply-to: <CAAbahfR7=r6FeQ3LktPjvpewDc1VgVRNASKaPa=p2psbb68YfQ@mail.gmail.com>
- References: <CAAbahfR7=r6FeQ3LktPjvpewDc1VgVRNASKaPa=p2psbb68YfQ@mail.gmail.com>
Hi,

On Sun, Sep 20, 2020 at 11:01:34AM +0300, Konstantine Rybnikov wrote:
> The current architecture, as far as I understand it, however, is not suited
> for minimizing the response time and traffic minimization for these
> scenarios.

If you have the latest version already downloaded and update again, apt will make a single GET request, which the HTTP server can answer with "304 Not Modified". Even if it replies with "200 OK", apt will figure out that the repository didn't change [and that it does not need additional files]. The update process itself will still take some time, though, as we check e.g. whether the signatures are still valid and cleanly rebuild caches (sort of "git gc", but all the time). So in terms of traffic the situation cannot be improved much, as you can't really get below one request without trickery, while the "response" time is negatively affected by all the things git does not do but apt is expected to.

> What I'd like to ask is, were there considerations to use a more suitable
> architecture, like using a git repo as an underlying structure to optimise
> retrieval of updated repository state?

Not every repository provides them, but apt supports pdiff patches for the indexes, so it can update old indexes in much the same way git does. The "history" could potentially be endless; Debian keeps roughly 14 days, though, as at some point it is more efficient (in runtime as well as in traffic) to download the new file instead of patching the old one.

> Not only is it designed for exactly
> these use-cases, but it will also add features like being able to see the
> history of repository's metainfo changes using familiar git tools. In

We do not keep history¹, as that is pretty much a waste of space, especially if you keep it "endless" as git would.
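To make the pdiff idea concrete, here is a minimal Python sketch; it is an illustration, not apt's implementation. It assumes each patch is an ed-style script of the kind `diff --ed old new` produces (commands such as `3c`, `5,7d` or `10a`, where `a` and `c` are followed by text lines and a terminating `.`), and that a chain of patches is applied oldest first. The names `apply_ed_script`, `apply_patch_chain` and `should_patch` are hypothetical, and the size comparison in `should_patch` is only an illustrative stand-in for the point at which downloading the full index beats patching; apt's actual rule may differ.

```python
import re
from typing import List

# Hypothetical helpers illustrating pdiff-style index updates; not apt code.

# An ed command as emitted by `diff --ed`: "N" or "N,M" followed by a, c or d.
_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines: List[str], script: str) -> List[str]:
    """Apply one ed-style script to `lines` and return the patched copy.

    Assumes hunks are listed bottom-up, as `diff --ed` emits them, so
    applying them in order never shifts a later hunk's line numbers.
    Content lines consisting of a single "." are not supported here.
    """
    result = list(lines)
    script_lines = script.splitlines()
    i = 0
    while i < len(script_lines):
        match = _COMMAND.match(script_lines[i])
        if match is None:
            raise ValueError(f"unsupported ed command: {script_lines[i]!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        i += 1

        # "a" (append) and "c" (change) carry text up to a lone "." line.
        text: List[str] = []
        if op in ("a", "c"):
            while i < len(script_lines) and script_lines[i] != ".":
                text.append(script_lines[i])
                i += 1
            if i == len(script_lines):
                raise ValueError("text block not terminated by '.'")
            i += 1  # skip the "." terminator

        if op == "a":
            result[start:start] = text  # insert after 1-based line `start`
        elif op == "c":
            result[start - 1:end] = text  # replace lines start..end
        else:
            del result[start - 1:end]  # delete lines start..end
    return result


def apply_patch_chain(lines: List[str], scripts: List[str]) -> List[str]:
    """Apply a sequence of patches (e.g. one per repository update) in order."""
    for script in scripts:
        lines = apply_ed_script(lines, script)
    return lines


def should_patch(patch_sizes: List[int], full_size: int,
                 max_ratio: float = 1.0) -> bool:
    """Hypothetical heuristic: patch only while the patches needed are smaller
    than `max_ratio` times the size of the complete new index."""
    return sum(patch_sizes) < max_ratio * full_size


if __name__ == "__main__":
    old = ["Package: a", "Version: 1", "", "Package: b", "Version: 1"]
    patches = [
        "5c\nVersion: 2\n.\n",                # package b updated
        "5a\n\nPackage: c\nVersion: 1\n.\n",  # package c added
    ]
    print(apply_patch_chain(old, patches))
```

Run as a script, the example prints the index with package `b` at version 2 and package `c` appended. Keeping `should_patch` separate from the patching code mirrors the trade-off described above: a long chain of small patches eventually costs more than fetching the whole file once, which is why only a limited window of patches is worth publishing.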
The repository also contains lots of data the user will not download (information for other architectures, for example), and the indexes are compressed in the repository (for maximum traffic efficiency) while they are stored uncompressed (or compressed in a form optimized for usage) on the client. git can't really do that. The files apt works with also tend to be larger than what git is comfortable with, requiring the use of git-lfs or git-annex, which mostly defeats the history and patch part. So I don't see how "git was made for this": there is no collaboration to be done. Everyone follows the same branch (so to speak) without ever diverging or merging anything.

Or, if you want: apt performs a shallow, filtered clone while using different storage methods for client and server AND is able to update that later on. Good luck trying that with git.

As such, this is not a bug report but a misunderstanding of what apt already does and supports, which is relatively similar to git without the massive negative effects a naive switch would have. I am therefore closing this report as it seems unactionable. If you have specific suggestions, feel free to report them.

Best regards

David Kalnischkies

¹ History exists in the form of snapshot.debian.org, for example. git isn't used there either, for much the same and a few more reasons, though.

Attachment: signature.asc
Description: PGP signature
--- End Message ---