Re: wishlist script: “hegemon”: set up bare git mirror farm for all debian sources
And BTW, the question that prompted this Gentoo discussion is here:
https://lwn.net/Articles/759539/
and includes this snippet:
2. git seems to be more efficient for frequent syncing, while
rsync seems to be more efficient for infrequent syncing. I'd
guess the crossover is somewhere around a week or few, but I
don't have data to support that.
which of course leads to the obvious thought that the crossover point
for a particular local repo is user dependent, or rather, repo
dependent - i.e. someone working in a particular repo (or repos) may
want to git pull frequently, whereas for the bulk of Debian's 50,000
packages a weekly or monthly batch update would be fine, security
updates excepted.
This "batch git delta" distribution/bundling, on a weekly, monthly or
whichever basis, should mean that the existing ftp distribution
infrastructure can still be used...
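
As a rough sketch of what such a batch delta could look like: git's
own bundle command can pack just the recent commits of a mirror into a
single file, which could then sit on an ordinary ftp mirror. Nothing
below is an existing hegemon feature; the function names, the DRY_RUN
switch and the example paths are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the function names and DRY_RUN are hypothetical.
# DRY_RUN=1 prints each git command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Pack everything committed since $2 on any ref of the mirror $1
# into the bundle file $3 (use an absolute path for $3).
make_delta_bundle() {
    repo=$1
    since=$2
    out=$3
    # git refuses to create an empty bundle, so a failure here can
    # simply mean that nothing changed in the period.
    run git -C "$repo" bundle create "$out" --since="$since" --all
}

# Check that the local mirror $1 has the commits the bundle $2
# builds on, then fetch branches and tags from it.
apply_delta_bundle() {
    repo=$1
    bundle=$2
    run git -C "$repo" bundle verify "$bundle" &&
    run git -C "$repo" fetch "$bundle" \
        'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
}
```

With DRY_RUN=1 set, something like "make_delta_bundle
/my/hegemon/p/package-source-repo 1.week.ago /tmp/package.bundle"
just prints the git command it would run. A mirror that has missed an
earlier delta will fail the verify step, which is probably the right
behaviour for a weekly feed.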
On Wed, Aug 15, 2018 at 05:01:54PM +1000, Zenaan Harkness wrote:
> Rough stats for “hegemon emerge” now available courtesy Gentoo, see
> the bottom of the following article, last section titled "Git versus
> rsync":
> https://lwn.net/Articles/759467/
> and here's the money quote:
>
> Matt Turner said that he has set aside a 1GB partition for the
> tree, which works fine for the roughly 600MB needed by rsync, but
> not for Git. A shallow clone of the Git repository is roughly the
> same (around 660MB), but each pull adds to that, so without some
> kind of "auto-trimming", Git will grow quickly, Freeman said
>
> There is also a security issue considered:
>
> The GitHub mirror compromise has clearly led to some thinking (and
> rethinking) within the project about its practices and how they
> might be improved. It is not clear that there are any real
> conclusions that have been reached, much less plans made, but
> considering the various parts of the problem is certainly to the
> good.
>
> which leads to the obvious thought regarding the hegemon utility -
> the marginal repo size increase for say a Debian source builder to
> also include Gentoo's git branches for those packages which overlap
> (and vice versa of course) should not be too significant, but more
> importantly, if one entity's server is compromised (be it GitHub,
> Debian, etc.), then this ought to be simpler to auto-detect when your
> source repo has multiple git-repo upstreams to test against.
>
> Sounds like we're talking a few TiB of disk space here, including
> source checkouts and build artifacts, so nothing that cannot fit in a
> 10TiB spinning rust bucket…
>
>
> “To hegemon,
>
> and beyond‼”
>
>
>
> (And really, who wants Gentoo to beat Debian to the hegemonic finish
> line?)
>
>
> On Sun, Jul 01, 2018 at 07:55:01PM +1000, Zenaan Harkness wrote:
> > git at alioth is coming along very nicely and was a great step
> > forward for Debian (thank you Ian).
> >
> > What would be nice is a script e.g. called say "hegemon" which could
> > be used to create a Debian git repo farm locally.
> >
> > Why?
> >
> > Well, I hold that this would be preferable to adding sources to my
> > Debian mirror - these days, who wants tar balls?
> >
> > Seriously, what you want is a source repo, with tags for the various
> > releases of a package, so you can e.g. compare the current sid
> > release with stable or testing, view the Debian specific patches,
> > etc.
> >
> > This is what "sources" used to be used for.
> >
> > Nowadays sources is kinda moot.
> >
> > I note from here https://wiki.debian.org/Alioth/Git
> > https://wiki.debian.org/PackagingWithGit it says "If upstream is
> > using git to manage their source, the debianization repository can
> > live in a branch off of the main upstream tree. Clearly this can work
> > only with some upstreams, but the big upside is that the relationship
> > between the debianization and the upstream code is VERY clear." which
> > is immediately followed with this "Since the Debian APT repositories
> > still use tarballs you still have to manage those with this setup,
> > but the pristine-tar exists for that purpose."
> >
> > There are possibly still advantages to tarballs, I don't know, since
> > I haven't used one in something like a decade or more. Consistency of
> > distribution of sources? Compliance with Debian packaging guidelines?
> >
> > Debian sources is currently a poor man's hegemon, providing little
> > more than a point in time snapshot (which admittedly 'matches'
> > (hopefully) the current respective binaries).
> >
> > The true hegemon (maniacal laughter echoes in the background) wants
> > unified git repos, for every package.
> >
> > I imagine that some "canonical vcs" datum would need to be collected
> > and stored for each package, and hegemon would cycle through them
> > running some default cmd like “git fetch”.
> >
> > hegemon would also know to use git plugins to hungrily grab upstreams
> > of other persuasions such as hg, bzr or (ughh!) svn, for that
> > equanimous local uniformity and hegemonic experience that we all
> > crave so.
> >
> > Then, who would ever download 55GiBsMeDats again? Especially when a
> > simple
> >
> > git clone /my/hegemon/p/package-source-repo /my/tmp/package-work-dir
> > cd /my/tmp/package-work-dir
> > git reset --hard v1.02
> > gbp buildpackage --git-pristine-tar
> >
> > will do that job?
> >
> > THEN, finally, a weekly
> >
> > hegemon build-tar git-deltas ^1week
> >
> > could be usefully generated and distributed (and downloaded and
> > applied to local hegemon repo farms, by those who would otherwise be
> > updating their debian source repos) and might ultimately supplant the
> > tar source distribution.
> >
> > …
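
For what it's worth, the "cycle through the packages and git fetch"
loop from the original message could be sketched roughly as follows.
This is only an illustration: the list file format (one "package url"
pair per line), the <farm>/<first letter>/<package>.git layout and all
of the function names are assumptions of mine, not anything that
exists yet.

```shell
#!/bin/sh
# Sketch only. Assumed (not from the thread): a plain-text list with
# one "package url" pair per line, and a farm laid out as
# <farm>/<first letter>/<package>.git. DRY_RUN=1 prints the commands.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the bare mirror for one package, or update it if it exists.
sync_package() {
    farm=$1
    pkg=$2
    url=$3
    first=${pkg%"${pkg#?}"}        # first character of the package name
    dir=$farm/$first/$pkg.git

    if [ -d "$dir" ]; then
        run git -C "$dir" fetch --prune
    else
        run git clone --mirror "$url" "$dir"
    fi
}

# Walk the list file $2 and sync every package into the farm $1.
sync_all() {
    farm=$1
    list=$2
    while read -r pkg url; do
        # Skip blank lines and "#" comments in the list file.
        case $pkg in ''|'#'*) continue ;; esac
        # </dev/null keeps git (or ssh) from eating the rest of the list.
        sync_package "$farm" "$pkg" "$url" </dev/null ||
            echo "failed: $pkg" >&2
    done < "$list"
}
```

Running "DRY_RUN=1 sync_all /my/hegemon packages.list" first shows
which clones and fetches would happen without touching the network.
Upstreams in hg, bzr or svn would need a different command per line,
which this sketch does not attempt.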