
Re: wishlist script: “hegemon”: set up bare git mirror farm for all debian sources



And BTW, the question that prompted this Gentoo discussion is here:
https://lwn.net/Articles/759539/

and includes this snippet:

  2. git seems to be more efficient for frequent syncing, while
     rsync seems to be more efficient for infrequent syncing.  I'd
     guess the crossover is somewhere around a week or few, but I
     don't have data to support that.

which of course leads to the obvious thought that the crossover point
for a particular local repo is user dependent, or rather, repo
dependent: someone working in particular repos may want to git pull
frequently, whereas for the bulk of Debian's 50,000 packages a weekly
or monthly batch update would be fine, security updates being the
only exception.

This "batch git delta" distribution/bundling, on a weekly, monthly or
whichever other basis, should mean that the existing ftp distribution
infrastructure can still be made use of...
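
One way such a batch delta might be produced is with "git bundle",
which packs a range of history into a single file that can sit on an
ordinary ftp mirror. The following is only a sketch: the function
names, the repo and bundle paths, and the one-week default are my own
assumptions, and it only builds the command lines rather than running
them.

```python
def delta_bundle_command(repo: str, out_file: str,
                         since: str = "1.week.ago") -> list[str]:
    """Build a git command that packs recent history of one repo into a bundle.

    `git bundle create <file> <rev-list-args>` accepts rev-list options,
    so `--since=...` limits the bundle to recent commits and `--all`
    covers every ref. Paths and the default window are assumptions.
    """
    return ["git", "-C", repo, "bundle", "create", out_file,
            f"--since={since}", "--all"]


def apply_bundle_command(repo: str, bundle: str) -> list[str]:
    """Build a git command that fetches branches and tags from a bundle file."""
    return ["git", "-C", repo, "fetch", bundle,
            "refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(delta_bundle_command("/my/hegemon/p/pkg.git",
                                        "/tmp/pkg-week.bundle")))
    print(" ".join(apply_bundle_command("/my/hegemon/p/pkg.git",
                                        "/tmp/pkg-week.bundle")))
```

Two caveats: git refuses to create an empty bundle, so a repo with no
commits in the window has to be skipped, and the receiving repo must
already hold the commits the bundle builds on, so someone who misses a
week would need a larger delta.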



On Wed, Aug 15, 2018 at 05:01:54PM +1000, Zenaan Harkness wrote:
> Rough stats for “hegemon emerge” now available courtesy Gentoo, see
> the bottom of the following article, last section titled "Git versus
> rsync":
> https://lwn.net/Articles/759467/
> and here's the money quote:
> 
>   Matt Turner said that he has set aside a 1GB partition for the
>   tree, which works fine for the roughly 600MB needed by rsync, but
>   not for Git. A shallow clone of the Git repository is roughly the
>   same (around 660MB), but each pull adds to that, so without some
>   kind of "auto-trimming", Git will grow quickly, Freeman said.
> 
> There is also a security issue considered:
> 
>   The GitHub mirror compromise has clearly led to some thinking (and
>   rethinking) within the project about its practices and how they
>   might be improved. It is not clear that there are any real
>   conclusions that have been reached, much less plans made, but
>   considering the various parts of the problem is certainly to the
>   good.
> 
> which leads to the obvious thought regarding the hegemon utility -
> the marginal repo size increase for say a Debian source builder to
> also include Gentoo's git branches for those packages which overlap
> (and vice versa of course) should not be too significant, but more
> importantly, if one entity's server is compromised (be it GitHub,
> Debian, etc.), then this ought to be simpler to auto-detect when your
> source repo has multiple git-repo upstreams to test against.
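
A rough sketch of what that cross-check could look like: take the
"git ls-remote" output from each upstream and flag any ref name that
two upstreams both carry but point at different objects. The function
names and the remote labels are hypothetical, and distro-specific
branches will of course differ legitimately, so in practice one would
probably restrict this to tags shared with the common upstream.

```python
def parse_ls_remote(text: str) -> dict[str, str]:
    """Parse `git ls-remote` output ("<sha>\\t<ref>" per line) into {ref: sha}."""
    refs: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def conflicting_refs(by_remote: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return refs that at least two remotes carry with different object ids."""
    seen: dict[str, dict[str, str]] = {}
    for remote, refs in by_remote.items():
        for ref, sha in refs.items():
            seen.setdefault(ref, {})[remote] = sha
    return {ref: shas for ref, shas in seen.items()
            if len(set(shas.values())) > 1}


if __name__ == "__main__":
    # Made-up object ids and remote names, for illustration only.
    debian = parse_ls_remote("aaaa\trefs/tags/v1.02\nbbbb\trefs/heads/debian/sid\n")
    gentoo = parse_ls_remote("cccc\trefs/tags/v1.02\n")
    print(conflicting_refs({"debian": debian, "gentoo": gentoo}))
```

A mismatch does not say which side was tampered with, only that the
upstreams disagree and a human should look.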
> 
> Sounds like we're talking a few TiB of disk space here, including
> source checkouts and build artifacts, so nothing that cannot fit in a
> 10TiB spinning rust bucket…
> 
> 
>     “To hegemon,
> 
>           and beyond‼”
> 
> 
> 
> (And really, who wants Gentoo to beat Debian to the hegemonic finish
>  line?)
> 
> 
> On Sun, Jul 01, 2018 at 07:55:01PM +1000, Zenaan Harkness wrote:
> > git at alioth is coming along very nicely and was a great step
> > forward for Debian (thank you Ian).
> > 
> > What would be nice is a script, called say "hegemon", which could
> > be used to create a Debian git repo farm locally.
> > 
> > Why?
> > 
> > Well, I hold that this would be preferable to adding sources to my
> > Debian mirror - these days, who wants tar balls?
> > 
> > Seriously, what you want is a source repo, with tags for the various
> > releases of a package, so you can e.g. compare the current sid
> > release with stable or testing, view the Debian specific patches,
> > etc.
> > 
> > This is what "sources" used to be used for.
> > 
> > Nowadays sources is kinda moot.
> > 
> > I note from here https://wiki.debian.org/Alioth/Git and
> > https://wiki.debian.org/PackagingWithGit it says "If upstream is
> > using git to manage their source, the debianization repository can
> > live in a branch off of the main upstream tree. Clearly this can work
> > only with some upstreams, but the big upside is that the relationship
> > between the debianization and the upstream code is VERY clear." which
> > is immediately followed with this "Since the Debian APT repositories
> > still use tarballs you still have to manage those with this setup,
> > but the pristine-tar exists for that purpose."
> > 
> > There are possibly still advantages to tarballs, I don't know, since
> > I haven't used one in something like a decade or more. Consistency of
> > distribution of sources? Compliance with Debian packaging guidelines?
> > 
> > Debian sources is currently a poor man's hegemon, providing little
> > more than a point in time snapshot (which admittedly 'matches'
> > (hopefully) the current respective binaries).
> > 
> > The true hegemon (maniacal laughter echoes in the background) wants
> > unified git repos, for every package.
> > 
> > I imagine that some "canonical vcs" datum would need to be collected/
> > stored for each package, and hegemon would cycle through them running
> > some default cmd like “git fetch”
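
As a minimal sketch of that loop, assuming the "canonical vcs" datum
is simply a package-name-to-URL mapping and that the farm is laid out
as <farm>/<first letter>/<package>.git (both are my assumptions, as
are the function names), the per-package command could be chosen like
this:

```python
import posixpath


def repo_path(farm: str, package: str) -> str:
    """Assumed farm layout: <farm>/<first letter>/<package>.git"""
    return posixpath.join(farm, package[0], package + ".git")


def sync_command(farm: str, package: str, url: str, have_mirror: bool) -> list[str]:
    """Pick the git command for one package: first clone, or later fetch."""
    path = repo_path(farm, package)
    if have_mirror:
        # A mirror clone already maps all remote refs, so a plain fetch updates it.
        return ["git", "-C", path, "fetch", "--prune"]
    return ["git", "clone", "--mirror", url, path]


def sync_plan(farm: str, canonical_vcs: dict[str, str],
              existing: set[str]) -> list[list[str]]:
    """Build the command list for every package, in name order."""
    return [sync_command(farm, pkg, url, pkg in existing)
            for pkg, url in sorted(canonical_vcs.items())]


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    vcs = {"hello": "https://example.org/hello.git",
           "bash": "https://example.org/bash.git"}
    for cmd in sync_plan("/my/hegemon", vcs, existing={"bash"}):
        print(" ".join(cmd))
```

This only plans the commands; actually running them (and deciding what
to do when one upstream is unreachable) is left out.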
> > 
> > hegemon would also know to use git plugins to hungrily grab upstreams
> > of other persuasions such as hg, bzr or (ughh!) svn, for that
> > equanimous local uniformity and hegemonic experience that we all
> > crave so.
> > 
> > Then, who would ever download 55GiBsMeDats again? Especially when a
> > simple
> > 
> >  git clone /my/hegemon/p/package-source-repo /my/tmp/package-work-dir
> >  cd /my/tmp/package-work-dir
> >  git reset --hard v1.02
> >  gbp buildpackage --git-pristine-tar
> > 
> > will do that job?
> > 
> > THEN, finally, a weekly
> > 
> >  hegemon build-tar git-deltas ^1week
> > 
> > could be usefully generated and distributed (and downloaded and
> > applied to local hegemon repo farms, by those who would otherwise be
> > updating their debian source repos) and might ultimately supplant the
> > tar source distribution.
> > 
> > …

