Re: Building cloud images using Debian infrastructure
(fixing top-posting)
On Wed, Aug 29, 2018 at 11:07:24AM -0500, Paul Dejean wrote:
> On Wed, Aug 29, 2018, 10:48 AM Luca Filipozzi <lfilipoz@debian.org> wrote:
>
> > On Wed, Aug 29, 2018 at 10:28:27AM -0500, Paul Dejean wrote:
> > > I honestly don't get it. Why is casulana so necessary for building these
> > > images going forward? What kicked off this thread was me demonstrating
> > > that machine images could be built in gitlab on google cloud runners that
> > > have nested virt support.
> >
> > Primarily, Debian (as a community) has long held the opinion that our
> > packages, our cd images, and (by extension) our cloud images should be
> > built on hardware that is owned and operated by Debian. VMs provided by
> > a third party (AWS, etc.) are only as secure as the third party
> > (either poor architecture or nefarious intent) or as secure as the
> > hypervisor (against fourth parties).
> >
> > This explains why all the build daemons are on Debian-controlled
> > hardware.
> >
> > casulana was purchased to address two needs: cd-image and cloud-image
> > building. The former requires significant resources; the latter not
> > nearly as much.
> >
> > Secondarily, as you will have seen by the salsa thread relating to use
> > of Google storage for git lfs, there are members of the community that
> > would like to see Debian choose options that (a) make use of open source
> > software and (b) make us less rather than more reliant on the good will
> > of entities such as Google and AWS.
> >
> > Like I said earlier in the thread: the ongoing to-and-fro regarding
> > using casulana for build and using FAI is not useful at this stage.
> > Regardless of my personal opinion, I view these as settled discussion
> > points based on what I saw at the 2017 Cloud Sprint and at the DC18
> > Cloud BoF.
> >
> > I'm very appreciative of Bastian's work on getting gitlab build jobs
> > prepared. gitlab doesn't use gridengine; we may not need to go that far,
> > but we may wish to introduce some kind of semaphore between gitlab jobs
> > and cd-image jobs to allow all of casulana to be used by the cd-image
> > scripts.
> >
> > Finally, while salsa is using Google storage for git lfs, the ability
> > for Google to tamper with the objects in git in an undetectable way is
> > very limited so I'm less concerned about that particular usage of a
> > third-party resource. I've mentioned that I would love to see several
> > third-party storage solutions employed, ideally in different legal
> > jurisdictions, for redundancy purposes.
> >
> > Colleagues, please elaborate if my explanation above is incorrect in any
> > way.
>
> Ok, that's understandable. Question #1: who pays for this? A datacenter rack
> costs money. And whoever owns the data center has physical access. The
> actual computer hardware costs money not just on a one time basis either.
Debian receives donations, both in-kind and cash.
Debian relies on hosting providers to provide, typically at no cost to
Debian, rack space and network access.
Frequently, this is with universities rather than corporations.
> Where does "hardware" begin and end? Does debian need to own the rack
> rather than renting it? The screws you use to mount the server? The
> Ethernet cables?
This is a hyperbolic line of inquiry that makes me inclined to not answer
further emails from you.
> There's a huge cost to maintaining this too. From my understanding there's
> no mesos cluster setup right now, no kubernetes, no working openstack api.
> Creating a private Debian cloud is a lot of work. Not creating a private
> Debian cloud and just having a bunch of ad hoc servers is probably even
> more work in the long run.
Most of Debian's infrastructure uses VMs (ganeti). casulana is an
exception.
> The ideology is admirable but we need to define clearly what problem we're
> trying to solve.
> Is it avoiding vendor lock-in? If so there might be ways
> to use google cloud and avoid vendor lock-in.
Use multiple clouds simultaneously, either avoiding vendor-specific
features or using a reasonable abstraction (fog).
> Is it trying to keep Google from having access to our private data? If
> so a good first step would be stripping access from any Google
> employees who might be Debian maintainers (which would be incredibly
> silly).
That's not silly. How can Debian claim we have 'control over official
Debian cloud images' if we don't control who can access the various
cloud accounts by which we publish the images?
An important discussion to be had is whether and how to extend Debian
SSO into the cloud so that when DAM elects to close an account (or when
someone elects to retire), we close _all_ Debian-related access.
I don't view this as silly. I view it as appropriate account lifecycle
management. I encourage DMs to become DDs if they intend to do packaging
work, whether actual packages or cd-image or cd-cloud.
> Is it trying to avoid corporate influence? Amazon is already contributing
> resources (I think; I might be remembering wrong) and there were plans for
> Google to join in soon as was mentioned in this thread.
And we are very thankful for the resources that these corporations
provide. That said, it is important to many in the Debian community to
maintain an appropriate distance from them.
> I'm not trying to knock ideology, it's what makes Debian not Red Hat. All
> I'm saying is that we need to define what exactly the rules and goals are
> here so we know what there is to work with.
And that's what happened over several Sprints and several BoFs.
--
Luca Filipozzi