
Re: Publishing raw generic{,cloud} images without tar, and without compression, plus versioning of point releases



On 5/24/20 11:39 PM, Bastian Blank wrote:
> On Sun, May 24, 2020 at 11:26:40PM +0200, Thomas Goirand wrote:
>> The bigger the image is, the longer it will take to copy, which is an
>> operation that OpenStack can do before spawning an instance.
> 
> And you set up instance types with < 2GB disks?

I don't, but I could. Your question is off topic with regard to what I
wrote above, though.

> Otherwise OpenStack
> needs to copy it anyway (and convert it in the process, it was a mess
> the last time I looked into this).
> 
>> So I was wondering if we could:
>> 1/ Make the resulting extracted disk smaller. That'd be done in FAI, and
>> I have no idea how that would be done. Thomas, can you help, at least
>> giving some pointers on how we could fix this?
> 
> Fix what?

The fact that the raw image is 2GB once extracted, when it could be
1/4th of that.
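
For the record, the gap between the 2GB disk and the data it actually
holds is easy to see on an extracted image. Something like this
(untested here, and the sizes one would see are purely illustrative):

  # apparent size: the full 2GB virtual disk
  $ du -h --apparent-size disk.raw
  # allocated size: roughly the data the image actually contains
  $ du -h disk.raw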

>> 2/ Published the raw disk directly without compression (together with
>> its compressed form), so one can just point to it with Glance for
>> downloading. BTW, I don't see the point of having a tarball around the
>> compressed form, raw.xz is really enough, and would be nicer because
>> then one can pipe the output of xz directly to the OpenStack client (I
>> haven't checked, but I think that's maybe possible).
> 
> No. Nothing in the download chain supports sparse files, so unwrapped
> raw images are somewhat out of the question.

I've done this for 3 Debian releases [2], and I don't see why we would
lose the feature because of a "sparse files" thing which you somehow
find important. Truth is: nobody cares about storing the raw image as
sparse on an OpenStack cluster, because:

- the users who would download raw OpenStack images are mainly those
willing to store them with Ceph as a backend (where sparse files don't
exist anyway, unless I'm mistaken).

- Glance (the OpenStack VM image service), with the file backend,
doesn't store sparse files at all in /var/lib/images either.

So what you're talking about is just having a sparse *temporary* file
before the upload to Glance. Do we care, when what I'm proposing is to
get rid of this extra step of downloading before uploading to Glance?
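
Skipping that temporary file should be possible if Glance is made to
fetch the URL itself. A rough sketch, assuming the v2 interoperable
image import with the web-download method is enabled on the cloud (the
URL and image name below are placeholders):

  $ glance image-create-via-import \
      --import-method web-download \
      --uri https://<mirror>/path/to/disk.raw \
      --disk-format raw --container-format bare \
      --name debian-10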

Last: I haven't talked about removing the compressed .xz raw images,
but about publishing them *ALSO* in uncompressed form.
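
As for the pipe idea quoted above, it could look something like this
(untested; I'm not even sure the client accepts stdin as --file, so
take it as a sketch only, with a placeholder image name):

  $ xz -dc disk.raw.xz | \
      openstack image create --disk-format raw \
        --container-format bare --file /dev/stdin debian-10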

>> One thing though: if I understand well, artifacts are first stored on
>> Salsa, and currently, there's a size limit. What is the max size? If I'm
>> not mistaken, it's 1GB max, right? If that's the case, then maybe
>> that's a problem with the current 2GB decompressed disk.raw image.
> 
> It's 250MB.

Then how are the ppc64el images generated? (they are bigger than this)

>> Another thing which bothers me, is that in our current publication,
>> there's no way to tell what image is from which point release.
> 
> What is the significance of that?  We use stuff from security primarily,
> so the point releases don't show what might be in the image.

Of course the point releases show what will be in the image. For
example, if a cloud user spawns a new instance using an image from the
latest point release, he knows that a bunch of (non-security-fixed)
packages won't need upgrades (at least base-files, but often many
others as well, tzdata for instance).

Someone may also want to run the image matching a given point release
together with snapshot.debian.org (for example to test upgrades, among
many other possible scenarios).
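
Concretely, inside such an instance, that would look something like
this (the snapshot timestamp here is hypothetical; one would pick the
one matching the point release in question):

  # point apt at the snapshot matching the image's point release
  $ echo "deb http://snapshot.debian.org/archive/debian/20191116T000000Z/ buster main" \
      > /etc/apt/sources.list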

So yes, point release numbers do have significance. Images carrying a
date that at first appears random, and that reveals its meaning only
when carefully matched against the point release dates, aren't user
friendly at all.

If I say: Bastian, can you please give me the image for Buster 10.2, it
will surely take you quite some time to find it. Now look at this
archive, which has included security updates since 8.6.3:

[2]

Can't we just have something like this?!? How hard is it to understand
how much more convenient this is? By the way, why are we keeping a
history of 233 daily Bullseye images? [1] Is this of any use to anyone?
The CD team builds images weekly, so why do we need daily images
published by the cloud team, and kept forever, when the CD team does
not do that?

Cheers,

Thomas Goirand (zigo)

[1] http://cdimage.debian.org/cdimage/cloud/bullseye/daily/
[2] http://cdimage.debian.org/cdimage/openstack/archive/

