
Re: Publishing raw generic{,cloud} images without tar, and without compression, plus versionning of point releases



On 5/25/20 5:43 PM, Ross Vandegrift wrote:
> On Mon, May 25, 2020 at 02:21:48AM +0200, Thomas Goirand wrote:
>> On 5/24/20 11:39 PM, Bastian Blank wrote:
>>> On Sun, May 24, 2020 at 11:26:40PM +0200, Thomas Goirand wrote:
>>>> So I was wondering if we could:
>>>> 1/ Make the resulting extracted disk smaller. That'd be done in FAI, and
>>>> I have no idea how that would be done. Thomas, can you help, at least
>>>> giving some pointers on how we could fix this?
>>>
>>> Fix what?
>>
>> The fact that the raw image is 2GB once extracted, when it could be
>> 1/4th of that.
> 
> I don't think it's obvious how to do better.  The only ways I know to
> make a raw image smaller than its fs are:
>   1) sparse files
>   2) compression
> 
> FAI is using #1, and you want to avoid #2.  Do you know another way?

Actually, I do. Shrink the FS once it's prepared, leaving as little free
space as possible (maybe, 20 GB), then resize the partitions to match.
That way, the final disk image is as small as possible. That's what I was
optionally doing with openstack-debian-images, but I just don't know how
this translates to FAI.
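
A rough sketch of that shrink step, for illustration only. This is not
the code from build-openstack-debian-image and not FAI code; the function
names are made up. It assumes a bare ext4 filesystem image without a
partition table, so that it can run without root or loop devices. A real
disk image would also need its partition entry rewritten afterwards.

```shell
#!/bin/sh
# Hypothetical helpers. Assumes $1 is a bare ext4 filesystem image
# (no partition table), so no root access or loop device is needed.

# Read "dumpe2fs -h" output on stdin, print the filesystem size in bytes.
fs_bytes() {
    awk -F: '/^Block count:/ {c = $2} /^Block size:/ {s = $2}
             END {printf "%.0f\n", c * s}'
}

# Shrink the filesystem in the image to its minimum size, then cut the
# image file down to the end of the filesystem.
shrink_image() {
    img=$1
    # e2fsck exits with 1 when it fixed something; only >1 is a failure.
    e2fsck -f -y "$img" >/dev/null || [ "$?" -le 1 ] || return 1
    resize2fs -M "$img" >/dev/null 2>&1 || return 1
    size=$(dumpe2fs -h "$img" 2>/dev/null | fs_bytes)
    truncate -s "$size" "$img"
}

# Usage (not run here):  shrink_image rootfs.img
```

Note that resize2fs -M shrinks to the minimum; leaving a bit of free
space would need a second resize2fs call with an explicit, slightly
larger size before truncating.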

>>>> 2/ Published the raw disk directly without compression (together with
>>>> its compressed form), so one can just point to it with Glance for
>>>> downloading. BTW, I don't see the point of having a tarball around the
>>>> compressed form, raw.xz is really enough, and would be nicer because
>>>> then one can pipe the output of xz directly to the OpenStack client (I
>>>> haven't checked, but I think that's maybe possible).
>>>
>>> No. Nothing in the download chain supports sparse files, so unwrapped
>>> raw images are somewhat out of the question.
>>
>> I've done this for 3 Debian releases [2], I don't see why we would lose
>> the feature because of a "sparse files" thing which you somehow find
>> important. 
> 
> I think Bastian's point is that tar is required to enable downloading
> the sparse files, since http can't represent the holes.  Otherwise, you
> need to transfer the full size of the fs.

Right, except if we make the holes as small as possible, in which case
it's not a problem anymore.
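
To illustrate the point about holes, here is a small sketch (assuming GNU
coreutils, and a filesystem that supports sparse files). It creates a
file that is one big hole and compares its apparent size, the space it
actually occupies, and the number of bytes a plain byte stream (which is
what an HTTP download amounts to) has to carry.

```shell
#!/bin/sh
# Illustration only: a 16 MiB file that consists of a single hole.
img=$(mktemp)
truncate -s 16M "$img"

# Apparent size, as shown by ls -l.
apparent=$(stat -c %s "$img")
# Space really allocated on disk, as shown by du.
allocated=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
# Bytes carried when the file is read as a plain stream.
streamed=$(( $(cat "$img" | wc -c) ))

echo "apparent=$apparent allocated=$allocated streamed=$streamed"
rm -f "$img"
```

The stream carries the full apparent size, holes included, so the only
way to transfer less over plain HTTP is for the image itself to be
smaller.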

> I checked one of the older OpenStack images you linked to.  It behaves
> just like the FAI raw images, as far as I can tell:
> ross@vanvanmojo:~/tmp$ curl -L -o disk.raw https://cdimage.debian.org/cdimage/openstack/archive/8.0.0/debian-8.0.0-openstack-amd64.raw
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   361  100   361    0     0    512      0 --:--:-- --:--:-- --:--:--   512
> 100 2048M  100 2048M    0     0  11.0M      0  0:03:06  0:03:06 --:--:-- 10.4M
> ross@vanvanmojo:~/tmp$ ls -lh disk.raw
> -rw-r--r-- 1 ross ross 2.0G May 25 07:57 disk.raw
> ross@vanvanmojo:~/tmp$ du -h disk.raw
> 2.1G    disk.raw
> 
> Did I miss something?

You didn't miss anything. I never used the --automatic-resize option for
the published images. :)

However, the code works, and is there:
https://salsa.debian.org/openstack-team/debian/openstack-debian-images/-/blob/debian/ussuri/build-openstack-debian-image#L1947

>> So what you're talking about is just having a sparse *temporary* file,
>> before the upload to Glance. Do we care, when what I'm proposing is to
>> get rid of this extra step of downloading, before uploading to Glance?
> 
> Is avoiding the extra download step more important than reducing the
> image size? Your first mail raised both issues, and FWIW, I thought you
> were mostly concerned about the size.

I'm always bad at explaining, and I need more words than normal people,
sorry for this. Let me try again...

What's important is reducing the amount of time the user has to spend. If
we provide the raw image only in the form of a tarball, what's going to
happen is that OpenStack users will have to:
1/ Download the image (locally?)
2/ Uncompress
3/ Upload to Glance

If that user doesn't already have a VM on the cloud, and is working
remotely with poor upload bandwidth (which is typical of DSL links),
uploading to Glance is going to take forever, and there is no way around
it (i.e. the archive must be uncompressed before the upload).

If we provide the raw image directly, then the user just does:

openstack image create --copy-from <URL>

(or uses the OpenStack dashboard, but in the end it's the same...), and
then it's the cloud provider that downloads the image from our servers,
which is typically done at 1 Gbit/s or more.
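
To put rough numbers on this, a back-of-the-envelope sketch. The 1 Mbit/s
DSL upload rate is only an assumption for the sake of the example, and
the helper name is made up; the image size is the 2048M from the curl run
quoted above. Protocol overhead is ignored.

```shell
#!/bin/sh
# transfer_seconds BYTES BITS_PER_SECOND
# Prints the transfer time in whole seconds, ignoring protocol overhead.
transfer_seconds() {
    echo $(( $1 * 8 / $2 ))
}

image_bytes=$(( 2048 * 1024 * 1024 ))   # uncompressed raw image
dsl_up=1000000                          # assumed DSL upload: 1 Mbit/s
dc_link=1000000000                      # provider-side download: 1 Gbit/s

echo "upload over DSL:   $(transfer_seconds "$image_bytes" "$dsl_up") s"
echo "provider download: $(transfer_seconds "$image_bytes" "$dc_link") s"
```

With these assumptions, that is nearly five hours of upload on the
user's side, against well under a minute when the provider fetches the
image itself.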

>> Of course the point releases show what will be in the image. For
>> example, if a cloud user spawns a new instance using an image which is
>> from the latest point release, he knows a bunch of (non-security fixed)
>> packages won't need upgrades (for example, at least base-files, but often
>> many others as well, like for example tzdata).
> 
> As a cloud user, I never want to care about point releases.
> 
> There's usually a way to identify the latest image of a given release.
> For example, on AWS and GCP, the api can search for the latest debian 10
> image.  Many deployment tools integrate this functionality, so I can
> always deploy the latest debian 10 image.
> 
> I've never used OpenStack though, so I don't know if it has similar
> features. 

It kind of has the feature, if the provider sets the correct properties
on the images (like OS vendor and version, stored in the image
properties, which are a key/value store), but unfortunately that's not
standardized, and it may vary from one public cloud to another.

So I end up doing things like this:

$ openstack image list --format value -c Name | grep debian-10
debian-10.0.1-20190708-openstack-amd64.qcow2
debian-10.0.2-20190721-openstack-amd64.qcow2
debian-10.0.3-20190815-openstack-amd64.qcow2
debian-10.1.0-openstack-amd64.qcow2
debian-10.1.2-20190925-openstack-amd64.qcow2
debian-10.1.5-20191015-openstack-amd64.qcow2
debian-10.1.6-20191114-openstack-amd64.qcow2
debian-10.2.0-openstack-amd64.qcow2
debian-10.3.0-openstack-amd64.qcow2
debian-10.3.2-20200406-openstack-amd64.qcow2
debian-10.4.0-openstack-amd64.qcow2

As you can see from this real-world example on a private cloud, I do
download each point release, and often intermediate builds too (when I'm
aware of a grave security fix).

I very much prefer having point release numbers there rather than just
dates.
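
With the point release in the name, picking the newest image out of such
a listing can be scripted. A minimal sketch, assuming GNU sort (for its
-V version sort) and the naming scheme shown in the listing above; the
helper name is made up.

```shell
#!/bin/sh
# latest_image: read image names on stdin, print the highest version.
# Relies on the version sort (-V) of GNU sort, so that for instance
# 10.10.0 would sort after 10.9.0.
latest_image() {
    sort -V | tail -n 1
}

# Usage against a real cloud, with the same listing command as above:
#   openstack image list --format value -c Name | grep debian-10 | latest_image
```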

Cheers,

Thomas Goirand (zigo)

