
Re: Building GCE images with bootstrap-vz



On Mon, Feb 24, 2014 at 2:52 PM, Anders Ingemann <anders@ingemann.de> wrote:
> the fallocate call with FALLOC_FL_PUNCH_HOLE should do the trick.

That's actually exactly what zerofree uses: http://intgat.tigress.co.uk/rmy/uml/index.html

Not quite accurate, actually. Read that page again, paying attention to the paragraph starting at "However". FALLOC_FL_PUNCH_HOLE does the equivalent of the "sparsify" command at the end of that page, or of your call to a VMware-specific shrink tool in the current code; it does not do what zerofree does. I almost gave you that URL in my previous mail, but didn't. :) Unlike the sparsify tool I mention later in this email, the sparsify tool from that URL seems to require that the filesystem containing the image (the build filesystem, in our setup) be unmounted first. It's possible to make that work, but it's kind of ugly, and it further constrains the choice of build filesystem.

> Also, does the code start out with a sparse disk image?

Nope. It's just plain raw format. But it's damn easy to create a new disk format: https://github.com/andsens/bootstrap-vz/tree/master/common/fs

Sparse disk images aren't a separate format. They just use fewer blocks on disk for the same apparent size. There's no reason for the code ever to create a non-sparse raw disk, except in some edge cases that are very unlikely regardless of provider. If the build filesystem doesn't handle sparseness correctly, or if the image is transferred in a way that doesn't preserve sparseness, the underlying tools will automatically do the right thing.
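
To make "fewer blocks for the same size" concrete, here is a minimal Python sketch that compares a file's apparent size with the space actually allocated for it. The helper name image_sizes is made up for illustration and is not part of bootstrap-vz; it assumes a Linux-style os.stat() where st_blocks counts 512-byte units.

```python
import os


def image_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Hypothetical helper for illustration only. Assumes st_blocks is
    reported in 512-byte units, as it is on Linux.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

On a filesystem that supports sparse files, a freshly created sparse image should report an allocated size far below its apparent size; this is the same comparison as "ls -l" versus "du".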

GNU truncate, which we use in the GCE build-debian-cloud logic, is one way to create a raw image that starts out sparse.

> [...] with the exception of disk space that is allocated then freed during the build.

The minimize_size plugin prevents some of that by binding folders from the host system into key locations on the chroot (/tmp and apt-cache).

Nice optimization. Does it bind to the real host system locations or to build-specific directories? The latter option makes a lot more sense.
 
> This should happen in all image builds that make raw disk images from scratch, since it's generally useful and widely portable across build filesystems/OSes even where zerofree or FALLOC_FL_PUNCH_HOLE is unavailable.

OK, cool. I'll try and add it to the plugin, should just be another task and then a switch whether you want to use zerofree, sparsification or nothing. Do you have a ready-to-go command I could just plug in?

There's no extra option to add to the plugin. Making raw disk images in a sparse manner is not really a separate option; it's an improvement to how bootstrap-vz builds raw disk images all the time. Likewise, FALLOC_FL_PUNCH_HOLE is a way to make your shrink feature support raw disk backing instead of just vmdk backing, again not a separate option.

Here's a ready-to-go command to make a raw disk image in a sparse manner:

# disk.raw doesn't exist before this. Adjust size as needed. G means GiB (1024^3 bytes), GB means 10^9 bytes; see truncate(1)
truncate --size=10G disk.raw

If we're okay depending on GNU truncate (it's already part of coreutils on squeeze), this should be the way to make all raw disks. Equivalents exist via portable-beyond-GNU dd commands and probably via Python logic too.
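
For the Python route, a rough sketch of the equivalent follows. The function name create_sparse_raw is hypothetical, not existing bootstrap-vz code, and it assumes the build filesystem supports sparse files; where it doesn't, the file is simply fully allocated.

```python
def create_sparse_raw(path, size_bytes):
    """Create a new raw image of size_bytes without writing any data.

    Hypothetical helper for illustration. Mode "xb" fails if the file
    already exists, matching the "disk.raw doesn't exist" precondition
    of the truncate command.
    """
    with open(path, "xb") as image:
        # Extending an empty file with truncate() leaves a hole rather
        # than writing zeros, on filesystems that support sparse files.
        image.truncate(size_bytes)
```

Calling create_sparse_raw("disk.raw", 10 * 1024**3) corresponds to the 10G truncate invocation.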

For shrinking raw disks after zerofree, it's complicated but TL;DR - for now I suggest either GNU cp --sparse=always or maybe GNU cp --sparse=always --reflink=always. You can skip the rest of the mail, but details follow if you want them.

The only solid, longstanding, well-tested options I know of require more build time and disk space than FALLOC_FL_PUNCH_HOLE, but do provide the optimal output on all systems we have to care about. GNU cp --sparse=always is probably the one I know best. (We might want to test combining --sparse=always with --reflink=always to see if that reduces the needed time and disk space without worsening the outcome.) If we ever start depending on the guestfs suite of tools or libraries, virt-sparsify is similar to this GNU cp invocation, but guestfs seems too heavy a dependency to pull in just for this feature. (guestfs does offer a lot of nice functionality, and it has a Python interface.)
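
To show the idea behind a sparse copy, here is an illustrative Python sketch: copy block by block and seek over all-zero blocks instead of writing them. This is not how GNU cp is implemented internally, only the same general technique; the function name sparse_copy and the 64 KiB block size are arbitrary choices for the example.

```python
import os


def sparse_copy(src, dst, block_size=64 * 1024):
    """Copy src to dst, leaving holes where src contains zero blocks.

    Illustrative sketch only. Like cp --sparse=always, it needs room
    for a second copy of the image during the build.
    """
    zero_block = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Skip over zeros; the gap becomes a hole in dst.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Set the final size explicitly in case src ends with zeros.
        fout.truncate(fin.tell())
```

The copy has the same contents and apparent size as the source; only the number of allocated blocks differs, and only if zerofree has already zeroed the free space inside the image.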

One thing to note about FALLOC_FL_PUNCH_HOLE: while it's by far the most convenient way to do in-place sparsification without needing more disk space, the FALLOC_FL_* flags were only added to the glibc headers in 2.18. If we do anything with it, those would have to be #defined or equivalent until we stop caring about wheezy, assuming the current sid libc6 makes it into jessie. Wheezy otherwise has the necessary support for ext4 and xfs build filesystems; jessie and wheezy-backports add support for btrfs and tmpfs.
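
As a sketch of what "#defined or equivalent" could look like from Python, the snippet below calls fallocate(2) through ctypes with the flag values hard-coded from linux/falloc.h. The function name punch_hole is hypothetical, and the code assumes 64-bit Linux (so off_t is 64 bits) and a libc that exports fallocate. I haven't run this against a wheezy system.

```python
import ctypes
import os

# Values from linux/falloc.h, defined by hand because older glibc
# headers do not provide them.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) in the file open on fd.

    Hypothetical helper; assumes 64-bit Linux. Raises OSError if the
    kernel or filesystem does not support hole punching.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fallocate = libc.fallocate
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    # PUNCH_HOLE must be combined with KEEP_SIZE; see fallocate(2).
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

The punched range reads back as zeros and the file keeps its apparent size, so this only makes sense for regions already known to be unused, such as blocks zerofree has cleared.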

This FALLOC_FL_PUNCH_HOLE-based sparsify tool exists but is GPLv3: https://bitbucket.org/cheater/sparsify (as an independently written optional dependency that might be fine - but it isn't in Debian yet and is very new code). I haven't tested it.

I don't know a nice pre-written solution besides that. Here's the fallocate(2) Linux system call man page documenting FALLOC_FL_PUNCH_HOLE:
http://man7.org/linux/man-pages/man2/fallocate.2.html

So yeah, for doing the shrink part of minimize_size on raw disks, we can either use new, untested but probably working code -- plus some #defines backported from sid glibc -- to do slick in-place wheezy-compatible sparsification, or we can use already-battle-tested code in GNU cp or (less likely) virt-sparsify at the cost of a bit of time and disk space during the build. I say let's start with GNU cp, work to get this new sparsify tool into jessie, and switch to it after wheezy goes out of support.

- Jimmy
