
Re: Major large-disk bug for cloud-utils / cloud-initramfs-growroot



On Tue, Oct 21, 2014 at 11:53 PM, Thomas Goirand <zigo@debian.org> wrote:
On 10/22/2014 01:49 PM, Jimmy Kaplowitz wrote:
> In this case, the current behavior is actively destructive to the
> partition table on disks over 2 TB - critical data loss bugs would be RC
> bugs in scope for a stable update, and so are also probably in scope for
> work at this point in the process.

I don't agree with the above. In the cloud case, there's no data loss
if you boot a VM with an image; in the worst case it simply doesn't
work, but it will not destroy your HDD image.

It will not destroy the image, true, since the image is a template. However, if any current or future public or private cloud feature can resize a VM instance's disk in place to larger than 2 TB (an offline resize is very plausible, and may already be supported by qemu/kvm), another growroot attempt would run against a disk that already contains data, and that could destroy the resulting VM's partition table.
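
(As a concrete example of the kind of offline resize I mean, something like the following against the instance's backing image would be enough to trigger another growroot run; the file name is made up, but the resize subcommand is standard qemu-img:)

    # grow the powered-off instance's disk image past the MBR limit
    qemu-img resize instance-root.qcow2 3T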

I agree that whether this falls under the "data loss" part of the RC bug severities (which would make it "grave") is a tougher question than I initially thought. However, see below for why I think this is at least "important" severity, and so fixable in Jessie for another several weeks.

> While I agree that maxing out the
> partition at 2 TB is far from ideal, it's much better than the status
> quo of destroying the usability of the disk. We don't have a way to use
> the disk size in deciding whether to install cloud-initramfs-growroot;
> while our images are 10 GB (sparse), they are often used with much
> larger disks.

FYI, the upstream commit that brings in the "do not destroy the HDD if
it's > 2TB" behavior goes like this:

diff --git a/bin/growpart b/bin/growpart
index bbac2fe..49a1792 100755
--- a/bin/growpart
+++ b/bin/growpart
@@ -160,6 +160,7 @@ mbr_resize() {
        local dump_mod=${TEMP_D}/dump.mod
        local tmp="${TEMP_D}/tmp.out"
        local err="${TEMP_D}/err.out"
+       local mbr_max_512="4294967296"

        local _devc cyl _w1 heads _w2 sectors _w3 tot dpart
        local pt_start pt_size pt_end max_end new_size change_info
@@ -174,6 +175,9 @@ mbr_resize() {
        tot=$((${cyl}*${heads}*${sectors}))

        debug 1 "geometry is ${MBR_CHS}. total size=${tot}"
+       [ "$tot" -gt "$mbr_max_512" ] &&
+               debug 1 "WARN: disk is larger than 2TB. additional space will go unused."
+
        rqe sfd_dump sfdisk ${MBR_CHS} --unit=S --dump "${DISK}" \
                >"${dump_out}" ||
                fail "failed to dump sfdisk info for ${DISK}"
@@ -217,6 +221,10 @@ mbr_resize() {
                [ -n "${max_end}" ] ||
                fail "failed to get max_end for partition ${PART}"

+       if [ "$max_end" -gt "$mbr_max_512" ]; then
+               max_end=$mbr_max_512;
+       fi
+
        debug 1 "max_end=${max_end} tot=${tot} pt_end=${pt_end}" \
                "pt_start=${pt_start} pt_size=${pt_size}"
        [ $((${pt_end})) -eq ${max_end} ] &&

In the current version of cloud-utils, there's no such thing as
"max_end", and patching it really isn't trivial (please read the source
code and try to make sense of it, then tell me what you think...).

I'd be fine with a patch against the current version.

If you think you can do it, then please provide a patch against the
current version of cloud-utils. If you can't, then please don't open an
RC bug for it.
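
To be concrete about what I have in mind for a minimal fix: the core of it is just clamping whatever end-sector value the existing code computes to the 32-bit MBR limit before the table is rewritten. A rough standalone sketch of only that clamp (the variable names are mine, not taken from the old cloud-utils source):

    # MBR stores partition sizes as 32-bit counts of 512-byte sectors,
    # so nothing may end beyond 2^32 sectors (2^32 * 512 bytes = 2 TiB)
    mbr_max_sectors=4294967296

    # stand-in for whatever last-usable-sector value the old growpart
    # code derives from the sfdisk geometry output (hypothetical name)
    desired_end=6442450944    # e.g. a 3 TiB disk

    if [ "$desired_end" -gt "$mbr_max_sectors" ]; then
            desired_end=$mbr_max_sectors
    fi

I realize the hard part is wiring this into the old sfdisk-driven logic rather than the clamp itself, so I'm not claiming this is the whole patch.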

The bug is, at minimum, of important severity: "a bug which has a major effect on the usability of a package, without rendering it completely unusable to everyone." Unblock requests for fixes of this severity in packages of priority optional or extra, when they can be fixed via unstable, are accepted through December 5 according to item 'ii' of the freeze policy. It's certainly not too late in the cycle for this.

> It's already a common customer desire on GCE to want disks over 2 TB,
> such that we have it in our test suite (which caught this issue). During
> the Jessie lifecycle, this need will grow across all environments.

Well, once more, you will *not* be able to use a disk larger than 2TB
by backporting the above patch. What we need is the latest upstream
release, which supports GPT, and we need that to be uploaded tomorrow
to reach Jessie before the freeze.

Sure you'll be able to use a disk of any size. What you won't be able to do is have the partition or filesystem automatically grow to cover the entire disk when it exceeds 2 TB; starting from a 10 GB image, it will still use ~2 TB even of a 10 TB disk, instead of blocking boot entirely and leaving a ~1 GB partition that is unusably mismatched with the on-disk ext4 superblock metadata. Yes, that's what happens in our testing with a 2049 GB disk.
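
(For anyone who wants to reproduce this, the mismatch is easy to see on an affected instance by comparing what the kernel thinks the partition size is with what the ext4 superblock was written for; the device name is just an example:)

    # partition size as the kernel sees it, in 512-byte sectors
    blockdev --getsz /dev/sda1
    # block count and block size recorded in the ext4 superblock
    dumpe2fs -h /dev/sda1 | grep -iE '^Block (count|size)'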

> There's no way we can safely use cloud-initramfs-growroot with the
> current cloud-utils in stock GCE images

There's a way: do not use a disk bigger than 2TB.

We have no good way to enforce that; customers often have good reason to try larger disks even when not all of the space is usable, given GCE's size-based disk performance model. For many users a ~2 TB partition and filesystem is enough, and the extra space still helps performance. Users who do end up needing the rest of the disk can use gdisk (https://packages.debian.org/search?keywords=gdisk) to convert to GPT until Debian cloud images adopt GPT out of the box, but only if they can get access to the resulting VM instance. If the instance becomes unbootable before user-supplied startup code can run, that's impossible.
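
(For reference, the conversion itself can be a single command with sgdisk from that same package, something like the following, with /dev/sda just as an example device and a backup strongly advised first; afterwards the partition can be grown past 2 TB and resize2fs run as usual:)

    # rewrite the existing MBR partition table as GPT in place
    sgdisk --mbrtogpt /dev/sda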

> and other clouds which offer
> large physical disks should probably be cautious in the same way. If
> adopting upstream's fix is deemed infeasible for Jessie and
> wheezy-backports, at least put in a minimal equivalent fix of your own
> design where it maxes out at 2 TB. Somewhat sane and
> non-destructive is better than automated accidental destruction.

I agree. Though again, that's not an easy fix. Let me know if you think
you can do it yourself. Also, I would welcome it if you could test the
latest upstream release ASAP and let us know if it works well enough. In
that case, I may reconsider the upgrade.

If you agree with my conclusion above that this is at least of important severity, maybe we can handle this in a less rushed but still speedy manner over the coming weeks. While I can't promise to test newer upstream cloud-utils tomorrow, I probably can promise to test it soon.

- Jimmy
