[Nbd] question on ioctl NBD_SET_FLAGS

To: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, "qemu-devel@...530..." <qemu-devel@...530...>, qemu block <qemu-block@...530...>
Subject: [Nbd] question on ioctl NBD_SET_FLAGS
From: Eric Blake <eblake@...696...>
Date: Wed, 20 Apr 2016 09:42:02 -0600
Message-id: <5717A34A.2050600@...696...>

NBD commit 39e20207 (2.9.22) introduced the use of the ioctl
NBD_SET_FLAGS, and tries to pass the concatenation of the 16-bit global
flags in the most significant half + the 16-bit transmission flags in
the lower half (during newstyle negotiation).  Of course, at that point
in time, the global flags were ALWAYS 0, so this results in just giving
the kernel the transmission flags.

While oldstyle negotiation gives 32 bits for flags, it never defined any
bits in the upper 16, and is now no longer under development, so no
(working) oldstyle server will ever advertise more than 16 bits of
information, so blindly using the 32 bit flags field to hand to the
kernel via NBD_SET_FLAGS is just fine.

But when commit 626c2a37 (3.1) introduced the first global flag,
NBD_FLAG_FIXED_NEWSTYLE, it computes:

                if(read(sock, &tmp, sizeof(uint16_t)) < 0) {
                        err("Failed reading flags: %m");
                }
                *flags = ((u32)ntohs(tmp));
...
                if(read(sock, &tmp, sizeof(tmp)) < 0)
                        err("Failed/4: %m\n");
                *flags |= (uint32_t)ntohs(tmp);

Whoops - this computes an overlap of flags (it leaves the upper 16 bits
of *flags clear, and sets the lower 16 bits to the bitwise OR of the
global flags and transmission flags) - although no one cared in any
release between there and 3.8, because it so happens that
NBD_FLAG_FIXED_NEWSTYLE and NBD_FLAG_HAS_FLAGS happen to be the same bit.

But in 3.9, the overlap bug was still present, and the set of global
flags had grown to include NBD_FLAG_NO_ZEROS (commit 038e066), which
overlaps with NBD_FLAG_READ_ONLY.  Ouch.  This means that a client
talking to a server that advertises NO_ZEROES means that the client will
mistakenly tell the kernel to treat EVERY export as read-only, even if
the client doesn't respond with NBD_FLAG_C_NO_ZEROES.

3.10 fixed things; negotiate() now uses uint16_t *flags (instead of u32
*), and no longer tries to merge global flags with transmission flags;
only the transmission flags are ever passed to the kernel via
NBD_SET_FLAGS.  Maybe it's good that there was only 2 weeks between 3.9
and 3.10, so hopefully few distros are trying to ship that broken version.

Meanwhile, qemu 2.5 (and probably 2.6, because it's now hard freeze and
too late to change things without good reason) will populate things as:

        if (read_sync(csock, &tmp, sizeof(tmp)) != sizeof(tmp)) {
            error_setg(errp, "Failed to read server flags");
            goto fail;
        }
        *flags = be16_to_cpu(tmp) << 16;
...
        if (read_sync(csock, &tmp, sizeof(tmp)) != sizeof(tmp)) {
            error_setg(errp, "Failed to read export flags");
            goto fail;
        }
        *flags |= be16_to_cpu(tmp);

which means it has been calling the kernel with NBD_SET_FLAGS 0x10001
when talking to the same server where the upstream NBD client would use
merely 0x0001.

I plan on fixing qemu to no longer set the upper 16 bits with global
flags (as I don't see how NBD_FLAG_FIXED_NEWSTYLE or NBD_FLAG_NO_ZEROES
could ever affect what the kernel wants to do in transmission phase).
But do we need to document in the kernel code that existing clients
mistakenly pass too many bits to the NBD_SET_FLAGS ioctl, so that if we
ever reach the future point where we need more than 16 transmission
flags, AND where we have more than 2 global flags defined, existing qemu
2.5 clients don't confuse the kernel when calling NBD_SET_FLAGS?  Or do
we think that it is unlikely enough to worry about, where by the time
there are more than 16 transmission flags, users are likely to already
be using new-enough qemu that doesn't send global flags to the kernel?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

Reply to:

Follow-Ups:
- Re: [Nbd] question on ioctl NBD_SET_FLAGS
  - From: Wouter Verhelst <w@...112...>

Prev by Date: Re: [Nbd] [PATCH v2 1/2] doc: Move sections about structured reply values
Next by Date: Re: [Nbd] [PATCH/RFC] Synchronize the option haggling phase
Previous by thread: Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
Next by thread: Re: [Nbd] question on ioctl NBD_SET_FLAGS
Index(es):
- Date
- Thread