[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [PATCH v2] doc: Add NBD_CMD_BLOCK_STATUS extension



On Mon, Apr 04, 2016 at 05:32:34PM -0600, Eric Blake wrote:
> On 04/04/2016 05:08 PM, Wouter Verhelst wrote:
> > On Mon, Apr 04, 2016 at 10:54:02PM +0300, Denis V. Lunev wrote:
> >> saying about dirtiness, we would soon come to the fact, that
> >> we can have several dirtiness states regarding different
> >> lines of incremental backups. This complexity is hidden
> >> inside QEMU and it would be very difficult to publish and
> >> reuse it.
> > 
> > How about this then.
> > 
> > A reply to GET_BLOCK_STATUS containing chunks of this:
> > 
> > 32-bit length
> > 32-bit "snapshot status"
> > if bit 0 in the latter field is set, that means the block is allocated
> >   on the original device
> > if bit 1 is set, that means the block is allocated on the first-level
> >   snapshot
> > if bit 2 is set, that means the block is allocated on the second-level
> >   snapshot
> 
> The idea of allocation is orthogonal from the idea of reads as zeroes.
> That is, a client may usefully guarantee that something reads as zeroes,
> whether or not it is allocated (but knowing whether it is a hole or
> allocated will determine whether future writes to that area will cause
> file system fragmentation or be at risk of ENOSPC on thin-provisioning).
>  If we want to expose the notion of depth (and I'm not sure about that
> yet), we may want to reserve bit 0 for 'reads as zero' and bits 1-30 as
> 'allocated at depth "bit-1"' (and bit 31 as 'allocated at depth 30 or
> greater).
> 
> I don't know if the idea of depth of allocation is useful enough to
> expose in this manner; qemu could certainly advertise depth if the
> protocol calls it out, but I'm still not sure whether knowing depth
> helps any algorithms.
> 
> > 
> > If all flags are cleared, that means the block is not allocated (i.e.,
> > is a hole) and MUST read as zeroes.
> 
> That's too strong.  NBD_CMD_TRIM says that we can create holes whose
> data does not necessarily read as zeroes (and SCSI definitely has
> semantics like this - not all devices guarantee zero reads when you
> UNMAP; and WRITE_SAME has an UNMAP flag to control whether you are okay
> with the faster unmapping operation at the expense of bad reads, or
> slower explicit writes).  Hence my complaint that we have to treat
> 'reads as zero' as an orthogonal bit to 'allocated at depth X'.

Right.

It would be possible to instead mark bit 0 as NBD_ALLOC_LEVEL_DATA (i.e.,
"not just zeroes" -- since I believe a flag value of "one" to indicate
"zeroes" is bound to cause confusion), bit 1 as NBD_ALLOC_LEVEL_CURRENT
(i.e., "allocated in the current snapshot level"), and bits 2 through 31
as NBD_ALLOC_LEVEL_2 through NBD_ALLOC_LEVEL_31, with
implementation-defined meanings, as above.

> > If a flag is set at a particular level X, that means the device is dirty
> > at the Xth-level snapshot.
> > 
> > If at least one flag is set for a region, that means the data may read
> > as "not zero".
> > 
> > The protocol does not define what it means to have multiple levels of
> > snapshots, other than:
> > 
> > - Any write command (WRITE or WRITE_ZEROES) MUST NOT clear or set the
> >   Xth level flag if the Yth level flag is not also cleared at the same
> >   time, for any Y > X
> > - Any write (as above) MAY set or clear multiple levels of flags at the
> >   same time, as long as the above holds

Having thought about this on my way to work, I'm now no longer convinced
this is a good idea. I could imagine several scenarios where violating
these constraints might be useful, and none where not violating them
would buy us anything.

Instead, I suggest the following semantics:

- NBD_CMD_WRITE to a region which has NBD_ALLOC_LEVEL_DATA cleared MUST
  set that bit for that region
- NBD_CMD_WRITE_ZEROES or NBD_CMD_TRIM to a region which has
  NBD_ALLOC_LEVEL_DATA set MAY clear that bit for that region
- NBD_CMD_TRIM to a region which has one of the NBD_ALLOC_LEVEL_X flags
  set MAY just clear those flags (and no other). In that case, the
  contents of that region (when read) SHOULD be assumed to have changed,
  but is not defined, and is unlikely to read as zeroes if other
  NBD_ALLOC_LEVEL_* flags are still set.
- NBD_CMD_WRITE to a region which has NBD_ALLOC_LEVEL_CURRENT set SHOULD
  NOT fail with ENOSPC, although it MAY still fail with other errors.
- NBD_CMD_LEVEL_READ from a region which has NBD_ALLOC_LEVEL_DATA clared
  MUST read as zeroes

This may all be a bit overengineered, but it has the benefit of
being fairly generic. It also means that if the flags field is
all-zeroes, the region is "uninteresting".

Thoughts?

[...]

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Reply to: