[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension



29.11.2016 15:57, Alex Bligh wrote:
Vladimir,

I went back to April to reread the previous train of conversation
then found you had helpfully summarised some if it. Comments
below.

Rather than comment on many of the points individual, the root
of my confusion and to some extent uncomfortableness about this
proposal is 'who owns the meaning the the bitmaps'.

Some of this is my own confusion (sorry) about the use to which
this is being put, which is I think at root a documentation issue.
To illustrate this, you write in the FAQ section that this is for
read only disks, but the text talks about:

+Some storage formats and operations over such formats express a
+concept of data dirtiness. Whether the operation is block device
+mirroring, incremental block device backup or any other operation with
+a concept of data dirtiness, they all share a need to provide a list
+of ranges that this particular operation treats as dirty.

How can data be 'dirty' if it is static and unchangeable? (I thought)

I now think what you are talking about backing up a *snapshot* of a disk
that's running, where the disk itself was not connected using NBD? IE it's
not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
an opaque state represented in a bitmap, which is binary metadata
at some particular level of granularity. It might as well be 'happiness'
or 'is coloured blue'. The NBD server would (normally) have no way of
manipulating this bitmap.

Yes, something like this.

Note: in Qemu it may not be a snapshot (actually, I didn't see a way in Qemu to export snapshots not switching to them (except opening external snapshot as a separate block dev)), but just any read-only drive, or temporary drive, used in the image fleecing scheme:

driveA is online normal drive
driveB is empty nbd export

- start backup driveA->driveB with sync=none, which means that the only copy which is done is copying old data from A to B before every write to A - and set driveA as backing for driveB (on read from B, if data is not allocated it will be read from A)

after that, driveB is something like a snapshot for backup through NBD.

It's all just to say: calling backup-point-in-time-state just a 'snapshot' confuses me and I hope not to see this word in the spec (in this context).


In previous comments, I said 'how come we can set the dirty bit through
writes but can't clear it?'. This (my statement) is now I think wrong,
as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
state of the bitmap comes from whatever sets the bitmap which is outside
the scope of this protocol to transmit it.

However, we have the uncomfortable (to me) situation where the protocol
describes a flag 'dirty', with implications as to what it does, but
no actual strict definition of how it's set. So any 'other' user has
no real idea how to use the information, or how to implement a server
that provides a 'dirty' bit, because the semantics of that aren't within
the protocol. This does not sit happily with me.

So I'm wondering whether we should simplify and generalise this spec. You
say that for the dirty flag, there's no specification of when it is
set and cleared - that's implementation defined. Would it not be better
then to say 'that whole thing is private to Qemu - even the name'.

Rather you could read the list of bitmaps a server has, with a textual
name, each having an index (and perhaps a granularity). You could then
ask on NBD_CMD_BLOCK_STATUS for the appropriate index, and get back that
bitmap value. Some names (e.g. 'ALLOCATED') could be defined in the spec,
and some (e.g. ones beginning with 'X-') could be reserved for user
usage. So you could use 'X-QEMU-DIRTY'). If you then change what your
dirty bit means, you could use 'X-QEMU-DIRTY2' or similar, and not
need a protocol change to support it.

IE rather than looking at 'a way of reading the dirty bit', we could
have this as a generic way of reading opaque bitmaps. Only one (allocation)
might be given meaning to start off with, and it wouldn't be necessary
for all servers to support that - i.e. you could support bitmap reading
without having an ALLOCATION bitmap available.

This spec would then be limited to the transmission of the bitmaps
(remove the word 'dirty' entirely, except perhaps as an illustrative
use case), and include only the definition of the allocation bitmap.

Good point. For Qcow2 we finally come to just bitmaps, not "dirty bitmaps", to make it more general. There is a problem with allocation if we want to make it a subcase of bitmap: allocation natively have two bits per block: zero and allocated. We can of course separate this into two bitmaps, but this will not be similar with classic get_block_status.

Some more nits:

Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now
NBD_FLAG_CAN_MULTI_CONN in master branch.

And, finally, I've rebased this onto current state of
extension-structured-reply branch (which itself should be rebased on
master IMHO).
Each documentation branch should normally be branched off master unless
it depends on another extension (in which case it will be branched from that).
I haven't been rebasing them frequently as it can disrupt those working
on the branches. There's only really an issue around rebasing where you
depend on another branch.


2. Q: different granularities of dirty/allocated bitmaps. Any problems?
   A: 1: server replies with status descriptors of any size, granularity
         is hidden from the client
      2: dirty/allocated requests are separate and unrelated to each
         other, so their granularities are not intersecting
I'm OK with this, but note that you do actually mention a granularity
of sorts in the spec (512 byes) - I think you should replace that
with the minimum block size.

3. Q: selecting of dirty bitmap to export
   A: several variants:
      1: id of bitmap is in flags field of request
          pros: - simple
          cons: - it's a hack. flags field is for other uses.
                - we'll have to map bitmap names to these "ids"
      2: introduce extended nbd requests with variable length and exploit this
         feature for BLOCK_STATUS command, specifying bitmap identifier.
         pros: - looks like a true way
         cons: - we have to create additional extension
               - possible we have to create a map,
                 {<QEMU bitmap name> <=> <NBD bitmap id>}
      3: exteranl tool should select which bitmap to export. So, in case of Qemu
         it should be something like qmp command block-export-dirty-bitmap.
         pros: - simple
               - we can extend it to behave like (2) later
         cons: - additional qmp command to implement (possibly, the lesser evil)
         note: Hmm, external tool can make chose between allocated/dirty data too,
               so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.
Yes, this is all pretty horrible. I suspect we want to do something like (2),
and permit extra data across (in my proposal, it would only be one byte to select
the index). I suppose one could ask for a list of bitmaps.

4. Q: Should not get_{allocated,dirty} be separate commands?
   cons: Two commands with almost same semantic and similar means?
   pros: However here is a good point of separating clearly defined and native
         for block devices GET_BLOCK_STATUS from user-driven and actually
         undefined data, called 'dirtyness'.
I'm suggesting one generic 'read bitmap' command like you.

To support get_block_status in this general read_bitmap, we will need to define something like 'multibitmap', which allows several bits per chunk, as allocation data has two: zero and allocated.


5. Number of status descriptors, sent by server, should be restricted
   variants:
   1: just allow server to restrict this as it wants (which was done in v3)
   2: (not excluding 1). Client specifies somehow the maximum for number
      of descriptors.
      2.1: add command flag, which will request only one descriptor
           (otherwise, no restrictions from the client)
      2.2: again, introduce extended nbd requests, and add field to
           specify this maximum
I think some form of extended request is the way to go, but out of
interest, what's the issue with as many descriptors being sent as it
takes to encode the reply? The client can just consume the remainder
(without buffering) and reissue the request at a later point for
the areas it discarded.

the issue is: too many descriptors possible. So, (1) solves it. (2) is optional, just to simplify/optimize client side.


6. A: What to do with unspecified flags (in request/reply)?
   I think the normal variant is to make them reserved. (Server should
   return EINVAL if found unknown bits, client should consider replay
   with unknown bits as an error)
Yeah.

+
+* `NBD_CMD_BLOCK_STATUS`
+
+    A block status query request. Length and offset define the range
+    of interest. Clients SHOULD NOT use this request unless the server
MUST NOT is what we say elsewhere I believe.

+    set `NBD_CMD_SEND_BLOCK_STATUS` in the transmission flags, which
+    in turn requires the client to first negotiate structured replies.
+    For a successful return, the server MUST use a structured reply,
+    containing at most one chunk of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
Nit: are you saying that non-structured error replies are permissible?
You're always/often going to get a non-structured  (simple) error reply
if the server doesn't support the command, but I think it would be fair to say the
server MUST use a structured reply to NBD_CMD_SEND_BLOCK_STATUS if
it supports the command. This is effectively what we say re NBD_CMD_READ.

I agree.


+
+    The list of block status descriptors within the
+    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
+    of the file starting from specified *offset*, and the sum of the
+    *length* fields of each descriptor MUST not be greater than the
+    overall *length* of the request. This means that the server MAY
+    return less data than required. However the server MUST return at
+    least one status descriptor
I'm not sure I understand why that's useful. What should the client
infer from the server refusing to provide information? We don't
permit short reads etc.

if the bitmap is 010101010101 we will have too many descriptors. For example, 16tb disk, 64k granularity -> 2G of descriptors payload.


.  The server SHOULD use different
+    *status* values between consecutive descriptors, and SHOULD use
+    descriptor lengths that are an integer multiple of 512 bytes where
+    possible (the first and last descriptor of an unaligned query being
+    the most obvious places for an exception).
Surely better would be an an integer multiple of the minimum block
size. Being able to offer bitmap support at finer granularity than
the absolute minimum block size helps no one, and if it were possible
to support a 256 byte block size (I think some floppy disks had that)
I see no reason not to support that as a granularity.


Agree. Anyway it is just a strong recommendation, not requirement.


--
Best regards,
Vladimir




Reply to: