Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE

To: Wouter Verhelst <w@...112...>
Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>
Subject: Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
From: Alex Bligh <alex@...872...>
Date: Tue, 26 Apr 2016 18:10:46 +0100
Message-id: <97B91103-CE3B-4661-B528-85E45CCCC875@...872...>
In-reply-to: <20160426103031.GA13582@...3...>
References: <1461604203-63003-1-git-send-email-alex@...872...> <20160426083013.GA3624@...3...> <FC22BF52-F390-475C-A170-AA2B6599FAD5@...872...> <20160426103031.GA13582@...3...>
Wouter,

>> However, in this instance the block sizing info comes through (and
>> *only* comes through) NBD_OPT_INFO. So in order to support block sizes
>> you need to support NBD_OPT_INFO because there is no other way to
>> obtain the block sizes anyway (*). The converse is really what we
>> are arguing about. Can you legitimately support NBD_OPT_INFO (as a
>> client) then ignore block sizes?
> 
> Right, that's the argument I think matters most. I think that yes, you
> can. We provide more information about exports than *just* the block
> sizes they support (the export size, whether the export requires TLS,
> possibly whether future extensions are supported as well), and there is
> no reason to require that this one arbitrary bit of information is
> assumed to be abided by just because you asked for general information.

I think that's a fair enough point.

>> I'm arguing that we can legitimately make block size support part of the info
>> spec, and say if you ask for info, you must be prepared to deal with it. The
>> alternate view is that info and blocksizes are different things, in which
>> case the server needs some way of knowing whether the client will respect
>> block sizes if given.
> 
> Right. I think they are different things. They look similar today,
> because we mostly add INFO because of the block sizes -- but we also add
> it for STARTTLS, and a client which doesn't care about block sizes but
> which does care about STARTTLS should not have to suddenly worry about
> block-aligning requests, etc.

OK. I suppose my counterargument would be 'but blocksize constraints
exist today, just like STARTTLS did before it was a formal option'.
However, I agree the asymmetry of treatment of BLOCKSIZE and new bits
is not good.

>> An alternative here would be to take a leaf out of DHCP's (dubious)
>> book. When asking for NBD_OPT_INFO you'd supply a list of extensions
>> you understanderstand (i.e. a list of the NBD_REP_INFO codes), and
>> you only get sent back the ones you ask for. The fact you've explicitly
>> asked for it CAN be taken to mean you've understood it.
> 
> Yes; that makes a lot more more sense. I had in fact been thinking about
> something along those lines, just hadn't suggested it yet.

OK. I will knock something up.

>> I guess up to this point in the email, we are talking about how best
>> for a client to signal to the server that it can respect the
>> server's block size constraints. I proposed but then decided I
>> didn't much like NBD_OPT_BLOCK_SIZE. There's another idea above.
> 
> I haven't really understood why you dislike it, but perhaps that's just
> me.

A few reasons:

* Every further info type necessarily needs a specific new option with
  its own custom documentation

* There is an ordering issue (less so now the option haggling phase
  is synchronous). EG I might want to tell a client that supports
  blocksize constraints that my minimum block size is 512 bytes, but
  tell one that is 'only asking for info' that the minimum block size is
  1. What happens if NBD_OPT_BLOCK_SIZE comes after the NBD_OPT_INFO
  request? I suspect the answer is 'well then, things have changed
  so I can reply differently with NBD_OPT_GO'.

* Something in me feels the NBD_OPT stuff is asking the server
  about what it does, rather than the client telling the server
  about what it does. You almost want the server to be able to
  ask the client things. This might be just in my head.
 
>> Yet another idea would be a single 'I understand block sizes'
>> bit in the flags the client sends right at the start of the negotiation
>> (perhaps mirrored by a server bit - I have something useful to
>> say about block sizes); I think I'd prefer that.
> 
> I don't. There's no need to burn flags for this.

Yeah, true.

>> But we have more contentiousness in the following.
> 
> Not really.

I meant I disagreed more about what you wrote below this point.

>>> In addition, while I understand the reasoning behind announcing block
>>> sizes, I do not want to make "abiding by announced block sizes" a
>>> prerequisite of "using the more modern way of choosing an export". The
>>> protocol currently has no requirement for a server to use a particular
>>> block size, and I think this is a feature. Sure, most clients *do*
>>> currently request data in block-sized units, but that doesn't mean we
>>> need to make that a requirement. A simple user-space client which wants
>>> to inspect (and possibly modify) some parts of an export might not care
>>> about block sizes; we should make writing such clients not harder than
>>> needs be.
>> 
>> Well, actually every server I know of ALREADY has block size constraints.
>> nbd-server.c is weakest, in that it won't support operations >= 1>>31
>> bytes. Qemu will fail stuff >32Mb. gonbdserver.c fails on stuff >128Mb
>> (and I won't be removing that - the DoS opportunity is too great).
>> Many servers perform substantially better if reads/writes are aligned.
> 
> Sure, but that's not my point.
> 
> A server is already allowed to reject 'too large' requests, even if
> block sizes aren't negotiated. That's fine, and we should keep that.

They aren't allowed to per the standard, but they do.

> However, servers *aren't* currently allowed to reject requests that are
> too small, or that are not block-aligned.

They aren't allowed to per the standard, but they do. I don't see
the difference.

> I think that's the only case
> in which NBD_REP_ERR_BLOCK_SIZE_REQD should be sent; it signals that the
> server can't support a certain type of request, and that the client
> should be prepared for that.

Right. So if you take the case of a server that can't (because it
is impossible / impractical to do so, it carries other disadvantages,
or the author is lazy, or whatever) support block writes < 512 bytes
(say), that's no different from one that can't (because it is impossible
/ impractical to do so, or because the author is lazy or whatever)
support >32Mb (cough, qemu) or >128Mb (cough, gonbdserver) or >allocatable
memory (cough, nbdserver.c) requests.

The way the block size stuff is currently written, there are a default
set of block constraints - for these purposes the minimum size of 1
byte and the maximum size of 0xffffffff are the only thing that
matters. If you support those, you shouldn't send
NBD_REP_ERR_BLOCK_SIZE_REQD (currently the text is slightly wonky
as it says you shouldn't send it if you can agree them externally,
whereas obviously if they are the default you don't need to
agree them externally - I will fix that). IE you should only
be sending NBD_REP_ERR_BLOCK_SIZE_REQD if you can't tell the
client that you don't support the block sizes it is going
to expect (in the absence of asking you).

> Block-aligning requests on the client side may be a bit much work
> though, and some clients may not have the ability to abide by that
> request. Therefore, if yoyu're going to require block-aligned requests,
> you're effectively passing work to clients.

Well that's one way of looking at it. Another is you are otherwise
requiring the server to do it. In the general case I suspect
it's easier on the client, because most client stacks (kernel,
qemu being my 2 examples) are already capable of dealing with
backends that have fixed block sizes (normal block devices, hard
disks etc.) so will already do this.

>> If you write a server which has arcane block size constraints it
>> will fail if some dumb clients talk to it. I don't think that's an
>> issue. It's better they fail early than try to connect and fail on
>> the 37th write.
>> 
>>> In that same light, I think we should discourage (but allow) the use of
>>> NBD_REP_ERR_BLOCK_SIZE_REQD (although we might want to make that error's
>>> name a bit shorter -- boy that's a lot of typing) in the same way that
>>> we currently allow but discourage ENOMEM.
>> 
>> Don't agree.
>> 
>> If don't support block sizes under 4096 bytes (say) it's pretty
>> non-negotiable, or you risk something that fails obscurely later.
> 
> Sure. My point isn't that you shouldn't announce that; it's that you
> should add code that makes it possible for you to talk to older clients,
> even if that requires a performance cost.

So this is my area of disagreement. I have no problem adding code.
However, I don't believe there are (really) older clients that rely
on this and if it made my life 1% harder to support them, I wouldn't
bother and don't see why I should (in gonbdserver's case
it turns out to be easy - so far - to support unaligned requests
and minimum write size, but difficult for complicated reasons
to support requests greater than a maximum size - difficult in 
the sense of 'to support arbitrary sizes, I would need to change
the threading and memory management model significantly'). Actually
nbd-server.c is in exactly the same position (and would be more
so if you put in some DoS protection!)

But despite the fact it's easy for gonbdserver, it wouldn't be
easy for a server that uses mmap, O_DIRECT, or various other
things. These are reasonable things for a server author to
want to do. I think all (or nearly all) reasonable clients end up
with the ability to support devices with fixed block size anyway.
So it's reasonable for a server author to say 'you know what,
if a client can't deal with my blocksize constraints, I want to
cleanly refuse a connection' (which is what
NBD_REP_ERR_BLOCK_SIZE_REQD is for). I think you agree with this,
but ...

>> Much better to fail at negotiation stage if you really can't
>> support (or can't risk pretending to support) the client.
>> 
>> If what you are saying is 'it is unreasonable to use the nbd
>> protocol unless you can always somehow support byte-aligned
>> writes' I don't really agree.
> 
> I'm not saying that.
> 
> A small server could certainly use NBD_REP_ERR_BLOCK_SIZE_REQD and
> refuse to talk to clients who don't (announce they) know how to
> block-align requests. There's nothing wrong with that.
> 
> However, a server written for full interoperability and maximum
> usefulness should not do so. It may issue a warning that it will be
> slower, but it should be able to operate at a basic level.

... you are characterising such a server as a server with
less than full interoperability. I don't agree. It's a perfectly
reasonable server choice. In fact I'd say 'not supporting block
sizes is an issue with client interoperability, not a
server issue'.

NBD is fundamentally to me a block device, not a file seeking
device (the clue being in the name). If a hard disk vendor said
"well, I'm sorry our disk only supports reads and writes by
whole sectors", the response "well that's not very interoperable"
would not be seen as sensible. Nor would a 'iSCSI vendors
SHOULD support byte-wise writes to block devices'. However,
if we found in the iSCSI protocol 'clients MUST respect block
sizes between X and Y as returned by the iSCSI target' we'd
be pretty unsurprised.

That's the bit I think you may have backwards.

> For that reason, I think we should add some language to discourage the
> use of that option, with the understanding that "discourage" does not
> mean "forbid".

If it was me, I would make it the other way around, that clients
SHOULD ask for blocksize info and respect it. And were it not for
the fact we haven't had blocksize in there from day 1, I'd make
it a 'MUST' (obviously that's impractical now).

>> Just as you are arguing it should be possible to write a simple
>> client, I'd argue it should be possible to write a simple server.
> 
> And I agree with that :-)
> 
> Perhaps we should add a section to the spec outlining requirements for
> "minimum" implementations that are still interoperable with other
> implementations...?

Mmmm.... well I think RFC2119 wording mostly does that. I guess
one general problem is that we are now adding a pile of options
and it's not obvious what the consequences of not supporting
them are. Let's take an uncontroversial one like STARTTLS.
When we put it in, we made sure we did not make NBD_OPT_STARTTLS
mandatory to support either side (we said the client MAY send
it and the server can operate in one of 3 modes, in one of which
it can refuse to process the option, which is what nbd-server.c
does today); so that in a sense is easy, but only came after
a lot of word wrangling. What about a simpler extension like
NBD_CMD_WRITEZEROES? Assuming it's been promoted from an
extension, what status has it got? Of course you can not send
NBD_FLAG_SEND_WRITEZEROES which is what older servers will do,
but is is a server that doesn't support it (assuming it is
promoted) compliant with the *current* spec? IE is
NBD_FLAG_SEND_WRITEZEROES there for back compatibility, or
is there a 'new NBD protocol version' which is back compatible,
but to be compatible with the *new* version, you need to
implement certain changes? Certainly I feel that should be the
case with (e.g.) fixed-Newstyle. Perhaps we need an 'NBD protocol
version' that says what features people are expected to implement
(one side or the other).

Incidentally, I'd love to move everything bar fixed-Newstyle out to
a section marked 'deprecated - historical interest only - if you
want to interoperate with these servers/clients here's what you
might find'.

-- 
Alex Bligh
Reply to:
Follow-Ups:
- Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
  - From: Wouter Verhelst <w@...112...>
References:
- [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
  - From: Alex Bligh <alex@...872...>
- Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
  - From: Wouter Verhelst <w@...112...>
- Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
  - From: Alex Bligh <alex@...872...>
- Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
  - From: Wouter Verhelst <w@...112...>
Prev by Date: Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
Next by Date: [Nbd] [PATCH/RFCv3] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO
Previous by thread: Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
Next by thread: Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
Index(es):
- Date
- Thread