Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE

To: Wouter Verhelst <w@...112...>
Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>
Subject: Re: [Nbd] [PATCH/RFCv2] Remove NBD_OPT_BLOCK_SIZE
From: Alex Bligh <alex@...872...>
Date: Thu, 28 Apr 2016 08:28:55 +0100
Message-id: <76C067F9-2C02-4671-A21F-3A1AAE889F1B@...872...>
In-reply-to: <20160427120459.GA17932@...3...>
References: <1461604203-63003-1-git-send-email-alex@...872...> <20160426083013.GA3624@...3...> <FC22BF52-F390-475C-A170-AA2B6599FAD5@...872...> <20160426103031.GA13582@...3...> <97B91103-CE3B-4661-B528-85E45CCCC875@...872...> <20160427120459.GA17932@...3...>

Wouter,

>> * Something in me feels the NBD_OPT stuff is asking the server
>>  about what it does, rather than the client telling the server
>>  about what it does. You almost want the server to be able to
>>  ask the client things. This might be just in my head.
> 
> I think it is :-)
> 
> The option haggling was always intended to be a two-way conversation,
> with the client setting options and the server (possibly) returning
> information. This falls well within that.

Fair enough!

>>>> But we have more contentiousness in the following.
>>> 
>>> Not really.
>> 
>> I meant I disagreed more about what you wrote below this point.
> 
> and I meant to say that what I wrote below this point is less important
> to me than the bit before, and that I'm more likely to just drop that
> bit if needs be :-)

OK. Can I suggest you take a look at patch v5 just sent to the list?
Because though we still disagree on the bits below here, the bit
we've both identified as a problem we now sort of agree on, and
I think the v5 way of doing things should be no worse from your point
of view than the current proposal. Currently you send NBD_OPT_BLOCK_SIZE
to indicate you understand and respect block sizes; now you ask for
block sizes to indicate you understand and respect block sizes.
It's a more generalisable approach.

>>> A server is already allowed to reject 'too large' requests, even if
>>> block sizes aren't negotiated. That's fine, and we should keep that.
>> 
>> They aren't allowed to per the standard, but they do.
> 
> Yes, that.
> 
>>> However, servers *aren't* currently allowed to reject requests that are
>>> too small, or that are not block-aligned.
>> 
>> They aren't allowed to per the standard, but they do. I don't see
>> the difference.
> 
> That the overflow is far less likely to happen than the underflow,
> especially if the minimum size is set to something > 512.

I think we disagree here. I've seen actual overflow problems
(mainly running integrity tests, admittedly), but every
client I know has a block size even if it is not negotiated with the
server.

But even if you're right, I'd argue this is in part because
we've made it very hard to write servers that require larger
block sizes (e.g. mmap() / O_DIRECT based ones).

>>> Block-aligning requests on the client side may be a bit much work
>>> though, and some clients may not have the ability to abide by that
>>> request. Therefore, if you're going to require block-aligned requests,
>>> you're effectively passing work to clients.
>> 
>> Well that's one way of looking at it. Another is you are otherwise
>> requiring the server to do it.
> 
> Yes, but the server is far less likely to be in a minimal environment
> where "doing a lot of work" is Hard(tm) than the client is.

You probably have more visibility of clients than I do. I'm
familiar with the linux kernel (where we have an explicit
ioctl() to set block size, and the kernel is capable of working
with any reasonable power of two as a block size), and qemu
(which can cope with any block size offered by the server, or rather
could do if only it had a way to ascertain it). I'm familiar
with early boot environments like gPXE which all do block-by-block
loading (e.g. initrd over http) and whilst as far as I know
these don't support nbd, doing so with any readable block size
would be considerably easier than supporting the complexity
of http chunking etc. Which clients were you thinking of?
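
To make the client-side work under discussion concrete, here is a minimal sketch of what block-aligning a read amounts to. It assumes a power-of-two block size; the names align_read, read_unaligned and read_blocks are hypothetical and not taken from any existing client.

```python
def align_read(offset, length, block_size):
    """Widen an arbitrary byte range so it starts and ends on block boundaries.

    Returns (aligned_offset, aligned_length, skip), where skip is the number
    of leading bytes to discard from the data that comes back.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    mask = block_size - 1
    start = offset & ~mask                    # round the start down
    end = (offset + length + mask) & ~mask    # round the end up
    return start, end - start, offset - start


def read_unaligned(read_blocks, offset, length, block_size):
    """Serve a byte-granular read using only block-aligned reads.

    read_blocks(offset, length) is a hypothetical callable standing in for
    whatever issues the aligned read to the server.
    """
    start, span, skip = align_read(offset, length, block_size)
    data = read_blocks(start, span)
    return data[skip:skip + length]
```

Reads only need the widening and trimming shown here. Writes that are not block-aligned additionally need a read-modify-write cycle, which is where most of the cost sits.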

>>> However, a server written for full interoperability and maximum
>>> usefulness should not do so. It may issue a warning that it will be
>>> slower, but it should be able to operate at a basic level.
>> 
>> .... you are characterising such a server as a server with
>> less than full interoperability. I don't agree. It's a perfectly
>> reasonable server choice. In fact I'd say 'not supporting block
>> sizes is an issue with client interoperability, not a
>> server issue'.
> 
> The problem is that current clients, which don't know about block sizes,
> will be *completely* unable to speak with a server which *requires*
> block size negotiation. A server which does not *require* such
> negotiation (but prefers it) will be able to talk to such a client. As
> such, a server which sends BLOCK_SIZE_REQD is less than fully
> interoperable.
> 
> It's okay (in my book) for a server to perform less than optimally with
> older clients, but it's *not* okay for a server to refuse to talk to
> older clients (if we can avoid it).

OK, so that's a point of disagreement then.

Just as I think it is legitimate to write and deploy a server
that only does fixed Newstyle (despite broken-newstyle
and oldstyle still being in the standard), and just as I think
it's legitimate to write a server that doesn't support TLS or
only supports TLS, I think it's legitimate to write a server
that doesn't support old clients. *However* I think people
should do this only consciously and for good reason (i.e. because
they want to take advantage of O_DIRECT or whatever) AND they
should fail cleanly (i.e. not half work then error in
transmission stage).
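
As a rough illustration of "fail cleanly", the decision can be made once, during negotiation, before any transmission-stage request is seen. This is only a sketch: Policy, Decision and negotiate are hypothetical names, and how the rejection is actually signalled on the wire is left out.

```python
from enum import Enum


class Policy(Enum):
    EMULATE = "emulate"   # also serve clients that ignore block sizes, more slowly
    REQUIRE = "require"   # refuse such clients before transmission starts


class Decision(Enum):
    ACCEPT = "accept"
    ACCEPT_DEGRADED = "accept-degraded"
    REJECT = "reject"


def negotiate(client_asked_for_block_sizes, policy):
    """Decide during negotiation whether to serve this client.

    The point is that the answer is known before the transmission stage,
    so a client is never half served and then errored.
    """
    if client_asked_for_block_sizes:
        return Decision.ACCEPT
    if policy is Policy.REQUIRE:
        return Decision.REJECT
    return Decision.ACCEPT_DEGRADED
```

A server author who picks Policy.REQUIRE in this sketch is making the conscious choice described above; one who picks Policy.EMULATE takes on the extra code paths instead.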

If you don't agree with this principle, you should logically
be disagreeing with ANY minimum block size (and indeed
ANY maximum block size apart from 2^32-1), because an existing
client can legitimately send 1 byte or 2^32-1 byte requests,
and you'd be saying that no server can legitimately refuse
those requests if the client does not support whatever
block size extension. In fact such constraints already
exist, and are useful both to the server, and (ultimately)
to the client as it can tell the server it can run faster.
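
For illustration, the kind of constraint check being argued about looks something like the following. It is a sketch only: check_request is a hypothetical name, and using EINVAL for every violation is an assumption, not something the current text mandates.

```python
import errno

MAX_LENGTH = 2**32 - 1  # largest value a 32-bit request length can carry


def check_request(offset, length, min_block, max_block):
    """Return 0 if the request respects the block size limits, else an errno.

    Assumption for this sketch: every violation is reported as EINVAL.
    """
    if length < 1 or length > MAX_LENGTH:
        return errno.EINVAL
    if length > max_block:
        return errno.EINVAL            # "too large"
    if offset % min_block or length % min_block:
        return errno.EINVAL            # too small, or not block-aligned
    return 0
```

With min_block=1 and max_block=2**32-1 every request from 1 byte to 2^32-1 bytes passes, which is the "no constraints" case; any tighter pair of limits necessarily refuses something an existing client could legitimately send.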

>> NBD is fundamentally to me a block device, not a file seeking
>> device (the clue being in the name). If a hard disk vendor said
>> "well, I'm sorry our disk only supports reads and writes by
>> whole sectors", the response "well that's not very interoperable"
>> would not be seen as sensible. Nor would an 'iSCSI vendors
>> SHOULD support byte-wise writes to block devices'. However,
>> if we found in the iSCSI protocol 'clients MUST respect block
>> sizes between X and Y as returned by the iSCSI target' we'd
>> be pretty unsurprised.
>> 
>> That's the bit I think you may have backwards.
> 
> No, I think you misunderstand what I want to see happen here.
> 
>>> For that reason, I think we should add some language to discourage the
>>> use of that option, with the understanding that "discourage" does not
>>> mean "forbid".
>> 
>> If it was me, I would make it the other way around, that clients
>> SHOULD ask for blocksize info and respect it. And were it not for
>> the fact we haven't had blocksize in there from day 1, I'd make
>> it a 'MUST' (obviously that's impractical now).
> 
> Yes, clients clearly should ask for the information. My point is that a
> server which sees a client which doesn't ask for it, should be written
> so that it can still talk with said client (which presumably is an older
> client).

And again, I disagree here. In my view it should either be written
to talk to said client, or error cleanly.

So two questions:

1. If all servers should be written such that if the client
   does not understand minimum / maximum block sizes, the server
   should support all block size requests from 1 to 2^32-1 inclusive,
   what would be the point of minimum and maximum block sizes?

2. Let's assume you want to write a server which runs using
   O_DIRECT and sendfile. You know that the clients you are interested
   in addressing are those that support block sizes (linux kernel,
   I suspect every other OS kernel, and qemu). Supporting sub 4k
   reads and writes is a very substantial piece of extra complexity
   which would introduce entirely new code paths (unless you can
   see a way around that). What would you advise such an author
   to do? Write all this extra code without any known clients to
   test against, and carry the burden of complex code forward?
   Error clients that might not respect those block size constraints
   cleanly before entering transmission mode? Or connect and hope
   for the best, erroring the first non-conformant request? If you
   choose the second or third, would you describe the server as
   non-conformant?
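
To give a feel for the extra code path in question 2, here is a minimal sketch of emulating a sub-block write on a backend that only accepts whole aligned blocks. BlockOnlyStore is a hypothetical in-memory stand-in for an O_DIRECT-style backend, and write_unaligned is likewise an illustrative name.

```python
class BlockOnlyStore:
    """In-memory stand-in for a backend that only does whole, aligned blocks."""

    def __init__(self, size, block_size=4096):
        if size % block_size:
            raise ValueError("size must be a multiple of block_size")
        self.block_size = block_size
        self._data = bytearray(size)

    def _check(self, offset, length):
        if offset % self.block_size or length % self.block_size:
            raise ValueError("unaligned access")
        if offset < 0 or offset + length > len(self._data):
            raise ValueError("out of range")

    def read(self, offset, length):
        self._check(offset, length)
        return bytes(self._data[offset:offset + length])

    def write(self, offset, data):
        self._check(offset, len(data))
        self._data[offset:offset + len(data)] = data


def write_unaligned(store, offset, data):
    """Emulate a byte-granular write with a read-modify-write cycle.

    A real server would also have to serialise this against overlapping
    writes; that is omitted here.
    """
    bs = store.block_size
    start = offset - offset % bs                  # round down
    end = -(-(offset + len(data)) // bs) * bs     # round up
    buf = bytearray(store.read(start, end - start))
    rel = offset - start
    buf[rel:rel + len(data)] = data
    store.write(start, bytes(buf))
```

Even in this toy form, a 200-byte write that straddles a block boundary turns into an 8k read followed by an 8k write.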

And a last point: you seem to be OK about clients that look at
the capabilities of a server and won't connect if they don't
like them (for instance a client that won't connect to a
server that doesn't support TLS). Why is it so bad to have a
server that won't accept a connection from a client if the client
does not support a certain protocol feature?

-- 
Alex Bligh
