[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [PATCH v2] doc: Add new NBD_REP_INFO reply, for advertising block size



On 04/13/2016 01:27 AM, Wouter Verhelst wrote:
> Eric,
> 
> On Tue, Apr 12, 2016 at 06:16:54PM -0600, Eric Blake wrote:
>> Existing NBD servers often have limitations, such as requiring
>> actions to be aligned to block sizes or limiting maximum
>> transactions to avoid denial of service attacks; for example,
>> qemu's NBD server refuses any transaction larger than 32M.  But
>> to date, clients have to learn these limitations via out-of-band
>> means.
>>
>> This alters NBD_OPT_INFO and NBD_OPT_GO to use a new reply type
>> NBD_REP_INFO (rather than overloading NBD_REP_SERVER), so that
>> we have a future-proof way of supplying as much additional
>> structured information about an export as we want.
>>
>> Design decision: a client that wants to learn block sizes MUST
>> use NBD_OPT_GO, rather than the old NBD_OPT_EXPORT_NAME, even
>> though we could have repurposed some of the reserved zeroes when
>> NBD_FLAG_C_NO_ZEROES is not in effect, because we don't want to
>> encourage any further abuse of NBD_OPT_EXPORT_NAME.
>>
>> Design decision: no new global NBD_FLAG or NBD_OPT are required;
>> there is nothing for the client to negotiate.  The server merely
>> provides as much information as it can, and the client then
>> interprets what information it understands.  The items are
>> structured so that a client can ignore details from the server
>> that the client does not know about, and that we can easily add
>> future items of information.
> 
> General note (in-depth review may follow later):
> 
> Currently, there are no default minimum or maximum block sizes, and
> therefore they are effectively limited to "1 byte" for the minimum block
> size, and "the size of the device" for the maximum block size.

We have existing servers that don't comply with this; I'm trying to
approach it from the angle of:

- if the server advertises, then that is the limit
- if the client and server communicate out of band, then use those limits
- if the server does not advertise, then the server MUST support a
minimum block size of 1 (although not all existing servers do, they can
be considered buggy), and the client SHOULD NOT send requests smaller
than 512 unless the export size proves a smaller block size is in use
(that is, if export size % 512 is non-zero, we know the server supports
a smaller block size in order to access those tail bytes), but the
client MAY still attempt to send smaller alignments and hope that the
server is not buggy

> 
> I do agree that it might be advantageous for the server to announce such
> minimum and maximum sizes, but I don't think that defining defaults that
> differ from what historically has been the effective default is the
> right way to go.
> 

But that's my argument - historically, there ARE servers which insist on
512 or even 4096-byte alignment, and which misbehave (close the
connection, return an error, or possibly even corrupt data) rather than
do read-modify-write on misaligned requests.  And qemu's server
certainly kills connections on attempts to exceed 32M in a single
transaction.  So while the theoretical server should be unlimited unless
advertised, the practical client should stick within sane limits rather
than attempting to exploit unlimited.

I'm open to suggestions on wording (such as using SHOULD rather than
MUST when a value is not advertised, so that existing implementations
are merely poorer quality of implementation rather than in violation of
the spec).

> Therefore, I would like this to say that unless you announce
> differently, the maximum block size is the size of the device, and the
> minimum block size is 1 byte.
> 
> Having a default for preferred block size sounds sane, although it might
> be better to switch it to 4096 (which is what most conversations seem to
> use today) rather than 512.

Except that I suggested 65k, not 4096, as the default preferred block
size.  Even though 4096 may be atomic, it is not as efficient; and in at
least qemu's case, the default qcow2 file sizing uses 32k clusters as
its preferred I/O size.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


Reply to: