
Re: [Nbd] Setting the physical block size

Alex Bligh <alex@...872...> writes:

> Goswin,
> --On 2 March 2012 22:54:14 +0100 Goswin von Brederlow
> <goswin-v-b@...186...> wrote:
>> Alex Bligh <alex@...872...> writes:
>>> --On 1 March 2012 17:17:41 +0100 Goswin von Brederlow
>>> <goswin-v-b@...186...> wrote:
>>>> There is an ioctl NBD_SET_BLKSIZE, which sets the lo->blksize. The only
>>> ...
>>>> Or did you mean that there should be a new ioctl NBD_SET_PHYS_BLKSIZE?
>>> No, I was asking whether there was a non-nbd-specific block device ioctl to do this.
>>> Getting changes into nbd-client userspace code is (quite correctly)
>>> easier than getting them into the kernel.
>> Ahh, no idea. I wouldn't expect those values to be settable from
>> userspace for normal devices. They are something the driver sets to
>> reflect the hardware capabilities. But it can't hurt to ask around.
> One problem with block sizes is that there are a zillion different
> block sizes, with different and overlapping naming conventions.
> So, there is the BLKBSZSET ioctl, which is commented to 'set the
> logical block size'. What it actually does is call set_blocksize().
> Does that do what you need?
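For reference, the sizes being discussed can each be read from userspace with generic block-device ioctls from <linux/fs.h>: BLKSSZGET (logical block size), BLKPBSZGET (physical block size) and BLKBSZGET (the soft block size that BLKBSZSET changes via set_blocksize()). The sketch below is a minimal illustration, assuming a Linux system with those headers; the function name show_and_set_blocksize is hypothetical, and BLKBSZSET requires CAP_SYS_ADMIN.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKSSZGET, BLKPBSZGET, BLKBSZGET, BLKBSZSET */

/* Hypothetical helper: print the block sizes the kernel reports for a
 * block device and, if new_bs > 0, try to change the soft block size.
 * Returns 0 on success, -1 on failure (errno is left set). */
static int show_and_set_blocksize(const char *path, int new_bs)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int logical = 0, soft = 0;
    unsigned int physical = 0;
    if (ioctl(fd, BLKSSZGET, &logical) < 0 ||   /* q->limits.logical_block_size */
        ioctl(fd, BLKPBSZGET, &physical) < 0 || /* q->limits.physical_block_size */
        ioctl(fd, BLKBSZGET, &soft) < 0) {      /* soft block size of the bdev */
        close(fd);
        return -1;
    }
    printf("%s: logical=%d physical=%u soft=%d\n", path, logical, physical, soft);

    /* BLKBSZSET takes a pointer to an int and ends up in set_blocksize(). */
    if (new_bs > 0 && ioctl(fd, BLKBSZSET, &new_bs) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

Run as root, e.g. show_and_set_blocksize("/dev/nbd0", 4096), it prints the current values before attempting the change.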

I will have to check what that does.

> What your patch does is calls blk_queue_physical_block_size. But that's
> the 'physical' block size from the perspective of the queue, which is
> one layer up. IE it's setting q->limits.physical_block_size, as
> opposed to q->limits.logical_block_size.
> I can't help thinking that must always be at least
> a multiple of the 'real' block size one layer down.
> set_blocksize() seems to control the block size that an fs would use
> and in turn it's looking at bdev_logical_block_size(bdev)
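A userspace model of the argument checks set_blocksize() applies, as found in fs/block_dev.c in kernels of this period (worth re-checking against the tree you target): the size must be a power of two between 512 and PAGE_SIZE, and not smaller than the device's logical block size. The function name and parameters are hypothetical; page size and logical block size are passed in explicitly because this is not kernel code.

```c
#include <errno.h>
#include <stdbool.h>

static bool is_pow2(unsigned int x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Returns 0 if set_blocksize() would accept `size`, -EINVAL otherwise. */
static int check_soft_blocksize(int size, int page_size, int logical_bs)
{
    /* Size must be a power of two, and between 512 and PAGE_SIZE. */
    if (size > page_size || size < 512 || !is_pow2((unsigned int)size))
        return -EINVAL;
    /* Size cannot be smaller than the device's logical block size. */
    if (size < logical_bs)
        return -EINVAL;
    return 0;
}
```

The last check is why, once the logical block size has been raised, BLKBSZSET can no longer set a smaller soft block size.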

The physical block size is the smallest size the device can handle
atomically. The smallest block it can handle without needing to do a
read-modify-write cycle. Applications should try to write in multiples
of that (and they do) but are not required to.

The logical block size on the other hand is the smallest size the device
can handle at all. Applications must write in multiples of that.
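The two definitions can be written as a simplified model (hypothetical helper names, offsets and lengths in bytes): the logical block size decides whether a request is accepted at all, while the physical block size decides whether a write can be done without a read-modify-write cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legal only if offset and length are multiples of the logical block size. */
static bool io_is_legal(uint64_t off, uint64_t len, uint32_t logical_bs)
{
    return len > 0 && off % logical_bs == 0 && len % logical_bs == 0;
}

/* A write avoids read-modify-write only if it starts and ends on
 * physical block boundaries (so no physical block is partially covered). */
static bool write_needs_rmw(uint64_t off, uint64_t len, uint32_t physical_bs)
{
    return off % physical_bs != 0 || (off + len) % physical_bs != 0;
}
```

With logical = 512 and physical = 4096, a 512-byte write at offset 0 is legal but needs read-modify-write; an 8192-byte write at offset 0 needs neither rejection nor read-modify-write.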

One thing that makes no sense to me though is that having a physical
block size of 4096 and writing 128k data at offset 0 with O_DIRECT
causes the kernel to write e.g. 122k followed by 6k. The request is
split into multiples of the logical block size instead of multiples of
the physical block size. That is clearly not optimal and, I think, a bug
in the kernel.
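To make the complaint concrete: 122k is a multiple of 512 but not of 4096, so the first piece ends 2k into a physical block and both pieces only partially cover that block. A hypothetical helper that moves such a split point back to the previous physical-block boundary would instead give 120k followed by 8k. This sketches only the arithmetic, not the kernel's actual request-splitting code.

```c
#include <stdint.h>

/* Hypothetical: given a request starting at `start` and a proposed first
 * piece of `split` bytes, shorten the piece so it ends on a physical
 * block boundary. Leaves `split` unchanged if no boundary is available. */
static uint64_t align_split(uint64_t start, uint64_t split, uint32_t physical_bs)
{
    uint64_t end = start + split;
    uint64_t aligned_end = end - end % physical_bs;
    if (aligned_end <= start)   /* piece lies within a single physical block */
        return split;
    return aligned_end - start;
}
```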

>> On another note, setting the physical block size does not prevent the
>> kernel from sending requests aligned and sized to multiples of the logical
>> block size. Setting the logical block size to > 512 means that O_DIRECT
>> access needs to also use blocks >512 bytes. So for example "dd
>> if=/dev/zero of=/dev/nbd0 oflag=direct" fails but works if you add
>> "bs=4k".
>> Can anyone think of other reasons why setting the logical block size
>> might be bad?
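The dd failure follows from the O_DIRECT rules: dd's default block size is 512 bytes, which is not a multiple of a 4096-byte logical block size, whereas bs=4k is. A program issuing O_DIRECT I/O itself must size its transfers the same way and align its buffer. The sketch below uses a hypothetical helper name and C11 aligned_alloc, and assumes alignment to the logical block size is sufficient for the device.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a buffer for an O_DIRECT transfer of `len` bytes on a device
 * with logical block size `logical_bs`. Returns NULL if `len` is not a
 * multiple of logical_bs (e.g. dd's default 512 on a 4096-byte device).
 * Release the buffer with free(). */
static void *alloc_direct_buf(size_t len, size_t logical_bs)
{
    if (len == 0 || len % logical_bs != 0)
        return NULL;
    /* C11 aligned_alloc wants size to be a multiple of the alignment,
     * which the check above guarantees. */
    return aligned_alloc(logical_bs, len);
}
```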
> It's more that I'm not sure we are necessarily setting the right thing.
> If we are trying to set a block size specified by the /client/, I
> think we should be using the existing ioctl, and set_blocksize(),
> rather than getting down lower. After all, other block sizes will
> work.
> If we are trying to set a block size specified by the /server/ through
> negotiation (i.e. 'this is the smallest blocksize that will actually
> work') I think that corresponds to the bdev_logical_block_size
> (or possibly the physical equivalent).
> I think you are doing the former, in which case I think we can just
> use the ioctl. If you want reasons why:
> 1. No kernel changes, and you will thus get your patch in far far
>   quicker.
> 2. Because otherwise it's not possible to change it back to a lower
>   one with the ioctl, which I'd expect to work
> For full disclosure, the plethora of different blocksize settings confuses
> the hell out of me, so I might be wrong.

I think we need both or even all three. The logical/physical block size
should be negotiated with the server to match the underlying hardware or
simply what the server supports. The client can then set a blocksize
that is a multiple of what the server said.
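One way a client could combine the two: take the minimum size the server reports and round its own preferred block size up to a multiple of it. Server-side negotiation is only being proposed here, so server_min and the function name below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: round the client's preferred block size up to a multiple
 * of the server-advertised minimum. Returns 0 if server_min is not a
 * power of two of at least 512. */
static uint32_t pick_client_blocksize(uint32_t server_min, uint32_t wanted)
{
    if (server_min < 512 || (server_min & (server_min - 1)) != 0)
        return 0;
    if (wanted < server_min)
        return server_min;
    return (wanted + server_min - 1) / server_min * server_min;
}
```

The result is not necessarily a power of two (wanted = 12288 with server_min = 4096 stays 12288), so a client would still need to check it against the set_blocksize() constraints before calling BLKBSZSET.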

