On 09/23/2016 01:32 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> There is a following problem. When we need to write_zeroes or trim the
> whole disk, we have to do it iteratively, because of 32-bit restriction
> on request length.
> For example, current implementation of mirror (see mirror_dirty_init())
> do this by chunks of 2147418112 bytes (with default granularity of
> 65536). So, to zero 16tb disk we will make 8192 requests instead of one.
>
> Incremental zeroing of 1tb qcow2 takes > 80 seconds for me (see below).
> This means ~20 minutes for copying empty 16tb qcow2 disk which is
> obviously a waste of time.
>
> We see the following solutions for nbd:
>
> 1. Add command NBD_MAKE_EMPTY, with flag, saying what should be done:
> trim or write_zeroes.

Presumably spelled NBD_CMD_MAKE_EMPTY.

> 2. Add flag NBD_CMD_FLAG_WHOLE for commands NBD_TRIM and
> NBD_WRITE_ZEROES, which will say (with zeroed offset and length of the
> request), that the whole disk should be discarded/zeroed.

Both of these are possible. As it is, NBD_CMD_WRITE_ZEROES is not even
formally part of the NBD spec yet, although NBD_CMD_TRIM is (I'm still
sitting on my qemu proof-of-concept patches for WRITE_ZEROES, and need
to resubmit them now that the qemu 2.8 development window is open).
Either way, the server would have to advertise if the new command and/or
new flags to existing commands are available for a whole-disk trim/zero,
before a client could use it, and clients must be prepared to fall back
to incremental approaches otherwise.

My preference would be a new flag to the existing commands, with
explicit documentation that 0 offset and 0 length must be used with that
flag, when requesting a full-device wipe.

> 3. Increase length field of the request to 64bit.

No; that won't work. It would be a fundamental change to the NBD
protocol, and require both new servers and new clients to talk a
different wire protocol with different size length parameters.
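
To make the fallback requirement concrete, here is a minimal sketch of
how a client might choose between a whole-disk request and the
incremental approach. It is neither qemu code nor part of the NBD spec:
the function name, the `server_supports_whole` argument, and the bit
value chosen for the proposed NBD_CMD_FLAG_WHOLE are all assumptions
made for illustration. The only figure taken from the thread is the
chunk size of 2147418112 bytes, which equals 2^31 - 65536.

```python
# Hypothetical sketch: names and the flag value are illustrative only,
# nothing here is defined by the NBD spec or taken from qemu.
MAX_CHUNK = 2147418112   # chunk size quoted in the thread (2**31 - 65536)
FLAG_WHOLE = 1 << 15     # made-up bit for the proposed NBD_CMD_FLAG_WHOLE


def plan_zero_requests(disk_size, server_supports_whole, max_chunk=MAX_CHUNK):
    """Return the (offset, length, flags) requests needed to zero a disk."""
    if disk_size < 0 or max_chunk <= 0:
        raise ValueError("disk_size must be >= 0 and max_chunk must be > 0")

    if server_supports_whole:
        # Proposed fast path: a single request with offset 0 and length 0,
        # allowed only if the server advertised support for the flag.
        return [(0, 0, FLAG_WHOLE)]

    # Fallback: walk the disk in chunks that fit the 32-bit length field.
    requests = []
    offset = 0
    while offset < disk_size:
        length = min(max_chunk, disk_size - offset)
        requests.append((offset, length, 0))
        offset += length
    return requests
```

For a 16 TiB disk (2^44 bytes), `plan_zero_requests(2**44, False)`
returns 8193 requests by exact count, which matches the roughly 8192
requests mentioned above, while `plan_zero_requests(2**44, True)`
returns a single request.
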

> As soon as we have some way to empty disk in nbd, we can use
> qcow2_make_empty, to trim the whole disk (and something similar should
> be done for zeroing).
>
> What do you think about this all, and which way has a chance to get into
> nbd proto?

It's not necessarily obvious that the ability to bulk-trim or bulk-zero
a device should be fundamentally faster than doing it incrementally in
2G chunks; but I concede that there may indeed be scenarios such as
qemu's qcow2 file where that is true. So it does sound like a useful
option and/or command to be proposed for addition to the NBD protocol,
from that point of view.

As with other extensions to NBD, the best way is to write up a proposal
for how the documentation should change, submit that as patches to the
nbd list, and accompany it with a proof-of-concept implementation
(qemu's nbd server and nbd client work well), so that we can iron out
the details of the documentation before making it a formal part of the
spec. It's important to remember that such a proposal should still be
optional (a server need not implement the new mode, and a client should
be prepared to fall back to other means if the server does not support a
whole-device action).

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org