
Re: NBD prefetch read



22.03.2018 21:24, Wouter Verhelst wrote:
On Wed, Mar 21, 2018 at 02:05:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
21.03.2018 13:20, Wouter Verhelst wrote:
On Tue, Mar 20, 2018 at 08:22:31PM +0300, Vladimir Sementsov-Ogievskiy wrote:
20.03.2018 19:58, Wouter Verhelst wrote:
On Tue, Mar 20, 2018 at 11:57:46AM +0300, Vladimir Sementsov-Ogievskiy wrote:
19.03.2018 17:39, Eric Blake wrote:
Can you demonstrate an actual sequence of commands sent over the wire,
for how it would be useful?
- we initialize two drives, A and B, in qemu and set up copy-on-read for them.
- the client sends a sequence of READAHEAD commands, and data is copied from A
to B on each read from B, in the corresponding sequence.

So, here B is a cache in terms of the PRE-FETCH command.
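To make the wire sequence Eric asked about concrete: it would just be a stream
of ordinary transmission requests against B's export. A minimal sketch in C,
assuming the proposed NBD_CMD_READAHEAD keeps the standard request header (the
command value below is a placeholder, nothing has been assigned):

    #include <stdint.h>

    #define NBD_REQUEST_MAGIC  0x25609513
    #define NBD_CMD_READAHEAD  0xffff     /* placeholder; no value assigned */

    /* Standard 28-byte NBD transmission request; all fields travel in
     * network byte order, hence the packed layout. */
    struct nbd_request {
        uint32_t magic;    /* NBD_REQUEST_MAGIC */
        uint16_t flags;    /* command flags; none needed here */
        uint16_t type;     /* NBD_CMD_READAHEAD */
        uint64_t handle;   /* client cookie, echoed back in the reply */
        uint64_t offset;   /* byte offset into B's export */
        uint32_t length;   /* bytes to pull from A into B */
    } __attribute__((packed));

    /* The third tool would send these back to back, in whatever order it
     * wants the data copied; each one makes B read the range from A and
     * store it, then answer with a simple reply carrying no payload. */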
This sounds very similar to what xNBD does
(https://bitbucket.org/hirofuchi/xnbd/wiki/Home). Can you confirm?

If so, I suppose it makes sense to add the current behaviour of xNBD to
the spec, rather than inventing our own thing.

Hm, what do you mean on this page? "Scenario 2 (Simple proxy server,
distributed Copy-on-Write)" ?
Well, I don't necessarily mean the implementation details, so much as
the general concept :-)

You are talking about live migration of storage; that is what xNBD
implements and has in production. Does it not make sense to at least
look at what they're doing, so that we can possibly implement something
compatible?

It looks similar, but there is no control channel with READAHEAD there. The
idea is that no data is sent through the control channel.
Eh, NBD has no control channel? I'm not sure what you're talking about
here.

We are doing restore, not migration. We start qemu over an empty qcow2 image
on top of an nbd-client with copy-on-read enabled, so, when the guest reads
something, the data goes into its qcow2 (the nbd-client is connected to the
backup NBD server). That part is fine.
But we also need a way to force data movement from the backup to the new qcow2
image even if the guest doesn't read the data, so we want to simulate reads on
that qcow2 just to trigger copy-on-read. Moreover, we need a specific
sequence of these simulated reads, and only a third-party tool knows this
sequence. So we want to export the qcow2 image over NBD as well; that export
is the "control channel", and only READAHEAD commands will be sent through it
by the third-party tool. As I said, it's a managed copy-on-read process,
managed by that tool.
Okay, so do I understand you correctly if you're saying it's something
like this:

client               server               third party
    | -----NBD CS------- |                      |
    |                    | ------ NBD TS ------ |
    | ---------------- NBD TC ----------------- |

The client reads "stuff" over the NBD CS channel (where it is a client,
and the server is, well, a server). It copies blocks to its local cache
when necessary.

The third party connects to both the client and the server as a client
(i.e., the client in the above acts as a client as well as a server at
the same time), on the NBD TS and NBD TC channels. It uses BLOCK_STATUS
commands (?) to figure out what the status of the restore is. When,
after comparing the output of a BLOCK_STATUS command on the NBD TS
channel with one on the NBD TC channel, it finds a range that is still
missing, it would send the proposed new command over the NBD TC
channel, causing the client to then read that range from the server.

Is that understanding correct?
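In code terms, that reading amounts to something like the following loop in
the third party. A sketch only: struct nbd_conn and all three helpers are
invented stand-ins for real NBD client calls, not anything that exists:

    #include <stdbool.h>
    #include <stdint.h>

    struct nbd_conn;    /* opaque connection handle (hypothetical) */
    struct extent { uint64_t offset, length; bool allocated; };

    /* Invented helpers standing in for real NBD client operations. */
    bool next_extent(struct nbd_conn *c, struct extent *e);  /* BLOCK_STATUS */
    bool extent_present(struct nbd_conn *c, uint64_t off, uint64_t len);
    void send_readahead(struct nbd_conn *c, uint64_t off, uint64_t len);

    void drive_restore(struct nbd_conn *ts, struct nbd_conn *tc)
    {
        struct extent e;
        while (next_extent(ts, &e)) {       /* walk the server's map (TS) */
            if (!e.allocated)
                continue;                   /* hole: nothing to restore */
            if (extent_present(tc, e.offset, e.length))
                continue;                   /* already in the local cache (TC) */
            send_readahead(tc, e.offset, e.length);  /* proposed command */
        }
    }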

The same, but our scheme is even simpler: it doesn't use BLOCK_STATUS for now. It may also be shown like this:

+---------- VM --------------+
|                            |
| guest                      |
|  |                         |
| local disk -- (NBD server)<----READAHEAD---+(third party NBD client)
|  |                         |
| (NBD client)<-----------------(data)-------+(NBD server) -- [VM backup]
|                            |
+----------------------------+

- when the guest reads a block and that block is not yet in the local disk, it is read through the NBD client and saved in the local disk
- when the guest writes something, it is written into the local disk
- on READAHEAD, if the block is not yet in the local disk, it is read through the NBD client and saved in the local disk
- on READAHEAD, if the corresponding block is already in the local disk, do nothing
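The four rules boil down to one dispatch in the local (NBD server) side of the
VM. A rough sketch; the request layout, every helper, and the READAHEAD value
are all hypothetical names for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    struct req { uint16_t type; uint64_t offset; uint32_t length; void *data; };
    #define NBD_CMD_READ      0
    #define NBD_CMD_WRITE     1
    #define NBD_CMD_READAHEAD 0xffff   /* placeholder; no value assigned */

    /* Hypothetical helpers for illustration only. */
    bool local_has(uint64_t off, uint32_t len);    /* block in local disk? */
    void local_read(void *buf, uint64_t off, uint32_t len);
    void local_write(const void *buf, uint64_t off, uint32_t len);
    void backup_read(void *buf, uint64_t off, uint32_t len); /* inner NBD client */
    void reply_data(struct req *r, const void *buf);
    void reply_ok(struct req *r);

    void handle(struct req *r, void *buf)
    {
        switch (r->type) {
        case NBD_CMD_READ:                   /* rule 1 */
            if (!local_has(r->offset, r->length)) {
                backup_read(buf, r->offset, r->length);
                local_write(buf, r->offset, r->length);
            }
            local_read(buf, r->offset, r->length);
            reply_data(r, buf);
            break;
        case NBD_CMD_WRITE:                  /* rule 2: backup never touched */
            local_write(r->data, r->offset, r->length);
            reply_ok(r);
            break;
        case NBD_CMD_READAHEAD:              /* rules 3 and 4 */
            if (!local_has(r->offset, r->length)) {
                backup_read(buf, r->offset, r->length);
                local_write(buf, r->offset, r->length);
            }
            reply_ok(r);                     /* never returns data */
            break;
        }
    }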


If so, then I think the semantics of that proposed new command are,
still, very similar to the NBD_CMD_BGCOPY that xNBD implemented, and we
should first look at that before inventing something new; that's not to
say that BGCOPY is exactly what we need, but it might be.

Hmm, from the xnbd Changelog:
Protocol changes
~~~~~~~~~~~~~~~~
 * NBD_CMD_BGCOPY (value 3) has been turned into NBD_CMD_CACHE (value 5)
   to get back in sync with the original NBD server and NBD in kernel


Is there any specification for NBD_CMD_CACHE, or only the code? At first sight it looks very close to what we need.
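For comparison, issuing NBD_CMD_CACHE (value 5, per the changelog above) would
presumably be just a READ-shaped request with no payload. A sketch under that
assumption; whether the reply carries data is exactly what the code would have
to confirm:

    #include <endian.h>     /* htobe16/32/64 (glibc) */
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define NBD_REQUEST_MAGIC 0x25609513
    #define NBD_CMD_CACHE     5

    /* Send one NBD_CMD_CACHE request: the standard 28-byte transmission
     * request header, big-endian fields, no payload after it. */
    static void send_cache(int sock, uint64_t cookie,
                           uint64_t offset, uint32_t length)
    {
        unsigned char req[28];
        uint32_t magic = htobe32(NBD_REQUEST_MAGIC);
        uint16_t flags = 0, type = htobe16(NBD_CMD_CACHE);
        uint64_t be_cookie = htobe64(cookie), be_off = htobe64(offset);
        uint32_t be_len = htobe32(length);

        memcpy(req +  0, &magic, 4);
        memcpy(req +  4, &flags, 2);
        memcpy(req +  6, &type, 2);
        memcpy(req +  8, &be_cookie, 8);
        memcpy(req + 16, &be_off, 8);
        memcpy(req + 24, &be_len, 4);
        write(sock, req, sizeof(req));  /* then read the 16-byte simple reply */
    }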



If not, then please enlighten me, because I'm afraid that in that case
I'm lost :-)



--
Best regards,
Vladimir

