Re: [Nbd] [PATCH/RFC] Add support for NBD_CMD_WRITE_ZEROES
- To: Alex Bligh <alex@...872...>
- Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>
- Subject: Re: [Nbd] [PATCH/RFC] Add support for NBD_CMD_WRITE_ZEROES
- From: Wouter Verhelst <w@...112...>
- Date: Sat, 16 Apr 2016 20:01:21 +0200
- Message-id: <20160416180121.GD3033@...3...>
- In-reply-to: <736ADB91-9F8F-4EF1-B49F-AF2FE688DEAD@...872...>
- References: <1460721838-44985-1-git-send-email-alex@...872...> <5710F65A.1080801@...696...> <AD69F7C6-1B65-48DA-942E-E858D3E4348D@...872...> <20160416173035.GC3033@...3...> <736ADB91-9F8F-4EF1-B49F-AF2FE688DEAD@...872...>
On Sat, Apr 16, 2016 at 06:39:16PM +0100, Alex Bligh wrote:
> Wouter,
>
> On 16 Apr 2016, at 18:30, Wouter Verhelst <w@...112...> wrote:
>
> > On Fri, Apr 15, 2016 at 04:50:45PM +0100, Alex Bligh wrote:
> >>
> >> On 15 Apr 2016, at 15:10, Eric Blake <eblake@...696...> wrote:
> >>
> >>> You completely ignore NBD_CMD_FLAG_NO_HOLE; you could obey that flag
> >>> (when set, do the calloc() here; when cleared, call into the trim code
> >>> if you can guarantee that you'll read zero after trims).
> >>
> >> Sadly you cannot guarantee that you will read zeroes after
> >> exptrim is called. Exceptions include:
> >>
> >> 1. F_COPYONWRITE being set (trim is ignored)
> >
> > I've been wanting to modify copyonwrite so that when you TRIM, it clears
> > out the diff file, but not the original file.
> >
> > That would still mean it doesn't zero out stuff, obviously.
>
> Yeah I looked at that. I'm not that familiar with the CoW (or
> treefiles) stuff so didn't fancy coding it.
>
> >> 2. The initial and final blocks if F_TREEFILES is set if either of
> >> the start and end of the read is not aligned to TREEPAGESIZE
> >
> > We could detect if that's the case, and split the request up into three:
> > head and tail with writes, middle part with treefile handling.
> >
> > Due to the nature of how the treefiles stuff works, it's 100% guaranteed
> > that that would work.
>
> Indeed. And I suspect CoW requires similar.
Well, no, it doesn't. And this is why I'm not in favour of doing the
"first write zeroes, then trim" approach.
CoW:
block | backing file | diff file
------|--------------|---------------
1 | X |
2 | X | X
3 | X |
4 | X |
5 | X | X
6 | X |
7 | X | X
where X denotes a block that has been allocated. The backing file is
fully allocated, the diff file is not. When I get around to it, the plan
is for TRIM on a CoW file to result in it dropping the allocation for
the diff file -- but not for the backing file, because you don't want to
touch that file.
In the above example, if you were to TRIM on blocks 5-7, then from the
client's PoV the operation would change the contents of blocks 5 and 7
to whatever the backing file has for those blocks, but would not touch
block 6.
Now, let's say you have a WRITE_ZEROES request for blocks five, six, and
seven, and you implement it by first writing zeroes and then trimming
them. This would first set them all to zero (allocating block six) and
then drop all three blocks in the diff file, thereby setting them to
nonzero again.
A proper implementation would know to treat CoW specially so it doesn't
do trim but only writes zeroes. Just dumbly doing write-then-trim
doesn't do that, however, and makes it hard to implement TRIM for CoW in
a sane way.
Yes, you could special-case CoW in expwrite_zeroes in the above way, but
I'd really like to start abstracting such things away rather than adding
more special-casing. So for now, just make it write zeroes and not do
any hole-punching? We can clean it up and do it properly later.
> And it would also be better to pass through the fact it is a 'zero'
> write to the underlying functions (I was going to use buf=null to mean
> 'write zeroes' which would quickly flag up any error).
Yes, that would work too.
[...]
> >> 4. Probably something on Windows (!)
> >
> > Gotta admit, I don't know what the semantics of the Windows punch_hole
> > equivalent are.
>
> From a very quick look online, some filing systems work
> and some don't.
Same as for Linux, then.
[...]
> >>> But if nothing
> >>> else, you have a bug for not permitting the NBD_CMD_FLAG_NO_HOLE (even
> >>> if you intend to ignore it by ALWAYS operating in no-hole mode, as was
> >>> done here).
> >>
> >> Or I could do the physical write of zeroes and then call trim afterwards
> >> if NBD_CMD_FLAG_NO_HOLE is *not* set and trim is supported.
> >>
> >> As fallocate() and friends are cheap, I think I will do that.
> >
> > That gets awkward if you remember all the possible backing store modes.
>
> Yeah, it will be slow on some backing store modes.
Not what I meant (see above).
[...]
> > I really should get started on that abstraction layer I was talking
> > about earlier.
> >
> > I've been thinking we should also have the TRIM and WRITE_ZEROES
> > handling be combined into one set of functions. You'd have a "expclear"
> > function or some such which has two booleans:
> >
> > - must_zero: when WRITE_ZEROES was called, set it to true. Operation
> > must then set all affected bytes to zero.
> > - may_hole: when either TRIM was called or WRITE_ZEROES doesn't have the
> > NO_HOLE flag set. Operation may deallocate the range if possible.
>
> That's interesting. I was thinking about combining it with 'write'
> with buf=0, and possibly keeping a page (mmap with MAP_ANON if
> mmap is available) of zeroes around in virtual memory, marked as
> read only, they could use.
That could work too, I suppose.
[...]
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12
Reply to: