Re: Safe File Update (atomic)
- To: Henrique de Moraes Holschuh <firstname.lastname@example.org>
- Cc: Olaf van der Spek <email@example.com>, debian devel <firstname.lastname@example.org>
- Subject: Re: Safe File Update (atomic)
- From: Ted Ts'o <email@example.com>
- Date: Sun, 2 Jan 2011 02:09:22 -0500
- Message-id: <20110102070922.GA6271@thunk.org>
- In-reply-to: <20101231115150.GB31280@khazad-dum.debian.net>
- References: <AANLkTimz6ui+L76H=F1Frtefb=-daGhoeACVnjsP73rU@mail.gmail.com> <20101230114655.GA19470@khazad-dum.debian.net> <20101231021723.GA9896@khazad-dum.debian.net> <AANLkTinnYXtF2CzhkFRMKw_gpP39h5uqU2j8oz1cSLYu@mail.gmail.com> <20101231115150.GB31280@khazad-dum.debian.net>
On Fri, Dec 31, 2010 at 09:51:50AM -0200, Henrique de Moraes Holschuh wrote:
> On Fri, 31 Dec 2010, Olaf van der Spek wrote:
> > Ah, hehe. BTW, care to respond to the mail I send to you?
> There is nothing more I can add to this thread. You want O_ATOMIC. It
> cannot be implemented for all use cases of the POSIX API, so it will not
> be implemented by the kernel. That's all there is to it, AFAIK.
> You could ask for a new (non-POSIX?) API that does not ask of a
> POSIX-like filesystem something it cannot provide (i.e. don't ask for
> something that requires inode->path reverse mappings). You could ask
> for syscalls to copy inodes, etc. You could ask for whatever is needed
> to do a (open+write+close) that is atomic if the target already exists.
> Maybe one of those has a better chance than O_ATOMIC.
The O_ATOMIC open flag is highly problematic, and it's not fully
specified. What if the system is under a huge amount of memory
pressure, and the badly behaved application program does:
fd = open("file", O_WRONLY | O_ATOMIC | O_TRUNC); // O_ATOMIC: the hypothetical flag
write(fd, buf, 2UL*1024*1024*1024); // write 2 gigs, heh, heh heh
sleep(24*60*60); // sleep for one day
write(fd, buf2, 1024);
What happens if another program opens "file" for reading during the
one day sleep period? Does it get the old contents of "file"?
The partially written, incomplete new version of "file"? What happens
if the file is currently mmap'ed, as Henrique has asked?
What if another program opens the file O_ATOMIC during the one day
sleep period, so the file is in the middle of getting updated by two
different processes using O_ATOMIC?
How exactly do the semantics for O_ATOMIC work?
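For comparison, here is a minimal sketch of how the first question plays out with the existing write-to-temp-then-rename() approach, where the answer is well defined: a reader that already has the file open keeps seeing the old contents, and only a fresh open() sees the new ones. The helper names put() and reader_sees_old_after_rename() are made up for illustration, and fsync() is left out because this sketch only looks at visibility, not durability.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create or truncate `path` and write the string `s`. */
static int put(const char *path, const char *s)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(s);
    ssize_t w = write(fd, s, len);
    if (close(fd) < 0 || w != (ssize_t)len)
        return -1;
    return 0;
}

/* Returns 1 if a reader that opened `path` before the rename still sees
 * the old contents while a fresh open sees the new ones, 0 if not,
 * and -1 on error. `tmp` must be on the same file system as `path`. */
int reader_sees_old_after_rename(const char *path, const char *tmp)
{
    char before[16] = {0}, after[16] = {0};

    if (put(path, "old") < 0)
        return -1;

    /* This reader is in the middle of using the file when the update lands. */
    int reader = open(path, O_RDONLY);
    if (reader < 0)
        return -1;

    /* The update: write the new version elsewhere, then swap it in. */
    if (put(tmp, "new") < 0 || rename(tmp, path) < 0) {
        close(reader);
        return -1;
    }

    int fresh = open(path, O_RDONLY);
    if (fresh < 0) {
        close(reader);
        return -1;
    }

    ssize_t r1 = read(reader, before, sizeof(before) - 1);
    ssize_t r2 = read(fresh, after, sizeof(after) - 1);
    close(reader);
    close(fresh);
    if (r1 < 0 || r2 < 0)
        return -1;

    return strcmp(before, "old") == 0 && strcmp(after, "new") == 0;
}
```

The old inode stays alive until the last reader closes it, which is why no reader ever observes a half-written file with this scheme.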
And given that at the moment ***zero*** file systems implement O_ATOMIC,
what should an application do as a fallback? And given that it is
highly unlikely this could ever be implemented for various file
systems including NFS, I'll observe this won't really reduce
application complexity, since you'll always need to have a fallback
for file systems and kernels that don't support O_ATOMIC.
And what are the use cases where this really makes sense? Will people
really code to this interface, knowing that it only works on Linux
(there are other operating systems out there, like FreeBSD and
Solaris and AIX, you know, and some application programmers _do_ care
about portability), and the only benefits are (a) a marginal
performance boost for insane people who like to write vast numbers of
2-4 byte files without any need for atomic updates across a large
number of these small files, and (b) the ability to keep the file
owner unchanged when someone other than the owner updates said file
(how important is this _really_; what is the use case where this
really matters)?
And of course, Olaf isn't actually offering to implement this
hypothetical O_ATOMIC. Oh, no! He's just petulantly demanding it,
even though he can't give us any concrete use cases where this would
actually be a huge win over a userspace "safe-write" library that
properly uses fsync() and rename().
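As a rough illustration of what such a userspace "safe-write" helper could look like, here is a minimal sketch. The function name safe_write() and the ".XXXXXX" temp-file naming are assumptions made for this example, not an existing library API. It writes the new contents to a temporary file in the same directory, calls fsync() on it, and then rename()s it over the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `buf`.
 * Returns 0 on success, -1 on failure with errno set. */
int safe_write(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int saved;

    /* The temp file lives next to the target so rename() stays on one
     * file system. */
    int n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = mkstemp(tmp);  /* created with mode 0600, owned by the caller */
    if (fd < 0)
        return -1;

    /* Write everything, retrying on short writes and EINTR. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Get the data onto stable storage before the new name points at it. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* Atomically swap the new file in for the old one. */
    if (rename(tmp, path) < 0)
        goto fail;
    return 0;

fail:
    saved = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}
```

Note what this sketch does not do: it does not copy the old file's owner or mode to the replacement, and it does not fsync() the containing directory afterwards. A fuller version would have to decide how to handle both.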
If someone were to pay me a huge amount of money, and told me what was
the file size range where such a thing would be used, and what sort of
application would need it, and what kind of update frequency it should
be optimized for, and other semantic details about parallel O_ATOMIC
updates, what happens to users who are in the middle of reading the
file, what are the implications for quota, etc., it's certainly
something I could entertain. But at the moment, it's a vague
specification (not even a solution) looking for a problem.