Re: Safe File Update (atomic)
On Thu, Dec 30, 2010 at 4:20 PM, Henrique de Moraes Holschuh
>> What if the target name is actually a symlink? To a different volume?
> Indeed. You have to check that first, of course :-( This is about safe
> handling of such functions, symlinks always have to be dereferenced and
> their target checked. After that, you operate on the target, if the symlink
> changes, your operations will not.
That's not really atomic.
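
For reference, a minimal sketch of the "dereference first, then operate on the target" step described above. The helper names (resolve_target, on_same_volume) are made up for illustration. Resolving the link up front tells you which directory the temp file has to live in so that a later rename stays on one volume, but it does not close the race: the link can still change between the check and the write, which is the objection here.

```python
import os

def resolve_target(path):
    """Follow symlinks once and return (real_path, directory_of_real_path).

    The temp file for a rename-based update should be created in the
    returned directory, not next to the symlink itself.
    """
    real = os.path.realpath(path)
    return real, os.path.dirname(real)

def on_same_volume(path_a, path_b):
    """True if both paths live on the same device, so rename() can work."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If on_same_volume(temp_dir, target_dir) is false, rename() fails with EXDEV and the caller has to pick another strategy.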
>> What if you're not allowed to create a file in that dir.
> You fail the write.
That's a regression from the non-atomic case.
> Or the user has to request the unsafe handling
> (truncate + write). Or you have to detect it will happen and switch modes
> if you're allowed to.
>> > If we could use some syscall to make  into a simple barrier request
>> > (guaranteed to degrade to fsync if barriers are not operating), it would
>> > be better performance-wise. This is what one should request of libc and
>> > the kernels with a non-zero chance of getting it implemented (in fact,
>> > it might even already exist).
>> My proposal was O_ATOMIC:
>> // begin transaction
>> open(fname, O_ATOMIC | O_TRUNC);
>> write; // 0+ times
>> Seems like the ideal API from the app's point of view.
> POSIX filesystems do not support it, so you'd need glibc to do everything
Not yet, but I assume it'll be added when there's enough demand.
> your application would have to get that atomicity. I.e. it should go in a
> separate lib, anyway, and you will have to code for it in the app :(
Why would it have to go in a separate lib?
> It is not transparent. It cannot be. What about mmap()? What about
> read+write patterns?
They either happen before or after this atomic transaction. Comparable
to the rename workaround.
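
For readers who have not seen it spelled out, the rename workaround looks roughly like this. It is a sketch under assumptions, not a reference implementation: atomic_write and the ".tmp-" prefix are names chosen here, symlinks and ownership are ignored, and the directory fsync assumes a Linux-like system where a directory can be opened read-only and synced.

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace the contents of path with data (bytes) via temp file + rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same volume) as the target.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data must be on disk before the rename
        os.replace(tmp, path)      # readers see the old file or the new one
    except BaseException:
        # Do not leave the temp file behind if anything went wrong.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself survives a crash.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Note that mkstemp() creates the file with mode 0600 and the caller's ownership, so the original file's metadata is lost unless it is copied over explicitly.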
> At most you could have an "open+write+close" function that encapsulate most
> of the crap, with a few options to tell it what to do if it finds a symlink
> or mismatched owner, what to do if it cannot do it in an atomic way, etc.
> I suppose one could actually ask for a non-posix interface to do all those
> three operations in one syscall, but I don't think the kernel people will
There's no need for a single syscall.
> want to implement it. It would make sense only if object stores become
> commonplace (where this thing is likely an object store primitive, anyway).
Nah. Tons of files are written in one go. All could use this atomic flag.
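
To make the "open+write+close function with a few options" idea concrete, here is a rough sketch of what such a userspace wrapper might look like. Everything in it is hypothetical: safe_update, the on_symlink and allow_unsafe options, and the returned strings are invented for this example and do not correspond to any existing library.

```python
import os
import tempfile

def _write_via_rename(path, data):
    # Temp file in the target's directory, fsync, then rename over the target.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise

def _truncate_and_write(path, data):
    # The unsafe mode: a crash mid-write leaves a truncated or partial file.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def safe_update(path, data, on_symlink="follow", allow_unsafe=False):
    """Write data (bytes) to path; return "atomic" or "truncate"."""
    if os.path.islink(path):
        if on_symlink == "refuse":
            raise ValueError("target is a symlink: %s" % path)
        path = os.path.realpath(path)   # operate on the link's target
    try:
        _write_via_rename(path, data)
        return "atomic"
    except PermissionError:
        # Typically: not allowed to create a file in that directory.
        if not allow_unsafe:
            raise
        _truncate_and_write(path, data)
        return "truncate"
```

With allow_unsafe=False the caller gets the "fail the write" behaviour; with allow_unsafe=True it degrades to truncate + write, which avoids the regression but gives up atomicity.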
>> >> I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
>> >> claim those exceptions aren't really a problem.
>> > Indeed they are not. Code has been dealing with them for years. You
>> Code has been wrong for years too, based on the recent reports about
>> file corruption with ext4.
> Code written to *deal with files safely* by people who wanted to get it
> right and actually checked what needs to be done, has been right for years.
> And has piss-poor performance.
Isn't fixing / improving that a good thing?
> Code written by random Joe who has no clue about the braindamages of POSIX
> and Unix, well... this thread shows how much crap is really needed.
So you agree that this should be improved?
> One can, obviously, have most filesystems be super-safe, and create a new
> fadvise or something to say "this is crap, be unsafe if you can".
> Performance will be poor, everything will be safe, and the extra fsyncs()
> will not hurt much because the fs would do it anyway.
I actually think this can be done with better performance than the
>> > name the temp file properly, and teach your program to clean old ones up
>> > *safely* (see vim swap file handling for an example) when it starts.
>> What about restoring meta-data? File-owner?
> Hmm, yes, more steps if you want to do something like that, as you must do
> it with the target open in exclusive mode. close target only after the
> rename went ok.
> But if the file owner is not yourself, you really should change it, not to
> mention you might not want to complete the operation in the first place.
Why? Of course write access to the file is required.
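
As an illustration of the metadata question, here is a sketch of copying the old file's owner and mode onto the temp file before the rename. The function name is invented for this example, and it deliberately skips the "keep the target open until the rename went ok" step mentioned above, as well as ACLs and extended attributes. Restoring the owner only works when the caller is privileged or already owns the file, so the sketch reports whether it succeeded rather than failing.

```python
import os
import stat
import tempfile

def atomic_write_keep_metadata(path, data):
    """Rename-based update that tries to keep owner and mode.

    Returns True if the owner could be restored, False otherwise.
    """
    st = os.stat(path)                      # metadata of the file being replaced
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            try:
                # chown first: changing the owner can clear setuid/setgid bits.
                os.fchown(f.fileno(), st.st_uid, st.st_gid)
                owner_kept = True
            except PermissionError:
                owner_kept = False          # unprivileged and not the owner
            os.fchmod(f.fileno(), stat.S_IMODE(st.st_mode))
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    return owner_kept
```

When this returns False, the file has silently changed owner, which is one reason a caller might prefer not to complete the operation at all.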
>> I'll ask glibc.
> This really should be in a separate lib. You want it to be usable outside
> of glibc systems, and you CAN implement it (slow that it will be) on
> anything POSIX. You need only some help of the kernel to speed it up, and
> that has to be detected at compile time (support) and runtime (availability
> of the feature) anyway.