
Re: Safe File Update (atomic)



On Thu, Dec 30, 2010 at 4:20 PM, Henrique de Moraes Holschuh
<hmh@debian.org> wrote:
>> What if the target name is actually a symlink? To a different volume?
>
> Indeed. You have to check that first, of course :-(  This is about safe
> handling of such functions, symlinks always have to be dereferenced and
> their target checked.  After that, you operate on the target, if the symlink
> changes, your operations will not.

That's not really atomic.
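The "dereference first, then operate on the target" step can be sketched with POSIX `realpath()`. The helper name `resolve_target()` and the `.tmp.XXXXXX` template are illustrative assumptions, not anything from the thread. Building the temp name in the resolved target's directory keeps the later `rename()` on the same volume. As Olaf notes, this is still not atomic: the link can be repointed after `realpath()` returns, and `realpath()` fails outright for a target that does not exist yet.

```c
#define _XOPEN_SOURCE 700
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: follow any symlinks in `path`, store the final
 * target in `out`, and build a mkstemp() template in the *target's*
 * directory so the later rename() stays on the same filesystem.
 * Returns 0 on success, -1 on failure (errno set by realpath()).
 * Not atomic: the symlink may change after this call returns. */
int resolve_target(const char *path, char out[PATH_MAX], char tmpl[PATH_MAX])
{
    if (realpath(path, out) == NULL)
        return -1;                  /* e.g. dangling symlink or new file */

    char dir[PATH_MAX];
    strcpy(dir, out);               /* dirname() may modify its argument */
    int n = snprintf(tmpl, PATH_MAX, "%s/.tmp.XXXXXX", dirname(dir));
    return (n < 0 || n >= PATH_MAX) ? -1 : 0;
}
```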

>> What if you're not allowed to create a file in that dir.
>
> You fail the write.

That's a regression from the non-atomic case.
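The options for an unwritable directory, failing the write or (only on request) falling back to truncate + write, fit in one small decision. A sketch, assuming the caller passes a `mkstemp()` template located in the target's directory; `update_file()`, `write_all()` and the `write_mode` values are hypothetical names. Note that `mkstemp()` creates the temp file with mode 0600, so permissions are not preserved here.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes: which strategy was actually used. */
enum write_mode { WROTE_ATOMIC, WROTE_UNSAFE, WRITE_FAILED };

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper. `tmpl` is a mkstemp() template in the target's dir. */
enum write_mode update_file(const char *target, char *tmpl,
                            const char *buf, size_t len, int allow_unsafe)
{
    int fd = mkstemp(tmpl);
    if (fd >= 0) {                          /* safe path: temp + rename */
        int ok = write_all(fd, buf, len) == 0 && fsync(fd) == 0;
        if (close(fd) != 0)
            ok = 0;
        if (ok && rename(tmpl, target) == 0)
            return WROTE_ATOMIC;
        unlink(tmpl);
        return WRITE_FAILED;
    }
    /* Only a permission problem justifies switching modes. */
    if ((errno != EACCES && errno != EPERM) || !allow_unsafe)
        return WRITE_FAILED;

    /* Unsafe path: a crash here can leave a truncated or partial file. */
    fd = open(target, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return WRITE_FAILED;
    int ok = write_all(fd, buf, len) == 0 && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    return ok ? WROTE_UNSAFE : WRITE_FAILED;
}
```

Any other error from `mkstemp()` (for example a missing directory) fails the write, since switching modes would not help.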

> Or the user has to request the unsafe handling
> (truncate + write).  Or you have to detect it will happen and switch modes
> if you're allowed to.
>
>> > If we could use some syscall to make [1] into a simple barrier request
>> > (guaranteed to degrade to fsync if barriers are not operating), it would
>> > be better performance-wise.  This is what one should request of libc and
>> > the kernels with a non-zero chance of getting it implemented (in fact,
>> > it might even already exist).
>>
>> My proposal was O_ATOMIC:
>> // begin transaction
>> open(fname, O_ATOMIC | O_TRUNC);
>> write; // 0+ times
>> close;
>>
>> Seems like the ideal API from the app's point of view.
>
> POSIX filesystems do not support it, so you'd need glibc to do everything

Not yet, but I assume it'll be added when there's enough demand.

> your application would have to get that atomicity.  I.e. it should go in a
> separate lib, anyway, and you will have to code for it in the app :(

Why would it have to go in a separate lib?

> It is not transparent.  It cannot be.  What about mmap()?  What about
> read+write patterns?

They either happen before or after this atomic transaction. Comparable
to the rename workaround.
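The proposed flag does not exist (`O_ATOMIC` is not an existing `open()` flag in POSIX or mainline Linux), but its begin / write / close shape can be emulated in user space with the rename workaround. A sketch; `struct atomic_file` and the `atomic_*` functions are hypothetical names. It does not fsync the containing directory after the rename, does not carry over the old file's mode or owner, and replaces a symlinked target with a regular file, all issues raised in this thread.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handle emulating open(fname, O_ATOMIC | O_TRUNC). */
struct atomic_file {
    int fd;
    char tmp[PATH_MAX];
    char target[PATH_MAX];
};

/* "Begin transaction": create a temp file next to the target. */
int atomic_open(struct atomic_file *af, const char *target)
{
    int a = snprintf(af->target, sizeof af->target, "%s", target);
    int b = snprintf(af->tmp, sizeof af->tmp, "%s.XXXXXX", target);
    if (a < 0 || b < 0 || (size_t)b >= sizeof af->tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    af->fd = mkstemp(af->tmp);
    return af->fd < 0 ? -1 : 0;
}

/* "write; // 0+ times": the target is untouched until commit. */
int atomic_write(struct atomic_file *af, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(af->fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* "close": flush the data, then swap the new file into place. */
int atomic_commit(struct atomic_file *af)
{
    int ok = fsync(af->fd) == 0;
    if (close(af->fd) != 0)
        ok = 0;
    if (ok && rename(af->tmp, af->target) == 0)
        return 0;
    unlink(af->tmp);
    return -1;
}

/* Roll back: the target keeps its old contents. */
void atomic_abort(struct atomic_file *af)
{
    close(af->fd);
    unlink(af->tmp);
}
```

A caller does `atomic_open()`, any number of `atomic_write()` calls, then `atomic_commit()`, or `atomic_abort()` to roll back. Concurrent readers see either the complete old file or the complete new one.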

> At most you could have an "open+write+close" function that encapsulate most
> of the crap, with a few options to tell it what to do if it finds a symlink
> or mismatched owner, what to do if it cannot do it in an atomic way, etc.
>
> I suppose one could actually ask for a non-posix interface to do all those
> three operations in one syscall, but I don't think the kernel people will

There's no need for a single syscall.

> want to implement it.  It would make sense only if object stores become
> commonplace (where this thing is likely an object store primitive, anyway).

Nah. Tons of files are written in one go. All could use this atomic flag.
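Henrique's "open+write+close" helper needs a pre-flight policy for the cases he lists: a symlinked target, a file owned by someone else, and no way to do it atomically. A sketch of only that decision step; the `AW_*` flags and `aw_plan_for()` are hypothetical. Like any check-then-act sequence it is advisory, since the path can change before the write happens.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical policy flags for an "open+write+close" helper. */
#define AW_FOLLOW_SYMLINK 0x1  /* else: refuse symlinked targets         */
#define AW_ALLOW_CHOWN    0x2  /* else: refuse files owned by others     */
#define AW_ALLOW_UNSAFE   0x4  /* else: refuse truncate + write fallback */

enum aw_plan { AW_ATOMIC, AW_UNSAFE, AW_REFUSE };

/* Decide how to update `path`. `dir` must be the directory that will
 * hold the temp file (the resolved target's directory when following
 * symlinks). Nothing is modified here. */
enum aw_plan aw_plan_for(const char *path, const char *dir, int flags)
{
    struct stat st;
    int exists = 0;

    if (lstat(path, &st) == 0) {
        exists = 1;
        if (S_ISLNK(st.st_mode)) {
            if (!(flags & AW_FOLLOW_SYMLINK) || stat(path, &st) != 0)
                return AW_REFUSE;               /* refused or dangling */
        }
        if (!S_ISREG(st.st_mode))
            return AW_REFUSE;
        /* After rename() the new file belongs to us, not the old owner. */
        if (st.st_uid != geteuid() && !(flags & AW_ALLOW_CHOWN))
            return AW_REFUSE;
    } else if (errno != ENOENT) {
        return AW_REFUSE;
    }

    /* Temp file + rename() needs write and search permission on `dir`. */
    if (faccessat(AT_FDCWD, dir, W_OK | X_OK, AT_EACCESS) == 0)
        return AW_ATOMIC;
    /* Truncate + write is only possible on an existing file. */
    return (exists && (flags & AW_ALLOW_UNSAFE)) ? AW_UNSAFE : AW_REFUSE;
}
```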

>> >> I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
>> >> claim those exceptions aren't really a problem.
>> >
>> > Indeed they are not.  Code has been dealing with them for years.  You
>>
>> Code has been wrong for years too, based on the recent reports about
>> file corruption with ext4.
>
> Code written to *deal with files safely* by people who wanted to get it
> right and actually checked what needs to be done, has been right for years.
> And has piss-poor performance.

Isn't fixing / improving that a good thing?

> Code written by random joe which has no clue about the braindamages of POSIX
> and Unix, well... this thread shows how much crap is really needed.

So you agree that this should be improved?

> One can, obviously, have most filesystems be super-safe, and create a new
> fadvise or something to say "this is crap, be unsafe if you can".
> Performance will be poor, everything will be safe, and the extra fsyncs()
> will not hurt much because the fs would do it anyway.

I actually think this can be done with better performance than the
rename workaround.

>> > name the temp file properly, and teach your program to clean old ones up
>> > *safely* (see vim swap file handling for an example) when it starts.
>>
>> What about restoring meta-data? File-owner?
>
> Hmm, yes, more steps if you want to do something like that, as you must do
> it with the target open in exclusive mode.  Close the target only after the
> rename went OK.
>
> But if the file owner is not yourself, you really should change it, not to
> mention you might not want to complete the operation in the first place.

Why? Of course write access to the file is required.
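One way to "name the temp file properly" so that leftovers can be cleaned up at startup is to embed the creating process's PID in the name and, when the program starts, delete only files whose PID no longer names a live process. This is loosely modeled on the idea behind vim's swap-file checks, not on vim's actual code; the `.<base>.tmp.<pid>.XXXXXX` scheme, `make_temp_name()` and `cleanup_stale()` are assumptions. PIDs are reused, so a live PID only means "possibly in use", which is why such files are left alone.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical naming scheme: "<dir>/.<base>.tmp.<pid>.XXXXXX",
 * usable directly as a mkstemp() template. */
int make_temp_name(char *out, size_t outsz, const char *dir, const char *base)
{
    int n = snprintf(out, outsz, "%s/.%s.tmp.%ld.XXXXXX",
                     dir, base, (long)getpid());
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Hypothetical startup cleanup: delete leftovers for `base` whose
 * creating process is gone. Returns files removed, or -1 on error. */
int cleanup_stale(const char *dir, const char *base)
{
    char prefix[NAME_MAX + 1];
    int plen = snprintf(prefix, sizeof prefix, ".%s.tmp.", base);
    if (plen < 0 || (size_t)plen >= sizeof prefix)
        return -1;

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int removed = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, prefix, (size_t)plen) != 0)
            continue;
        char *end;
        long pid = strtol(e->d_name + plen, &end, 10);
        if (end == e->d_name + plen || *end != '.' || pid <= 0)
            continue;                       /* not our naming scheme */
        /* Alive, or not ours to signal (EPERM): leave it alone. */
        if (kill((pid_t)pid, 0) == 0 || errno != ESRCH)
            continue;
        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n > 0 && (size_t)n < sizeof path && unlink(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```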

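The metadata steps Henrique lists can be sketched as: open the target, lock it, copy its mode, owner and group onto the temp file, and close the target only after `rename()` has succeeded. `replace_keep_meta()` is a hypothetical name. POSIX `fcntl()` locks are advisory, so this only keeps out cooperating processes rather than giving a truly exclusive open, and ACLs and extended attributes are not copied. Giving a file to another user normally needs privileges, so for a file owned by someone else `fchown()` fails and the update is abandoned instead of silently changing the owner.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: replace `target` with `buf`, carrying over mode,
 * owner and group. `tmpl` is a mkstemp() template in the target's dir. */
int replace_keep_meta(const char *target, char *tmpl,
                      const char *buf, size_t len)
{
    int tfd = open(target, O_RDWR);         /* write access is required */
    if (tfd < 0)
        return -1;
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET }; /* whole file */
    struct stat st;
    if (fcntl(tfd, F_SETLKW, &lk) != 0 || fstat(tfd, &st) != 0) {
        close(tfd);
        return -1;
    }

    int fd = mkstemp(tmpl);
    if (fd < 0) {
        close(tfd);
        return -1;
    }
    /* chown before chmod: chown may clear setuid/setgid bits. Fails
     * (EPERM) when we may not hand the file to its current owner. */
    int ok = fchown(fd, st.st_uid, st.st_gid) == 0 &&
             fchmod(fd, st.st_mode & 07777) == 0;
    ssize_t n = ok ? write(fd, buf, len) : -1;   /* short write = failure */
    ok = ok && n == (ssize_t)len && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    if (ok && rename(tmpl, target) != 0)
        ok = 0;
    if (!ok)
        unlink(tmpl);
    close(tfd);                     /* releases the lock, after the rename */
    return ok ? 0 : -1;
}
```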
>> I'll ask glibc.
>
> This really should be in a separate lib.  You want it to be usable outside
> of glibc systems, and you CAN implement it (slow as it will be) on
> anything POSIX.  You need only some help of the kernel to speed it up, and
> that has to be detected at compile time (support) and runtime (availability
> of the feature) anyway.
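The "detect at compile time (support) and runtime (availability)" pattern has a standard POSIX form for optional features: an `_POSIX_*` option macro tested by the preprocessor, plus `sysconf()` when the macro's value is 0, meaning "ask at run time". A sketch using `fdatasync()` as the faster path and `fsync()` as the portable fallback; it only illustrates the detection pattern and is not the barrier-style kernel help discussed in the thread. `flush_data()` and `write_and_flush()` are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prefer fdatasync() (may skip metadata-only
 * flushes) when available. _POSIX_SYNCHRONIZED_IO: > 0 always supported,
 * 0 ask sysconf() at run time, -1 or undefined not supported. */
int flush_data(int fd)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);                    /* compile time: supported */
#elif defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO == 0
    static int supported = -1;               /* cached run-time answer */
    if (supported < 0)
        supported = sysconf(_SC_SYNCHRONIZED_IO) > 0;
    return supported ? fdatasync(fd) : fsync(fd);
#else
    return fsync(fd);                        /* compile time: unsupported */
#endif
}

/* Hypothetical helper: create/truncate `path`, write `buf`, flush it.
 * For brevity a short write is treated as failure. */
int write_and_flush(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, buf, len) == (ssize_t)len && flush_data(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```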

Olaf

