Re: The fsync issue

Guillem Jover <guillem@debian.org> writes:

> On Sat, 2010-11-27 at 01:41:19 -0600, Jonathan Nieder wrote:
>> Guillem Jover wrote:
>> > Unfortunately that patch does not seem much appealing, it's Linux only,
>> > not even in mainline, and it would need for dpkg to track on which file
>> > system each file is located and issue such ioctl once per file system.

What if you issue one ioctl per file? Won't the duplicates just return
immediately, provided nothing else is writing fresh data to the FS?

>> > I'd rather not complicate the dpkg source code even more for something
>> > that seems to me to be a bug or misfeature in the file system. More so
>> > when there's a clear fix (nodelalloc) that solves both the performance
>> > and data safety issues in general.
>> I don't really understand this point of view: isn't the fsync storm
>> going to cause seeky I/O on just about all file systems?
> Well sure it might, but then some seem to be able to cope just fine, even
> ext4 with nodelalloc. Also seeks might stop being that relevant (in the
> mid/long term) once SSD becomes more widespread.
>> So the POSIX primitives are not rich enough to express what we want to
>> happen.  Delayed allocation is pretty much essential for the use case
>> ubifs targets, so it doesn't make much sense to me to pretend it
>> doesn't exist.
> As long as delayed allocation is a synonym for zero-length files, then
> I personally consider it a misfeature. This is data loss we are talking
> about, and while data coming from packages is easily recoverable
> although cumbersome, user data might not. We got fsck, journals and
> similar to recover from system crashes, and now we get zero-length
> files in the name of performance, it seems clear to me that's a
> regression.

What if you use data journaling? Shouldn't that replay the data after a
crash and thus not suffer from 0-byte files? Or does delalloc prevent
the data from being written to the journal until a block has been
allocated for it?

> Anyway my thinking process goes a bit like this: There's currently a
> handful of programs doing the complete write+fsync+rename dance, with
> the file systems which need it penalize heavily. If more programs start
> to get "fixed" to do the fsyncs then the situation overall will just
> worsen. And then at that point I think it's completely unreasonable
> to expect every userland program to add such complexity and unportable
> hack over hack to workaround the file system problems.

Usually one does this on ONE file and everything is fine.

The problem only arises because dpkg is doing this on a million files
and, if I understood the problem correctly, in ext4 each one of them
causes a lengthy data + metadata + superblock sync again and again.
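
For reference, the write+fsync+rename dance mentioned above, applied to
a single file, can be sketched roughly as follows. This is only an
illustration and not dpkg's actual code: the helper name
replace_file_safely() and its parameters are made up for this example,
and it does not fsync the containing directory or retry on EINTR.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from dpkg): replace `path` with `len` bytes
 * from `buf`, going through the temporary file `tmp_path`, which must
 * live on the same file system. Returns 0 on success, -1 on error. */
static int replace_file_safely(const char *path, const char *tmp_path,
                               const void *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Step 1: write the new contents to the temporary file. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            goto fail;
        p += n;
        len -= (size_t)n;
    }

    /* Step 2: wait until the data has reached stable storage. This is
     * the expensive call when it is repeated for every file. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Step 3: atomically move the new file over the old name. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

Done once, the fsync() in step 2 is cheap enough. Done once per
unpacked file, it is the "fsync storm" discussed in this thread.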

I think one long-term solution to this might be to invent an async
fsync() call: a way to tell the FS that the file should be synced as
soon as possible and to report back when that is done. This should let
the FS collect multiple files into a single sync. One possible way to
implement this would be to mmap each file and msync() it with MS_ASYNC.
But as that doesn't cover the metadata part, I'm not too sure it would
completely solve the bottleneck.

Anyone with ext4 feel up to implementing this in dpkg and measuring it?
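
A minimal sketch of the mmap + msync(MS_ASYNC) idea, for anyone who
wants to experiment. The function name request_async_writeback() is
made up for this example. Two caveats: msync() gives no notification
when the writeback has finished, so this covers only the "start
syncing" half of the proposal, and according to the Linux msync(2)
manual page MS_ASYNC has been a no-op since Linux 2.6.19, so on such
kernels this may not trigger anything at all.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to schedule writeback of the
 * data of `path` without waiting for it to complete. Metadata is not
 * covered. Returns 0 on success, -1 on error. */
static int request_async_writeback(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        /* A zero-length mapping is an error, and there is no data. */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    /* MS_ASYNC: schedule the write and return immediately. */
    int rc = msync(map, len, MS_ASYNC);
    munmap(map, len);
    return rc;
}
```

To get the data safety guarantee one would still need a blocking sync
of some kind before the rename(), so this only helps if the earlier
asynchronous requests make that final sync cheap. That is exactly what
would need measuring.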

> For non-technical users, data safety should be way more important than
> performance, having to recover a hosed system might mean they'd just
> reinstall it. For technical users I see the options as follows: help
> fix the file system to perform reasonably with fsync() or not lose
> data w/o fsync(), use another file system, use other better mount
> options, use dpkg --force-unsafe-io and cope with data loss.
> But then I think I've said most of this elsewhere already.
>> I'll look into a (Linux-specific, obviously) patch to add a function
>> that takes an array of paths and performs the relevant syncs of
>> filesystems where that ioctl exists tomorrow.  I would rather see a
>> system call that just takes an array of paths, since I imagine
>> filesystems like btrfs could do something good with that, but since
>> there are no VFS primitives for it I can see why that wasn't proposed.
> Tracking fds is going to be easier, at that point dpkg already has
> the stat information, so it could queue an fd per unique st_dev for
> example.

That sounds like a good plan. How hard would it be to implement this
based on FDs instead of paths? Would the ioctl patch need changes to
work on an FD instead of a path? (Sorry, I haven't read the patch.)

> regards,
> guillem

