
Re: XFS and Power Failures [Was: Linux filesystems]



In <87fwvzuaec.fsf@alamut.ozu.edu.tr>, Volkan YAZICI wrote:
>On Tue, 27 Jul 2010, Volkan YAZICI <yazicivo@ttmail.com> writes:
>> On Tue, 27 Jul 2010, Stan Hoeppner <stan@hardwarefreak.com> writes:
>>> What write operations were you performing at the time you pulled the
>>> plug? Unless you were writing the superblock it'd be almost impossible
>>> to hose the filesystem to the point it couldn't mount.  What do you
>>> mean, precisely, by "couldn't *recover* the / fs"?
>> 
>> Vanilla XFS with noatime,notail like basic mount options. The test was
>> simple, I was just typing "SELECT 1" from a psql command line (this
>> query shouldn't even hit to disk, it just basically returns 1) and
>> unplugged machine. At boot, I dropped to fsck command line. At command
>> prompt, I manually fiddled around with fsck of xfs to recover the
>> unmounted / filesystem, but had no luck. (I also tried recommendations
>> and informative messages supplied by manpages and command
>> outputs/warnings.)
>
>Another scenario, same failure.
>It crashes for some driver specific reasons and
>I need to hard-reset the notebook. Now I
>lost all of my Opera bookmarks (~500 collected in years). Thanks XFS,
>but no, you're not power-failure. (BTW, I "kill -9"ed Opera many times,
>and it restored all of its settings properly. I don't think it is an
>Opera or WindowMaker related bug.)

XFS is not the only file system where a power failure can result in a truncated
file.  Even ext3 can have that issue, though it is less likely.

However, if the application follows a certain procedure when rewriting files,
it will not lose data on any of these file systems.  I suggest that Opera
should be fixed to use that procedure.  IIRC, this is a slight variation on
the old "two-phase save" that some editors have used for decades; it simply
adds an fsync on the temporary file before it is renamed over the original.
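
A minimal sketch of that procedure, in Python, for anyone who wants to see the
order of operations.  The function name atomic_save and the ".tmp-save-" prefix
are made up for illustration; I am not claiming this is what any particular
editor (or Opera) does.  It assumes a POSIX system where rename within one
directory is atomic and where a directory can be opened and fsynced.

```python
import os
import tempfile


def atomic_save(path, data):
    """Replace `path` with `data` (bytes).

    After a crash the file should hold either the old or the new
    contents, not a truncated mix. Hypothetical helper, POSIX assumed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename never
    # crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> kernel
            os.fsync(tmp.fileno())  # kernel -> disk, before the rename
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        # Something failed before the rename: leave the original alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also fsync the directory so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage would be something like atomic_save("bookmarks.adr", new_contents).  Note
that mkstemp creates the file with mode 0600, so a real implementation would
probably want to copy the original file's permissions and ownership first.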

BTW, a kill -9 is very different from a power failure or a hard reset.  In the
first case, the application gets no chance to do its own cleanup (SIGKILL cannot
be caught), but the kernel still cleans up after the process, closing handles to
kernel resources like file descriptors, and queued work, like delayed allocation
and flushing data to disk, can still run at a later time.  In the latter,
horrible things happen (e.g. in some systems the HDs and bus can run for just
long enough to complete a DMA transfer AFTER the RAM has lost coherency) and no
software gets to run long enough to even detect what is happening, much less put
the hardware in a known good state.
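
The kill -9 half of that contrast is easy to try at home.  The sketch below
(Python, POSIX only; the function name and file name are invented for the
experiment) starts a child that write()s some data without ever calling fsync
or close, kills it with SIGKILL, and then reads the file back.  The data is
still there because the kernel, which keeps running, already owns it.  The
power-failure half cannot be demonstrated in software, for the reasons above.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child program: write without fsync, report readiness, then just sit there.
CHILD = r"""
import os, sys, time
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
os.write(fd, b"written but never fsynced\n")  # now in the kernel's page cache
sys.stdout.write("ready\n")
sys.stdout.flush()
time.sleep(60)  # killed long before this returns; no close(), no cleanup
"""


def kill9_keeps_written_data():
    """Return (child exit status, file contents) after a kill -9."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.txt")
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, path], stdout=subprocess.PIPE
        )
        child.stdout.readline()            # wait until write() has happened
        child.send_signal(signal.SIGKILL)  # same signal as kill -9
        child.wait()
        child.stdout.close()
        with open(path, "rb") as f:
            return child.returncode, f.read()
```

This only shows that data handed to the kernel survives the death of the
process; it says nothing about whether that data has reached the disk yet.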
-- 
Boyd Stephen Smith Jr.                   ,= ,-_-. =.
bss@iguanasuicide.net                   ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy         `-'(. .)`-'
http://iguanasuicide.net/                    \_/
