
Re: The danger of dishonest disk drives (WAS:Re: Need to remove a ghost file, but can't because it doesn't exist)



On Friday 24 November 2006 12:21, Tim Post wrote:
> On Thu, 2006-11-23 at 17:31 -0500, Douglas Tutty wrote:
> > The question is how does the file system know that a write has made it
> > to disk.  E.g. if the file system is atomic transaction oriented, how
> > can the file system know that a commit has been committed if the drive
> > lies?
>
> It's hard to know for sure, especially if the server is under abnormal
> load, the inodes are 100% in use, and all that's left is dirty
> paging. This seems to be where the problem happens most often.
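On the original question of drives that lie about write completion: one blunt, partial mitigation is to turn off the drive's on-board write cache, so a completed write has actually hit the platter. A rough sketch using hdparm (the device name is just an example; needs root):

```shell
#!/bin/sh
# Sketch only: query and disable a drive's write cache with hdparm.
# /dev/sda is a placeholder. Disabling the cache costs throughput but
# removes one way the drive can "lie" about completed writes.

write_cache_status() {
    hdparm -W "$1"      # -W with no value prints the current write-cache setting
}

disable_write_cache() {
    hdparm -W0 "$1"     # -W0 turns write caching off
}
```

This doesn't help with drives that ignore cache-flush commands outright, but it narrows the window.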
>
> I've been following this thread and thought I'd do a bit of
> experimenting to see which of the two best recover themselves.
>
> Here's my worst-case scenario (and test bed):
>
> Debian Sarge under Xen, 1 40 GB lvm backed partition (jfs)  (#1)
> Debian Sarge under Xen, 1 40 GB lvm backed partition (ext3) (#2)
>
> Both LVM-backed VBDs live on separate 400 GB SATA drives, on a standard
> on-board SATA controller (4-port, no RAID).
>
> Both systems have a small 512 MB ext2 root FS as a control. The 40 GB
> partition was mounted at /datahell.
>
> Both systems have 2 GB RAM and 2 CPUs (the test was conducted on a dual
> Opteron): test machine 1 has cpu0 on core 0 and cpu2 on core 1; test
> machine 2 has cpu0 on core 1 and cpu2 on core 0.
>
> So now we have, for all intents and purposes, 2 machines with a single
> dual-core Opteron in each.
>
> Here was the test:
>
> Untar about 12 GB worth of files on both drives: some old backup CDs,
> shareware CDs... just thousands and thousands of files.
>
> I then ran a shell script that caused 'updatedb' to fork a few hundred
> times in the background on each server; it kept forking
> until /proc/loadavg reached about 70.0.
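The original script wasn't posted; here's a rough reconstruction of that load generator (the target of 70 and the choice of updatedb are from the post, the rest is guesswork):

```shell
#!/bin/sh
# Reconstruction (guesswork): fork updatedb in the background until the
# 1-minute load average in /proc/loadavg passes a target.

TARGET=70

current_load() {
    cut -d' ' -f1 /proc/loadavg     # first field: 1-minute load average
}

load_below_target() {
    # load averages are decimals, so compare with awk instead of shell math
    awk -v l="$(current_load)" -v t="$TARGET" 'BEGIN { exit !(l < t) }'
}

hammer() {
    while load_below_target; do
        updatedb &                  # each fork re-walks the untarred tree
        sleep 1
    done
    wait
}

# to actually run it: hammer
```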
>
> Once that happened, I paused both VMs, issued a sysrq to sync disks, and
> destroyed them in memory. This simulated an out-of-control box where the
> admin managed a shutdown with the disks synced (not just pushing
> reset).
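For anyone wanting to repeat this, the two crash modes can be sketched with the Xen 3.x xm tool (domain names are whatever you called the guests; I've dropped the pause step here, since a paused guest can't process the SysRq anyway):

```shell
#!/bin/sh
# Sketch of the two crash modes from the experiments. "xm sysrq <dom> s"
# sends the emergency-sync SysRq into the guest kernel; "xm destroy"
# hard-kills the domain in memory.

crash_with_sync() {
    dom="$1"
    xm sysrq "$dom" s    # ask the guest kernel to sync its disks
    sleep 5              # give the emergency sync a moment to finish
    xm destroy "$dom"    # then kill it
}

crash_without_sync() {
    dom="$1"
    xm destroy "$dom"    # no sync: same as pulling the power plug
}
```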
>
> Booted them up again :
>
> ext3 spent 30 minutes in fsck; some data was lost.
>
> JFS spent 5 minutes; no data was lost.
>
> The ext2 root FS didn't have any issues, but nothing was being written to
> it during the experiment.
>
> Experiment #2
>
> Fresh 20 GB partitions, just like before:
>
> Same experiment, only this time I didn't sync the disks. I just destroyed
> the VMs in memory (the same as pulling the power plug), then rebooted.
>
> ext3 fixed a couple of inodes and came back pretty quickly.
> The JFS drive couldn't be mounted.

I don't know if this is off topic, but it sounds relevant to me. 

I am using the following version of JFS utils, if that makes any difference: 
jfsutils/testing uptodate 1.1.11-1

I have formatted a partition with JFS on my hard-disk. Everything works fine 
whenever this partition is mounted. 

The only real pain in the ass is that this JFS partition refuses to mount!! 
Then I have to run something like jfs_fsck, which reports that the partition 
is CLEAN. 
Then I run jfs_fsck -a (for automatic repair), by instinct ;-) and bingo! The 
partition can now be mounted with absolutely no grudges! 
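For what it's worth, the workaround described above can be wrapped in a script. jfs_fsck -a replays the JFS journal, which is usually what an unmountable-but-"clean" JFS volume needs (device and mount point are the poster's names; needs root):

```shell
#!/bin/sh
# Sketch: try the mount, and fall back to an automatic jfs_fsck repair
# (journal replay) before retrying.

mount_with_repair() {
    dev="$1"
    mnt="$2"
    if ! mount -t jfs "$dev" "$mnt" 2>/dev/null; then
        jfs_fsck -a "$dev"          # replay the journal / auto-repair
        mount -t jfs "$dev" "$mnt"
    fi
}

# usage (poster's names):
# mount_with_repair /dev/hda1 /mnt/stuff
```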

Now my question is, why doesn't this partition simply mount? It gives me an 
error saying bad fs type, etc., even though I mount using the following 
command:
mount -t jfs /dev/hda1 /mnt/stuff

And why do I have to run jfs_fsck -a every time I need to mount this partition?

Is it some problem with JFS itself?
BTW I am running Debian Testing with the 2.6.17-2-686 Kernel. 




>
> Again, ext2 root fs had no issues, but we weren't expecting any. ext2
> rootfs was used just as a control (and to boot). /var was moved to the
> second drive (where slocate's DB lives).
>
> The end result is, it's going to depend on how the file system allocates
> inodes ahead of itself, and at what point your system runs out of clean
> pages to grab. JFS seems to do well *only* if you're able to sync disks so
> it can write those inodes out; it leaves quite a bit of data in memory.
> However, it's much happier about flushing its inode cache and syncing even
> when all that's available is dirty paging.
>
> ext3 seems more likely to recover from its journal in the event you can't
> sync the disks, but syncing it with maxed-out/bloated inodes (reaching
> into dirty pages) seems to break it.
>
> It's really application-specific, I guess. If you have the luxury of
> being able to anticipate what the world will do to your public services
> once you plug the Internet into a server, the choice is a little
> easier... but there is no magic bullet :)
>
> Ext3 seems more likely to come back to life after an unattended crash
> (where nobody was there to try and slow down the skid.)
>
> JFS seems like the winner if your system doesn't often get abused, or if
> you have the ability to monitor it closely and intercede should you see
> dirty paging (swap) and inode usage running high. Note that because JFS
> seems to use much more memory to allocate its inodes, this may lead to
> your applications needing swap sooner than they would with ext3.
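The "monitor it closely" part can be automated a little: the dirty-page and swap figures in /proc/meminfo are the numbers to watch. The helpers below are a sketch; any alert thresholds would be your own:

```shell
#!/bin/sh
# Sketch: pull the dirty-page and swap-usage figures (in kB) from
# /proc/meminfo, the numbers worth alerting on before a box gets into
# the overloaded state described above.

dirty_kb() {
    awk '/^Dirty:/ { print $2 }' /proc/meminfo
}

swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 }
         /^SwapFree:/  { f = $2 }
         END           { print t - f }' /proc/meminfo
}
```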
>
> Six of one, half a dozen of the other, really... but hopefully my little
> experiment helps someone decide which one is best to use :) I had a few
> systems set up for an ocfs2 stress test and figured I'd take advantage of
> it for this.
>
> I was in no way measuring I/O performance, just how well the file systems
> came back to life after bad things happened.
>
> Best,
> -Tim
>
> > Doug.

-- 
Regards, 
Amit. 

Remember fellas, what we do in life echoes in eternity! 


