Debian Squeeze ext4 corruption under VMware
Hello,
We have several Debian Squeeze servers running as VMs under VMware ESXi 
4.1.0 (build 502767, if that matters) with the latest VMware Tools bundle 
installed.  All of them use the ext4 filesystem.  We have had a few 
crashes of our VMware infrastructure, and each time the Debian servers 
have all suffered filesystem corruption.

The problem seems to be that VMware attempts to "freeze" each VM when 
something goes wrong and, depending on the circumstances, tries to move 
each VM to another VMware host.  This works fine for our Windows servers, 
but the Debian servers come out of it with corrupted filesystems.  Each 
VM remains in a "running" state, but the root filesystem ends up mounted 
read-only and the console fills with ext4 errors.
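
For what it's worth, the read-only remount itself looks like ext4 simply 
honoring the errors=remount-ro option that Debian puts in /etc/fstab by 
default; from a live CD, something along these lines should show the 
recorded error state (I'm using /dev/sda1 as a stand-in for whatever the 
root device really is):

  # run from the live CD, with the filesystem unmounted
  tune2fs -l /dev/sda1 | grep -iE 'state|errors'
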
In most cases the corruption has been recoverable by booting the VM from 
a Knoppix live CD and running fsck on the unmounted filesystem.  We've 
tried forcing fsck to run on boot, but for some reason it will not repair 
the filesystem, which is why we have to fall back to the live CD.  In a 
few isolated cases we have ended up with serious filesystem damage and a 
huge number of files in /lost+found, and we've just rebuilt those VMs.
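
For reference, the live-CD repair amounts to a forced e2fsck, roughly 
like this (again with /dev/sda1 standing in for the real root device):

  # from Knoppix, with the root filesystem not mounted
  e2fsck -f -y /dev/sda1

My guess is that the boot-time check refuses to repair anything because 
it isn't allowed to answer "yes" by itself; if anyone can confirm whether 
setting FSCKFIX=yes in /etc/default/rcS (and touching /forcefsck before 
rebooting) is the right way to make Squeeze fix things automatically at 
boot, that would be good to know.
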
I'm just wondering if anyone else has seen this, or if anyone knows a 
way to make Debian deal with VMware's shenanigans more smoothly.  We do 
have a planned upgrade to VMware ESXi 5.0 in the next few months, and 
we're looking to get a new SAN solution (our SAN has been the source of 
at least two of these crashes), but I'd really like to get a handle on 
this issue sooner in case we have another problem.  I've Googled this 
problem, but I'm not finding much useful information.
Thanks!
    - Dave
--
Dave Parker
Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177