
Bug#705124: base: Filesystem corruption issue



Replies in line below.

On Mon, Apr 15, 2013 at 4:54 AM, Ian Campbell <ijc@hellion.org.uk> wrote:
On Wed, 2013-04-10 at 08:17 -0400, Anthony Sheetz wrote:
> Steps to reproduce:
> Install Debian Testing from Netinstall CD, amd64.
> Choose LVM and Full Disk Encryption, with a separate /home
> Resize /home to be 80GB
> Install openswan, connect to remote network
> Install xen
> Set up a virtual machine with Debian Stable using logical volumes as the backing store.
>       fs: ext3
>       network: NAT
> transfer a large (multigigabyte) file from a remote server over the internet to the virtual machine
>
> Expected behavior: File transfers fine, md5sum agrees with remote system
> Observed behavior: md5sum never matches, done enough times, the ext3 fs becomes corrupted

Can I just confirm a few things please:

The VM disk backend is an LVM volume which is included in the full disk
encryption? I suppose it is using dm-crypt?
 
Correct on both counts.

The ext3 fs which becomes corrupted is the guest VM filesystem, not the
dom0 filesystem, nor a filesystem image making up the large
multigigabyte file which is transferred over the network?

Correct again.
 
On the face of it it sounds to me like the network corruption (md5sum
issue) and the eventual ext3 corruption must be separate issues. Or I
suppose it is possible that the file is received correctly but is
corrupted when written to the disk, but it's probably better to consider
them separately until we know one way or the other.

WRT the file transfer corruption: Is the file being transferred over the
openswan link?

Yes. Dom0 is set up with the openswan connection, and DomU is set up to use NAT through Dom0; the file was transferred that way.

Did you ever happen to try a transfer over a
non-tunnelled connection?

Yes, tried file transfers from another machine on the local network - never had a problem with those. 

Were you able to successfully transfer the
file to the dom0 filesystem or to any other system (e.g. one not running
Xen) on this end of the openswan link?
 
Yes - tried that several times, and was able to do the transfer with no corruption, and md5sum matched. 

I'm not sure what error
detection/correction scp/rsync perform, or whether they have any additional
verification options which could be tried. Perhaps it is possible to
run md5sum on the stream before it hits the disk (can one rsync/scp to
stdout? I doubt it).

Tried doing 'scp file.sql | md5sum' on DomU which resulted in a matching md5sum. We decided this eliminated the openswan link as the culprit.
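
For reference, the "checksum the stream before it hits the disk" check can also be sketched in Python. This is only a minimal illustration, not what was actually run here: the function name md5_of_stream and the 1 MiB chunk size are assumptions, and the input is any binary stream (for example sys.stdin.buffer when the file is piped in over ssh).

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read chunk by chunk.

    Nothing is written to disk, so a matching digest shows the data
    arrived intact over the network path.
    """
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Example with an in-memory stream standing in for the network data.
sample = io.BytesIO(b"example payload")
print(md5_of_stream(sample))
```

If the digest computed this way matches the remote md5sum while the digest of the file written to the guest disk does not, the corruption is happening on the write path rather than in transit.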
 
If you can transfer to dom0 OK then it might be
interesting to try turning off the various offloads (GSO, SG etc) on the
vif link.

Any instructions on doing that?
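
One possible way to do this is with ethtool -K, which takes an interface name followed by feature/state pairs. The sketch below only builds the command line so it can be inspected before being run as root in dom0. The interface name vif1.0 (Xen's usual vif<domid>.<devid> naming), the helper name offload_command, and the particular list of offloads are assumptions for illustration; the set of features that can actually be changed depends on the driver.

```python
# Short offload names accepted by `ethtool -K` (availability varies by driver).
DEFAULT_OFFLOADS = ("tx", "sg", "tso", "gso", "gro")


def offload_command(interface, offloads=DEFAULT_OFFLOADS, state="off"):
    """Build the ethtool argv that sets each listed offload to `state`."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    argv = ["ethtool", "-K", interface]
    for name in offloads:
        argv += [name, state]
    return argv


# Hypothetical backend interface for domain 1, device 0.
print(" ".join(offload_command("vif1.0")))
```

The current settings can be listed with ethtool -k (lower case) on the same interface, and it may be worth trying the same change on eth0 inside the guest as well.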

WRT the filesystem corruption: How did the ext3 corruption manifest
itself?

Initially with errors on the console (and in kernel.log and other places) about writes beyond the end of the logical volume. After a time, the filesystem would be set to read-only, and refuse to mount in read/write mode. 

I wonder if the layering of crypto+lvm+xen-blkback is causing
the barriers which ext3 requires to function correctly to not occur in
the right places. Does something need to be manually configured to
enable barriers at some layer? (or perhaps I am thinking of DISCARD
support). If you were able to attempt to reproduce without the crypto
bit in dom0 for the VM disk that would be really useful. It might also
be interesting to try using the ext3 barrier mount option in the guest
to switch barriers either off or on (I can't remember what the default
was for Squeeze).

Google led me to try mounting the filesystem with barriers=0, with no luck.
 
I appreciate that you may have redeployed/downgraded the systems so some
of the above experiments might be quite hard to try out but if you could
set up a spare system or something it would be very much appreciated.

We planned for this, and once we have some ideas to try (with some detailed instructions for trying them) we'll be purchasing a spare hard drive to try them out. We'd like this problem solved, and we're willing to spend a little to do it. 

Ian.


