
Re: Disk corruption and performance issue.



On Sat, 20 Jan 2024, David Christensen wrote:

On 1/20/24 08:25, Tim Woodall wrote:
Some time ago I wrote about a data corruption issue. I've still not
managed to track it down ...

Please post a console session that demonstrates, or at least documents, the data corruption.


A console session is difficult - this is a script that takes around 6
hours to run - but a typical example of the corruption looks something
like this:

Preparing to unpack .../03-libperl5.34_5.34.0-5_arm64.deb ...
Unpacking libperl5.34:arm64 (5.34.0-5) ...
dpkg-deb (subprocess): decompressing archive '/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb' (size=4015516) member 'data.tar': lzma error: compressed data is corrupt
dpkg-deb: error: <decompress> subprocess returned error exit status 2
dpkg: error processing archive /tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb (--unpack):
 cannot copy extracted data for './usr/lib/aarch64-linux-gnu/libperl.so.5.34.0' to '/usr/lib/aarch64-linux-gnu/libperl.so.5.34.0.dpkg-new': unexpected end of file or stream

The checksum will have been verified by apt during the download, but when
dpkg comes to read the downloaded .deb to unpack and install it, it
doesn't get the same data. The corruption can happen on either the write
(the file on disk is corrupted) or the read (the file on disk has the
correct checksum but reads back wrong).
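
For what it's worth, a rough way I could tell the two apart next time is
something like this - the path and package name here are only
illustrative, the real .deb is wherever apt left it:

  # Hash apt expects for the package, from the repository metadata.
  apt-cache show libperl5.34 | grep ^SHA256

  # Hash of what is actually on disk. If it differs from apt's hash the
  # file was written corrupt; if it matches but dpkg still fails, the
  # corruption is happening on the read.
  sha256sum /var/cache/apt/archives/libperl5.34_5.34.0-5_arm64.deb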


Please post console sessions that document the make and model of your disks, their partition tables, your md RAID configurations, and your LVM configurations.


Can you please give a clue as to what you're looking for? This is a
machine exposing dozens of LVM volumes via iSCSI targets that are
exported into VMs, where the resulting virtual disks may themselves be
used as LVM PVs inside the VM.
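
If you mean output along these lines, I can collect it per host - this is
just a sketch and the device names are examples:

  lsblk -o NAME,SIZE,TYPE,MOUNTPOINT      # overall block device layout
  smartctl -i /dev/sda                    # make/model/firmware of each disk
  fdisk -l /dev/sda                       # partition table
  cat /proc/mdstat                        # md RAID status
  mdadm --detail /dev/md0                 # per-array detail
  pvs; vgs; lvs                           # LVM summary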

The disk I was using when I saw the above error is a straight LVM ->
iSCSI -> ext3 stack, mounted like this:

/dev/xvdb on /mnt/mirror/ftp/mirror type ext3 (rw,noatime)

That is this iSCSI target:
[fd01:8b0:bfcd:100:230:18ff:fe08:5ad6]:3260,1 iqn.xen17:aptmirror-archive

configured like this:
root@xen17:~# cat /etc/tgt/conf.d/aptmirror17.conf
<target iqn.xen17:aptmirror17>
  backing-store /dev/vg-xen17/aptmirror17
</target>
<target iqn.xen17:aptmirror-archive>
  backing-store /dev/vg-xen17/aptmirror-archive
</target>

and configured in the VM config like this:
disk=[ 'script=block-iscsi,vdev=xvda,target=portal=xen17:3260,iqn=iqn.xen17:aptmirror17,w',
       'script=block-iscsi,vdev=xvdb,target=portal=xen17:3260,iqn=iqn.xen17:aptmirror-archive,w',
]



Putting a sector size 512/512 disk and a sector size 512/4096 disk into the same mirror is unconventional. I suppose there are kernel developers who could definitively explain the consequences, but I am not one of them. The KISS solution is to use matching disks in RAID.


The problem with matching disks in the RAID, which has bitten me before,
is that they can both be subject to the same recall. I make a deliberate
effort to avoid matching disks for exactly that reason.

I'm happy to accept that this is "unconventional" - however, I didn't
even know it had happened. It was Andy's thread that gave me the clue to
look. I'm surprised that mdadm didn't say something - and I thought
LVM/mdadm did everything at the 4k level anyway, so I don't really see
why it should matter.

All the
partitions start on a 4k boundary but the big partition is not an exact
multiple of 4k.

I align my partitions to 1 MiB boundaries and suggest that you do the same.

They are aligned on 1 MiB boundaries, but while I could see that sub-4k
alignment could trigger some (expected) problem, I can't really see why
4k versus 1 MiB alignment would make any difference:

Device      Start        End    Sectors   Size Type
/dev/sda1    2048       4095       2048     1M BIOS boot
/dev/sda2    4096     264191     260096   127M EFI System
/dev/sda3  264192 1953525134 1953260943 931.4G Linux filesystem
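
For reference, the sector sizes and alignment can be checked with
something like this (a sketch, with sda/sdb standing in for the two
mirror members):

  # Logical vs physical sector size of each mirror member.
  lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda /dev/sdb
  blockdev --getss --getpbsz /dev/sda
  blockdev --getss --getpbsz /dev/sdb

  # Partition start sectors - a start divisible by 2048 is 1 MiB aligned.
  parted /dev/sda unit s print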



... the "heavy load" filesystem that triggered the issue ...

Please post a console session that demonstrates how data corruption is related to I/O throughput.


I don't know how to do that, except that I run a script every Sunday that
rebuilds the entire set of packages I keep locally, in a sandbox. For
each package it creates a clean sandbox, installs all of the
build-depends and then builds it. It also generates some multi-hundred
MB compressed tar archives of "clean" systems that I use to bootstrap
installs of new VMs. I have had the following commands report corruption:

build-tarfiles.sh:  tar -C ${BUILDCHROOT} --one-file-system -Jcf ${PDIR}/${tgt} .
build-tarfiles.sh:  tar tvf ${PDIR}/${tgt} >/dev/null
build-tarfiles.sh:  tar tvf ${PDIR}/${tgt} >/dev/null

Where the first tar tvf reports that the archive is corrupted while the
second succeeds (and the archive turns out to be uncorrupted).
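
Next time it happens I could try something along these lines to separate
a bad read from a bad file on disk (a sketch only - the cache drop needs
root, and the filename is the same archive the script just wrote):

  f=${PDIR}/${tgt}

  sha256sum "$f"
  sync
  echo 3 > /proc/sys/vm/drop_caches   # force the next read to come from disk
  sha256sum "$f"                      # differing hashes => read corruption
  tar tvf "$f" >/dev/null && echo "archive lists cleanly"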

There are a LOT of
partitions and filesystems in a complicated layered LVM setup ...

Complexity is the enemy of data integrity and system reliability. I suggest simplifying where it makes sense; but do not over-simplify.

I don't see any opportunity to simplify. It is complicated but
conceptually easy.

For example, xen17 has >30 LVs, each exported via iSCSI; they are then
mounted inside various VMs (currently 14 running), and the virtual disk
inside a VM may or may not be an LVM PV itself.

"Just supply everything" is going to be a multi-hundred-thousand line
email though.

lvm.conf from 14 VMs is going to be 30k lines on its own. I think the
lvm.conf in the VMs is unchanged from a default install, but without some
work I can't be sure of that. The one on xen17 definitely is changed:
        filter = [ "r|/dev/vg-xen17/.*|",
                   "r|/dev/disk/by-path/ip-.*|",
                   "r|/dev/disk/by-id/usb-.*|",
                   "r|/dev/disk/by-id/usb-Kingston_DataTraveler_3.0_6CF049E16B59B03169C6D9ED-0:0|" ]

because I don't want the kernel looking into the various images that are
intended to be used in a VM. I don't recall whether I've made other
changes, and I can't check without spending time going through logs or
installing a mirror system and diffing the files.

(And yes, I know that that last exclusion is redundant but I want that
one documented explicitly in case I need/want to remove the general
one)
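
If I'm remembering the option right, lvmconfig can answer the "unchanged
from default" question without the manual diffing - a sketch, to be run
on each host and inside each VM:

  # Prints only the settings that differ from the compiled-in defaults.
  lvmconfig --typeconfig diff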


Booted on the problem machine but physical disk still on the OK machine:
real    0m35.731s
user    0m5.291s
sys     0m4.677s

Booted on the good machine but physical disk still on the problem
machine:
real    0m57.721s
user    0m5.446s
sys     0m4.783s

Please provide host names.

The fast one above is running apt-get remove --purge
linux-image-5<something> on debootstrap19, which is a VM running on xen17
but with physical backing disks exported from xen19.

The slow one above is running the same command on debootstrap17, which
is a VM running on xen19 but with physical backing disks exported from
xen17.

Note that these systems are optimized for power consumption, not speed,
so "slow" is relative. I don't expect anything to be fast!

When I did the kernel upgrade, debootstrap17 was running on xen17 and
debootstrap19 was running on xen19 - and the slowness stayed with
debootstrap17 (but I didn't take timings).

And note that the slowness is not linked to debootstrap17 - all VMs with
a backing disk on xen17 are slow relative to VMs with a backing disk on
xen19, which indicates that the problem is xen17 itself or the disks on
xen17.

I have been assuming the problem was with xen17 itself - and I've made
sure everything important is on xen19 - but I'm starting to suspect
that there's a disk problem on xen17 (which manifests only as corrupted
reads and writes, with no errors in any logs or SMART).
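
One check I could run on xen17 to catch that would be to read a suspect
LV twice with the page cache bypassed and compare the results - a sketch,
using the archive LV from above:

  # O_DIRECT reads, so both passes really come from the disk; differing
  # hashes would show read corruption despite clean logs and SMART.
  dd if=/dev/vg-xen17/aptmirror-archive bs=1M iflag=direct | sha256sum
  dd if=/dev/vg-xen17/aptmirror-archive bs=1M iflag=direct | sha256sum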

Next Sunday the big rebuild job will kick off on aptmirror19, but that
has been moved to being hosted on xen17 (with backing disks still on
xen19).

I've never had a data-corruption failure since I moved the entire job
(VM and backing disk) to xen19. It happened every Sunday when it was on
xen17.

Tim.

