Re: Strange (cosmic ray?) corruption.
On Mon, 26 May 2025, Gareth Evans wrote:
Have you over/under-clocked or otherwise adjusted your CPU settings?
No, nothing has changed.
This "solved" issue which seems to be similar is put down to processor core instability under certain conditions:
https://bbs.archlinux.org/viewtopic.php?id=290093
It's certainly a possibility, but it seems bizarre that it's only
affecting one VM out of 13.
It did start when the weather got warmer.
As the last comment mentions, I was also wondering about the
possibility of thermal issues.
Might a smartmon or memtest test be worthwhile?
You don't seem to be dumping from read-only snapshots afaics, but it
seems unlikely that these particular files changing between checksum
and compression (or whichever comes first) is the spanner in the
works.
The snapshot is not readonly (deliberately so I can run fsck before
dumping) but it's not mounted while dumping, only while verifying.
Changing files would cause a verification error, not a decompression
error though.
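
To illustrate the distinction, here is a minimal Python sketch. It is not
how dump/restore work internally: zlib merely stands in for whichever
compressor is in use, and dump() and verify() are hypothetical helper
names made up for this example.

```python
import zlib


def dump(data: bytes) -> bytes:
    """Hypothetical stand-in for the dump step: compress a file's contents."""
    return zlib.compress(data)


def verify(archive: bytes, on_disk: bytes) -> str:
    """Hypothetical stand-in for the verify step.

    Decompress the archived copy, then compare it with what is on disk.
    """
    try:
        restored = zlib.decompress(archive)
    except zlib.error:
        # The compressed stream itself is damaged.
        return "decompression error"
    return "ok" if restored == on_disk else "verification error"


# Some non-repetitive sample content (made up for the example).
original = b"".join(f"line {i}\n".encode() for i in range(2000))
archive = dump(original)

# Case 1: the file changes on disk after it has been dumped.
changed = original + b"appended later\n"
case_changed = verify(archive, changed)

# Case 2: one bit flips in the compressed stream (bad RAM, CPU, disk, ...).
damaged = bytearray(archive)
damaged[len(damaged) // 2] ^= 0x01
case_bitflip = verify(bytes(damaged), original)

print("file changed after dump:", case_changed)
print("bit flip in archive:    ", case_bitflip)
```

In this sketch a file that changes after being dumped still decompresses
cleanly and only fails the comparison, whereas damage to the compressed
bytes fails before any comparison can happen.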
Back in the olden days before rw snapshots were possible (or properly
reliable, IIRC until about 2010), I did occasionally get verification
errors that I put down to the snapshot being inconsistent in some way
despite a sync. I'm not exactly sure what journal replay on a ro
snapshot implies. Since I started using rw snapshots and fsck before
dumping, verification errors are so rare I don't recall the last time I
saw one other than this issue. (It does sometimes happen, but mostly it's
a bug in dump or an ext4 feature I've not tested properly.)
HTH
Gareth