
Re: Strange (cosmic ray?) corruption.



On Mon, 26 May 2025, Gareth Evans wrote:

Have you over/under-clocked or otherwise adjusted your CPU settings?

No, nothing changed.
This "solved" issue, which seems similar, is put down to processor core instability under certain conditions:

https://bbs.archlinux.org/viewtopic.php?id=290093

It's certainly a possibility, but it seems bizarre that it's only affecting one VM out of 13.

It did start when the weather got warmer.

As the last comment mentions, I was also wondering about the possibility of thermal issues.

Might a smartmontools (SMART) or memtest run be worthwhile?
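
Alongside those tools, a crude user-space check can sometimes catch intermittent CPU or memory faults of the kind discussed above. The following is only a minimal Python sketch, not a substitute for memtest: it compresses and decompresses the same buffer repeatedly and counts rounds where the result fails to decompress or differs from the original. The function name `roundtrip_mismatches` and its parameters are hypothetical, and zlib is used purely as a convenient stand-in workload; it does not model what dump does.

```python
import hashlib
import os
import zlib


def roundtrip_mismatches(rounds: int = 200, size: int = 1 << 20) -> int:
    """Count compress/decompress rounds whose output differs from the input.

    On healthy hardware this should return 0. A non-zero count hints at
    unstable CPU or memory, but a zero count proves nothing.
    """
    # Repeat a random chunk so the data is compressible and the
    # compressor has real matching work to do.
    chunk = os.urandom(max(1, size // 16))
    payload = chunk * 16
    expected = hashlib.sha256(payload).digest()

    mismatches = 0
    for _ in range(rounds):
        try:
            restored = zlib.decompress(zlib.compress(payload))
        except zlib.error:
            # The stream produced by this very machine failed to decode.
            mismatches += 1
            continue
        if hashlib.sha256(restored).digest() != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatching rounds:", roundtrip_mismatches())
```

Running it while the machine is warm and under load is more likely to provoke a marginal core than running it on an idle system.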

You don't seem to be dumping from read-only snapshots, as far as I can see, but the idea that these particular files might be changing between checksum and compression (or whichever comes first) seems an unlikely spanner in the works.

The snapshot is not read-only (deliberately, so that I can run fsck before dumping), but it is not mounted while dumping, only while verifying. Changing files would cause a verification error, though, not a decompression error.
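
To illustrate the distinction, here is a minimal Python sketch. It assumes a zlib-style compressed stream with a built-in integrity check and does not model the real dump format; `check_dump` and `flip_bit` are hypothetical helper names. A file that changed after being dumped still decompresses cleanly and only fails the comparison, whereas damage to the compressed stream itself fails earlier, at decompression.

```python
import hashlib
import zlib


def check_dump(stored: bytes, current: bytes) -> str:
    """Classify the outcome of restoring `stored` and comparing to `current`."""
    try:
        restored = zlib.decompress(stored)
    except zlib.error:
        # The compressed stream itself is damaged.
        return "decompression error"
    if hashlib.sha256(restored).digest() != hashlib.sha256(current).digest():
        # The stream is intact, but the file on disk no longer matches it.
        return "verification error"
    return "ok"


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of `data` with a single bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


if __name__ == "__main__":
    data = b"example file contents\n" * 100
    stored = zlib.compress(data)
    print(check_dump(stored, data))                # unchanged file
    print(check_dump(stored, data + b"changed"))   # file modified after dump
    print(check_dump(flip_bit(stored, len(stored) - 1), data))  # damaged stream
```

The last case flips a bit in the stream's trailing checksum, so the data decodes but the integrity check rejects it.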

Back in the olden days, before rw snapshots were possible (or properly reliable; IIRC until about 2010), I did occasionally get verification errors that I put down to the snapshot being inconsistent in some way despite a sync. I'm not exactly sure what journal replay on a ro snapshot implies. Since I started using rw snapshots and running fsck before dumping, verification errors have been so rare that I don't recall the last time I saw one other than this issue. (It does sometimes happen, but mostly it's a bug in dump or an ext4 feature I've not tested properly.)


HTH
Gareth



