
Re: SMART Uncorrectable_Error_Cnt rising - should I be worried?



On 2024-02-15 at 03:09, David Christensen wrote:

> On 2/14/24 18:54, The Wanderer wrote:
> 
>> TL;DR: It worked! I'm back up and running, with what appears to be
>> all my data safely recovered from the failing storage stack!
> 
> That is good to hear.  :-)
> 
>> On 2024-01-09 at 14:22, The Wanderer wrote:
>> 
>>> On 2024-01-09 at 14:01, Michael Kjörling wrote:
>>> 
>>>> On 9 Jan 2024 13:25 -0500, from wanderer@fastmail.fm (The
>>>> Wanderer):
>> 
>>>>> I've ordered a 22TB external drive
> 
> Make?  Model?  How it is interfaced to your computer?

It's a WD Elements 20TB drive (I'm not sure where I got the 22 from);
the back of the case has the part number WDBWLG0200HBK-X8 (or possibly
-XB, the font is kind of ambiguous). The connection, per the packaging
label, is USB-3.

>> In the time since this, I continued mostly-normal but
>> somewhat-curtailed use of the system, and saw few messages about
>> these matters that did not arise from attempts to back up the data
>> for later recovery purposes.
> 
> Migrating large amounts of data from one storage configuration to
> another storage configuration is non-trivial.  Anticipating problems
> and preparing for them ahead of time (e.g. backups) makes it even
> less trivial.  The last time I lost data was during a migration when
> I had barely enough hardware.  I made a conscious decision to always
> have a surplus of hardware.

The big change of plans in the middle of my month-plus process was the
decision to replace the entire 8-drive array with a 6-drive array; the
reason was that the 8-drive array left me with no open SATA ports to
connect spare drives, so I couldn't do drive replacements without
rebuilding the whole shaboozle.

I don't currently have a surplus of hardware (see the $2200 it already
cost me for the replacement drives I have), but I also haven't yet
initiated a warranty claim on the 870 EVO drives, and it seems possible
that that process might leave me with either replacement drives on that
front or just plain money (even if from selling the replacement drives
on e.g. eBay) with which to purchase spare-able hardware.

>>> (For awareness: this is all a source of considerable
>>> psychological stress to me, to an extent that is leaving me on
>>> the edge of physically ill, and I am managing to remain on the
>>> good side of that line only by minimizing my mental engagement
>>> with the issue as much as possible. I am currently able to read
>>> and respond to these mails without pressing that line, but that
>>> may change at any moment, and if so I will stop replying without
>>> notice until things change again.)
>> 
>> This need to stop reading wound up happening almost immediately
>> after I sent the message to which I am replying.
> 
> I remember reading your comment and then noticing you went silent.  I
> apologize if I pushed your button.

As far as I know you didn't. I don't think I even read any of the
replies after sending that message, and if I did, I don't remember any
of them having this type of impact; it was just the holistic stress of
the entire situation.

>> I now, however, have good news to report back: after more than a
>> month, at least one change of plans, nearly $2200 in replacement
>> hard drives,
> 
> Ouch.

Yeah. The cost factor is why I was originally planning to spread this
out over time, buying two drives a month until I had enough to replace
drives one at a time in the 8-drive array. I eventually decided that -
especially with the rsnapshot tiered backups turning out not to be
viable, because of the hardlinks thing - the risk factor of stretching
things out further wasn't going to be worth the benefit.

IIRC, the drives were actually $339 apiece, which would put the total
price for six in the $2030-$2040 range; sales tax and shipping costs
were what put it up to nearly $2200.

> If you have a processor, memory, PCIe slot, and HBA to match those
> SSD's, the performance of those SSD's should be very nice.

The CPU is a Ryzen 5 5600X. The RAM is G.Skill DDR4 2666MHz, in two 32GB
DIMMs. I don't know how to assess PCIe slots and HBA, but the
motherboard is an Asus ROG Crosshair VIII Dark Hero, which I think was
the top-of-the-line enthusiast motherboard (with the port set my
criteria called for) the year I built this machine.

I'm pretty sure my performance bottleneck for most things is the CPU (or
the GPU, where that comes into play, which here it doesn't);
storage-wise this seems so far to be at least as fast as what I had
before, but it's hard to tell if it's faster.
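
(If I ever want an actual number for the storage side rather than a
feeling, a crude sequential-read check is easy enough; /dev/md0 below is
just a guess at the array's device name, and it should be run on an
otherwise idle system:)

  # rough buffered-read benchmark of the raw md device (read-only)
  sudo hdparm -t /dev/md0

  # or read 4 GiB straight off the device, bypassing the page cache
  sudo dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct status=progress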

>> much nervous stress, several days of running data copies to and
>> from a 20+-terabyte mechanical hard drive over USB, and a complete
>> manual removal of my old 8-drive RAID-6 array and build of a new
>> 6-drive RAID-6 array (and of the LVM structure on top of it), I now
>> appear to have complete success.
>> 
>> I am now running on a restored copy of the data on the affected 
>> partitions, taken from a nearly-fully-shut-down system state, which
>> is sitting on a new RAID-6 array built on what I understand to be 
>> data-center-class SSDs (which should, therefore, be more suitable
>> to the 24/7-uptime read-mostly workload I expect of my storage).
>> The current filesystems involved are roughly the same size as the
>> ones previously in use, but the underlying drives are nearly 2x the
>> size; I decided to leave the extra capacity for later allocation
>> via LVM, if and when I may need it.
> 
> When I was thinking about building md RAID, and then ZFS, I worried
> about having enough capacity for my data.  Now I worry about
> zfs-auto-snapshot(8), daily backups, monthly archives, monthly
> images, etc., clogging my ZFS pools.
> 
> The key concept is "data lifetime". (Or alternatively, "destruction
> policy".)

I can see that for when you have a tiered backup structure, and are
looking at the lifetimes of each backup copy. For my live system, my
intended data lifetime (outside of caches and data kept in /tmp) is
basically "forever".

>> I did my initial data backup to the external drive, from a 
>> still-up-and-running system, via rsnapshot. Attempting to do a
>> second rsnapshot, however, failed at the 'cp -al' stage with "too
>> many hardlinks" errors. It turns out that there is a hard limit of
>> 65000 hardlinks per on-disk file;
> 
> 65,000 hard links seems to be an ext4 limit:
> 
> https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624

That sounds right.
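
(For future reference, if I want to see how close an existing tree is
getting to that limit, something along these lines ought to do it; the
paths are placeholders:)

  # list files with more than 60000 hard links: link count, then path
  # (-xdev keeps find from wandering onto other filesystems)
  find /srv/backups -xdev -type f -links +60000 -printf '%n %p\n'

  # or check a single file's link count
  stat -c %h /srv/backups/some/file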

> I believe ZFS can do more hard links. (Much more?  Limited by
> available storage space?)

I'm not sure, but I'll have to look into that, when I get to the point
of trying to set up that tiered backup.

>> I had so many files already hardlinked together on the source
>> filesystem that trying to hardlink each one to just as many new
>> names as there were already hardlinks for that file ran into that
>> limit.
>> 
>> (The default rsnapshot configuration doesn't preserve hardlinks,
>> possibly in order to avoid this exact problem - but that isn't
>> viable for the case I had at hand, because in some cases I *need*
>> to preserve the hardlink status, and because without that
>> deduplication there wouldn't have been enough space on the drive
>> for more than the single copy, in which case there'd be very little
>> point in using rsnapshot rather than just rsync.)
> 
> ZFS provides similarly useful results with built-in compression and
> de-duplication.

I have the impression that there are risk and/or complexity aspects to
it which make it less attractive as a choice, but those features do
sound appealing. I will have to look into it, when I get to that point.
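
(From what I've read so far - not tested by me - both are per-dataset
properties, so it would look something like the following, with
'tank/backups' standing in for whatever pool/dataset I'd actually
create:)

  # transparent compression; generally considered cheap and safe
  zfs set compression=lz4 tank/backups

  # dedup is the part with the RAM and complexity caveats people warn about
  zfs set dedup=on tank/backups

  # confirm what's in effect
  zfs get compression,dedup tank/backups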

>> In the end, after several flailing-around attempts to minimize or
>> mitigate that problem, I wound up moving the initial external copy
>> of the biggest hardlink-deduplicated tree (which is essentially
>> 100% read-only at this point; it's backup copies of an old system
>> state, preserved since one of those copies has corrupted data and I
>> haven't yet been able to confirm that all of the files in my
>> current copy of that data were taken from the non-corrupt version)
> 
> That sounds like an N-way merge problem -- old file system, multiple
> old backups, and current file system as inputs, all merged into an
> updated current file system as output.  LVM snapshots, jdupes(1), and
> your favorite scripting language come to mind.  Take good notes and
> be prepared to rollback at any step.

It does sound like that, yes. I'm already aware of jdupes, and of a few
other tools (part of the work I already did in getting this far was
rdfind, which is what I used to set up much of the hardlink
deduplication that wound up biting me in the butt), but have not
investigated LVM snapshots - and the idea of trying to script something
like this, without an existing known-safe copy of the data to fall back
on, leaves me *very* nervous.

Figuring out how to be prepared to roll back is the other uncertain and
nervous-making part. In some cases it's straightforward enough, but
doing it at the scale of the size of those copies is at best daunting.
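
(As far as I understand LVM snapshots - again, not something I've
actually done - the rollback story for a risky merge pass would look
roughly like this, with the volume names made up:)

  # snapshot the LV before the risky step; the snapshot only needs enough
  # space to hold the blocks that change while it exists
  lvcreate --snapshot --name pre-merge --size 100G /dev/vg0/data

  # if the merge goes wrong, revert the LV to its state at snapshot time
  lvconvert --merge vg0/pre-merge

  # if it goes right, just discard the snapshot
  lvremove vg0/pre-merge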

>> out of the way, shutting down all parts of the system that might be
>> writing to the affected filesystems, and manually copying out the
>> final state of the *other* parts of those filesystems via rsync,
>> bypassing rsnapshot. That was on Saturday the 10th.
>> 
>> Then I grabbed copies of various metadata about the filesystems,
>> the LVM, and the mdraid config; modified /etc/fstab to not mount
>> them; deactivated the mdraid, and commented it out of
>> /etc/mdadm/mdadm.conf; updated the initramfs; shut down; pulled all
>> eight Samsung 870 EVO drives; installed six brand-new Intel
>> data-center-class (or so I gather) SSDs;
> 
> Which model?  What size?

lshw says they're INTEL SSDSCK2B03. The packaging says SSDSCK2B038T801.

IIRC, the product listing said they were 3.84 TB (or possibly TiB). lshw
says 'size: 3567GiB (3840GB)'. IIRC, the tools I used to partition them
and build the mdraid and so forth said 3.84 TB/TiB (not sure which), or
3840 GB/GiB (same).

For comparison, the 870 EVO drives - which were supposed to be 2TB
apiece - were reported by some of those same tools as exactly 2000 of
the same unit.

This does mean that I have more total space available in the new array
than in the old one, but I've tried to allocate only as much space as
was in the old array, insofar as I could figure out how to do that in
the limited environment I was working in. (The old array and/or LV setup
had sizes listed along the lines of '<10TiB', but my best attempt at
replicating it gave something which reports sizes along the lines of
'10TiB', so I suspect that my current setup is actually slightly too
large to fit on the old disks.)
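
(One way to settle the "slightly too large" question would be to compare
exact byte counts instead of rounded TiB figures; roughly the following,
with the VG/LV and mount point names as placeholders for my real ones:)

  # exact LV sizes in bytes
  sudo lvs --units b --nosuffix -o lv_name,lv_size vg0

  # exact size of a block device in bytes
  sudo blockdev --getsize64 /dev/vg0/data

  # filesystem size and usage in bytes
  df -B1 /mnt/data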

>> booted up; partitioned the new drives based on the data I had about
>> what config the Debian installer put in place when creating the 
>> mdraid config on the old ones; created a new mdraid RAID-6 array
>> on them, based on the copied metadata; created a new LVM stack on
>> top of that, based on *that* copied metadata; created new
>> filesystems on top of that, based on *that* copied metadata;
>> rsync'ed the data in from the manually-created external backup;
>> adjusted /etc/fstab and /etc/mdadm/mdadm.conf to reflect the new
>> UUID and names of the new storage configuration; updated the
>> initramfs; and rebooted. Given delay times for the drives to arrive
>> and for various data-validation and plan-double-checking steps to
>> complete, the end of that process happened this afternoon.
>> 
>> And it appears to Just Work. I haven't examined all the data to
>> validate that it's in good condition, obviously (since there's
>> nearly 3TB of it), but the parts I use on a day-to-day basis are
>> all looking exactly the way they should be. It appears that the
>> cross-drive redundancy of the RAID-6 array was enough to have
>> avoided data loss from the scattered read failures of the
>> underlying drives before I could get the data out.
> 
> Data integrity validation is tough without a mechanism.  Adding an
> rsnapshot(1) postexec MD5SUMS, etc., file into the root of each
> backup tree could solve this need, but could waste a lot of time and
> energy checksumming files that have not changed.

AFAIK, all such things require you to be starting from a point with a
known-good copy of the data, which is a luxury I don't currently have
(as far as validating my current data goes). It's something to keep in
mind when planning a more proper backup system, however.
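
(What I can do going forward is generate a checksum manifest from the
data as it stands now, and verify later copies against that; something
like the following, with the paths as placeholders:)

  # build a manifest of the current tree
  cd /srv/data && find . -type f -print0 | xargs -0 md5sum > /root/MANIFEST.md5

  # later, check a backup copy against it (prints only mismatches/missing files)
  cd /mnt/backup/data && md5sum --quiet -c /root/MANIFEST.md5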

> One of the reasons I switched to ZFS was because ZFS has built-in
> data and metadata integrity checking (and repair; depending upon
> redundancy).

I'm not sure I understand how this would be useful in the case I have at
hand; that probably means that I'm not understanding the picture properly.

>> (This does leave me without having restored the read-only backup
>> data from the old system state. I care less about that; I'll want
>> it eventually, but it isn't important enough to warrant postponing
>> getting the system back in working order.)
>> 
>> 
>> I do still want/need to figure out what to do about an *actual*
>> backup system, to external storage, since the rsnapshot thing
>> apparently isn't going to be viable for my circumstance and use
>> case. There is, however, now *time* to work on doing that, without
>> living under the shadow of a known immediate/imminent data-loss
>> hardware failure.
> 
> rsync(1) should be able to copy backups onto an external HDD.

Yeah, but that only provides one tier of backup; the advantage of
rsnapshot (or similar) is the multiple deduplicated tiers, which gives
you options if it turns out the latest backup already included the
damage you're trying to recover from.
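
(That tiering can also be had from plain rsync via --link-dest, which is
more or less what rsnapshot automates; a minimal sketch, with invented
paths, and with the caveat that adding -H to preserve source hardlinks
would run into the same per-file link-count limit:)

  # unchanged files in today's snapshot become hard links back into
  # yesterday's, so each extra tier costs only the changed data
  rsync -a --delete --link-dest=/mnt/backup/daily.1 /srv/data/ /mnt/backup/daily.0/

The rotation step - dropping the oldest tier and shifting the others
with mv/rm before each run - is the bookkeeping rsnapshot would
otherwise handle.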

> If your chassis has an available 5.25" half-height external drive bay
> and you have an available SATA 6 Gbps port, mobile racks are a more
> reliable connection than USB for 3.5" HDD's because there are no
> cables to bump or power adapters to fail or unplug:
> 
> https://www.startech.com/en-us/hdd/drw150satbk

I don't think it does have one; at least from the outside, I don't see
any 5.25" bays on the case at all. I know I didn't include an internal
optical drive when building this system, and while part of that was lack
of free SATA ports, the lack of such an exposed bay would also have been
a contributing factor.

(USB-3 will almost certainly not be a viable option for an automatic
scheduled backup of the sort rsnapshot's documentation suggests, because
the *fastest* backup cycle I saw while working with the data I had was
over three hours, and the initial pass to copy the data out to the drive
in the first place took nearly *20* hours. A cron job running an
incremental backup even once a day, much less the several times a day
suggested for the deeper rsnapshot tiers, would not be *remotely*
workable in that sort of environment. Though on the flip side, that's
not just a USB-3 bottleneck, but also the bottleneck of the spinning
mechanical hard drive inside the external case...)
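
(For scale: moving something on the order of 3 TB in 20 hours averages
out to only about 40 MB/s, well under what the USB-3 link itself can
carry - nominally 5 Gbit/s, i.e. several hundred MB/s - which suggests
the mechanical drive plus per-file overhead, rather than the bus, is the
real limit.)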

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw
