Here, the noticeable lines are IMHO:

  Raw_Read_Error_Rate  (208245592 vs. 117642848)
  Command_Timeout      (8 14 17 vs. 0 0 0)
  UDMA_CRC_Error_Count (11058 vs. 29)

Do these numbers indicate a serious problem with my /dev/sda drive?
And is it a disk problem or a transmission problem?
UDMA_CRC_Error_Count sounds like a cable problem to me, right?

BTW, for a year or so I had problems with /dev/sda every couple of
months, where the kernel set the drive's status in the RAID array
to failed. I could always fix the problem by hot-plugging out the
drive, wiggling the SATA cable, re-inserting and re-adding the
drive (without any impact on the running server). Now I haven't
seen the problem for quite a while. My suspicion is that the cable
is still not working very well, but failures are not frequent
enough to set the drive to "failed" status.

urs
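One note on the Raw_Read_Error_Rate values before anything else: on
Seagate drives the raw number is widely reported (though not
vendor-documented, so treat the encoding as an assumption) to be a
packed value, with the error count in the upper bits and the count
of read operations in the lower 32 bits. Decoded that way, your raw
value contains zero actual errors:

```shell
# Decode a Seagate-style packed Raw_Read_Error_Rate raw value
# (assumed encoding: upper bits = errors, lower 32 bits = operations).
raw=208245592
errors=$((raw >> 32))          # 208245592 < 2^32, so this is 0
reads=$((raw & 0xFFFFFFFF))    # count of read operations
echo "errors=$errors reads=$reads"
```

By that reading, the scary-looking Raw_Read_Error_Rate is normal
for a Seagate; the Command_Timeout and UDMA_CRC_Error_Count values
are the ones more consistent with a link/cable problem.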
I switched from Seagate to WD Red years ago since I couldn't get
the Seagates to last more than a year or so. I have one WD that is
6.87 years old with no errors, well past the 5-year life
expectancy. In recent years WD has stirred up a marketing
controversy over their Red drives. See:
So be careful to get the Pro version if you decide to try WD. I
use the WD4003FFBX (4 TB) drives (RAID 1) and have them at 2.8
years running 24/7 with no problems.
If you value your data, get another drive NOW ... they are already
5 and 5.8 years old! Add it to the array, let it settle in (sync),
and see what happens. I hope your existing array can hold together
long enough to add a third drive. I would have replaced those
drives long ago given all the errors reported. You might also want
to get new cables, since you have had problems in the past.
I also run self-tests weekly to make sure the drives are OK, and
run smartctl -a daily. In addition, I run BackupPC on a separate
server to keep backups of important data.
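For what it's worth, the weekly self-test and daily report can be
driven from cron; a minimal sketch, assuming the device is /dev/sda
(the file name and times are hypothetical -- smartd from
smartmontools can also do this via its -s schedule directive):

```shell
# /etc/cron.d/smart-checks  (hypothetical file name; adjust to taste)
# Daily at 06:00: full SMART report (cron mails the output to root).
0 6 * * *  root  /usr/sbin/smartctl -a /dev/sda
# Sunday at 03:00: start an extended (long) offline self-test.
0 3 * * 0  root  /usr/sbin/smartctl -t long /dev/sda
```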
There are some programs in /usr/share/mdadm that can check an
array, but I would wait until you have added a new drive to the
array before testing it. Here is the warning that comes with
another script I found:
----------------------------------------
DATA LOSS MAY HAVE OCCURRED.
This condition may have been caused by one or more of the
following events:
 . A LEGITIMATE write to a memory-mapped file or swap partition
   backed by a RAID1 (and only a RAID1) device - see the md(4)
   man page for details.
 . A power failure while the array was being written to.
 . Data corruption by a hard disk drive, drive controller, cable,
   etc.
 . A kernel bug in the md or storage subsystems, etc.
 . An array being forcibly created in an inconsistent state using
   --assume-clean.
This count is updated when the md subsystem carries out a 'check'
or 'repair' action. In the case of 'repair' it reflects the number
of mismatched blocks prior to carrying out the repair.
Once you have fixed the error, carry out a 'check' action to reset
the count to zero.
See the md (section 4) manual page, and the following URL for
details:
https://raid.wiki.kernel.org/index.php/Linux_Raid#Frequently_Asked_Questions_-_FAQ
----------------------------------------
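As for triggering that 'check' action: on Debian the script in
/usr/share/mdadm is checkarray, but it boils down to the md sysfs
interface. A sketch (run as root; md0 is an assumed array name --
check /proc/mdstat for yours):

```shell
# Start a consistency check on the array (what checkarray does).
echo check > /sys/block/md0/md/sync_action
# Watch progress in /proc/mdstat; once the array is idle again,
# read the mismatch count the check found:
cat /sys/block/md0/md/mismatch_cnt
```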
The problem is that if a mismatch count does occur, there is no
way to tell which drive (RAID 1) holds the correct data! I also
run programs like debsums after an update, so I know there is no
bit rot in important programs.
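debsums itself just compares installed files against the MD5 sums
shipped in /var/lib/dpkg/info/*.md5sums; the idea in miniature (a
toy illustration with a temp file, not debsums' actual code):

```shell
# Record a checksum for a file, then later verify it is unchanged.
echo "important data" > /tmp/bitrot-demo.txt
md5sum /tmp/bitrot-demo.txt > /tmp/bitrot-demo.md5
# After an update (or any time), re-check; -c verifies, --quiet
# prints only failures, and the exit status tells us the result.
result=$(md5sum -c --quiet /tmp/bitrot-demo.md5 >/dev/null 2>&1 && echo OK || echo CHANGED)
echo "$result"
```

In practice "debsums -s" is enough -- it stays silent and reports
only files whose checksums differ.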
Hope this helps.
--