
Re: Bug with soft raid?



On 2/12/19 11:37 AM, Tom Bachreier wrote:


Feb 12, 2019, 12:08 PM by dlist@bluewin.ch:
The system blocks for about 3 minutes, and then I regain control of it.

I have a similar - maybe the same - problem in buster - see the thread
"Software RAID blocks" on this list about a month ago. Unfortunately
still no solution. :-(

I have the advantage that my system harddisk is outside the RAID on a
separate disk. Therefore I'm still able to send "low level" commands
like smartctl or fdisk to the disks in the array during the block. If
I trigger the right disk the block aborts immediately.

In each of my machines, I use a single 16 GB USB 3.0 flash drive, or a small SSD, for the system drive. I then use btrfs for all file systems. My expectation is that if a disk goes bad, the machine will log an error and/or halt.


Maybe this works for you, too?
You can try:

for i in /dev/sd{b..f}; do echo "DISK: ${i}"; smartctl -l scterc "${i}"; sleep 3; done

Some drives allow you to adjust the Error Recovery Control timeout in their firmware. You can use this to force the drive to return an error promptly, rather than spending minutes trying to recover (e.g. blocking for 3 minutes):

https://en.wikipedia.org/wiki/Error_recovery_control


I had a Linux md RAID1 (mirror) built from two older desktop/SOHO server drives that supported scterc. So, I put commands like the following, one per drive, into a script that was run at system startup:

    # /usr/sbin/smartctl -l scterc,70,70 /dev/disk/by-id/ata-XXX_YYY
    smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

    SCT Error Recovery Control set to:
               Read:     70 (7.0 seconds)
              Write:     70 (7.0 seconds)
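
For several drives, the per-drive commands could be wrapped in a small loop. The following is only a rough sketch, not the script I actually ran: the function names, the SMARTCTL and ERC_DECISECONDS variables, and the second device path in the usage comment are placeholders made up for illustration. It also assumes that a non-zero exit status from smartctl means the setting was not applied, which may not catch every drive that lacks scterc support.

```shell
#!/bin/sh
# Sketch: apply the same SCT ERC read/write timeout to each listed drive.
# SMARTCTL and ERC_DECISECONDS are illustrative names, not smartctl features.
SMARTCTL="${SMARTCTL:-/usr/sbin/smartctl}"
# scterc values are in units of 100 ms, so 70 means 7.0 seconds.
ERC_DECISECONDS="${ERC_DECISECONDS:-70}"

set_erc() {
    # $1 = device path, e.g. a /dev/disk/by-id/... name
    "$SMARTCTL" -l "scterc,${ERC_DECISECONDS},${ERC_DECISECONDS}" "$1"
}

set_erc_all() {
    rc=0
    for dev in "$@"; do
        if ! set_erc "$dev"; then
            # Assumption: non-zero exit means the timeout was not set.
            echo "warning: could not set ERC on ${dev}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call with placeholder device names (substitute your own):
# set_erc_all /dev/disk/by-id/ata-XXX_YYY /dev/disk/by-id/ata-XXX_ZZZ
```

Since the setting was applied from a startup script, something like this would have to be re-run on every boot. Checking the result afterwards with "smartctl -l scterc" on each drive, as in the loop quoted above, is probably wise.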


David

