
Re: Bug with soft raid?



Hi Steve,

On Fri, Feb 15, 2019 at 09:35:27AM +0100, steve wrote:
> >for i in /dev/sd{b..f}; do echo "DISK: ${i}"; smartctl -l scterc "${i}"; sleep 3; done
> 
> I get this for sdb and sdc
> 
> SCT Error Recovery Control:
>           Read: Disabled
>          Write: Disabled
> 
> and this for sdf
> 
> SCT Error Recovery Control:
>           Read:     70 (7.0 seconds)
>          Write:     70 (7.0 seconds)
> 
> What does it tell me ?

It means that sd[bc] may support SCTERC but it's disabled (promising),
and sdf does support it and it's set to 7 seconds (good).
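
As a rough illustration of reading that output from a script, here is
a minimal sketch. The function name scterc_state is made up for this
example, and it only recognises the two output shapes quoted above;
anything else is reported as "unknown" rather than guessed at.

```shell
#!/bin/sh
# scterc_state: read `smartctl -l scterc DEVICE` output on stdin and print
# one of: enabled, disabled, unknown.
# (Hypothetical helper for illustration, not part of smartmontools.)
scterc_state() {
    awk '
        # Only the "Read:" and "Write:" lines of the report matter here.
        $1 == "Read:" || $1 == "Write:" {
            seen++
            if ($2 == "Disabled") disabled++
            else if ($2 ~ /^[0-9]+$/) enabled++
        }
        END {
            if (seen == 2 && enabled == 2) print "enabled"
            else if (seen == 2 && disabled > 0) print "disabled"
            else print "unknown"
        }
    '
}
```

It would be used along the lines of
`smartctl -l scterc /dev/sdb | scterc_state`.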

For disks in Linux software RAID, SCTERC with a low timeout is
essential. If it's not possible then the block layer timeout for the
device should be increased.

You should try to set SCTERC for sd[bc] like so:

# for dev in /dev/sd[bc]; do smartctl -l scterc,70,70 "$dev"; done

If that works then great - all your drives support SCTERC and have low
timeouts.

If setting it to 70 (tenths of a second, so 7 seconds) doesn't work
then you will need to increase the block layer timeout like this:

# for dev in sdb sdc; do echo 180 > /sys/block/$dev/device/timeout; done

The reason to do this is that should any of your drives encounter a
problem reading or writing, without SCTERC set the drive will try very
hard to do whatever it was meant to be doing for a very long period of
time, and while it's doing that it will be unresponsive to anything
else.

The default block layer timeout is 30 seconds and a drive having
problems reading or writing just 1 sector can easily spend longer than
this trying to do so. Linux then drops the entire device from the array.
If you're lucky this only happens on the one device and you're able to
add it back in again, but a very common cause of arrays not assembling
or always having a device kicked out is these sorts of timeouts.

When using RAID it is much better to have the drive give up sooner, as
the RAID layer can then deal with whatever could not be read or
written, for example by reconstructing the data from the other devices.
So, being able to set SCTERC is best, but failing that you really must
set the block layer timeout high enough.
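
To make the timing relationship concrete: the drive has to give up
before the kernel does. A small sketch of that check follows. The name
erc_fits is invented here; it takes the SCTERC value as smartctl shows
it (70 means 7.0 seconds) and the block layer timeout in seconds, and
it assumes a value of 0 means SCTERC is disabled.

```shell
#!/bin/sh
# erc_fits ERC_VALUE BLOCK_TIMEOUT_SECONDS
# Succeeds when the drive's error recovery limit is enabled and shorter
# than the block layer timeout. (Hypothetical helper for illustration.)
erc_fits() {
    erc_value=$1        # tenths of a second, e.g. 70 = 7.0 seconds
    block_seconds=$2    # e.g. 30, the default mentioned above
    # Assumption: 0 means SCTERC is disabled, so it can never "fit".
    [ "$erc_value" -gt 0 ] && [ "$erc_value" -lt $(( block_seconds * 10 )) ]
}
```

With the numbers from this thread, `erc_fits 70 30` succeeds, since 7
seconds is comfortably inside the 30 second default.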

The smartctl and /sys/block settings above don't survive a power cycle,
so they would need to be set at every boot.
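
One way to do that is a small script run from whatever boot hook your
system provides. The following is only a sketch: the SMARTCTL and
SYSFS_ROOT variables and the --apply switch are inventions of this
example, and it assumes smartctl exits non-zero when it cannot set
SCTERC. Reading the value back afterwards would be a more careful
check.

```shell
#!/bin/sh
# Sketch of a boot-time helper. SMARTCTL and SYSFS_ROOT can be overridden
# so the logic can be tried without touching real disks.
SMARTCTL=${SMARTCTL:-smartctl}
SYSFS_ROOT=${SYSFS_ROOT:-/sys/block}

# set_timeouts NAME...
# For each disk name (e.g. sdb), try to set SCTERC to 7 seconds. If that
# fails, raise the block layer timeout to 180 seconds instead.
set_timeouts() {
    for name in "$@"; do
        # Assumption: a non-zero exit status means SCTERC was not set.
        if "$SMARTCTL" -l scterc,70,70 "/dev/$name" >/dev/null; then
            echo "$name: SCTERC set to 7.0 seconds"
        else
            echo 180 > "$SYSFS_ROOT/$name/device/timeout"
            echo "$name: SCTERC not set, block layer timeout raised to 180"
        fi
    done
}

# Only act when explicitly asked to, e.g.: sh set-timeouts.sh --apply sdb sdc
if [ "${1:-}" = "--apply" ]; then
    shift
    set_timeouts "$@"
fi
```

It needs to run as root, and the list of disk names has to be adjusted
to match the members of your array.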

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

