Fwd: raid1 issue, somewhat related to recent "debian on big machines"
How do I run a SMART self-test on a disk that is part of a raid1 array?
smartctl -t /dev/sda (or sda1, sda2, ..)
is incorrect. I don't remember what has to be done first with mdadm.
I would like to run the test because
smartctl --all /dev/sd#
reports errors for sda only, as follows:
SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 12 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 4a 9d 9b ec Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 47 9d 9b 4c 00 00:52:49.600 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:49.600 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:49.600 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:49.600 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:49.600 READ DMA
Error 11 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 4a 9d 9b ec Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 47 9d 9b 4c 00 00:52:46.400 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:46.400 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:46.400 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:46.400 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:46.400 READ DMA
Error 10 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 4a 9d 9b ec Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 47 9d 9b 4c 00 00:52:42.950 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:42.950 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:42.950 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:42.950 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:42.950 READ DMA
Error 9 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 4a 9d 9b ec Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 47 9d 9b 4c 00 00:52:39.800 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:39.800 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:39.800 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:39.800 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:39.800 READ DMA
Error 8 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 4a 9d 9b ec Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 47 9d 9b 4c 00 00:52:36.600 READ DMA
ec 00 08 4a 9d 9b 00 00 00:52:36.600 IDENTIFY DEVICE
c8 00 08 47 9d 9b 4c 00 00:52:36.600 READ DMA
c8 00 08 3f 9d 9b 4c 00 00:52:36.600 READ DMA
c8 00 08 37 9d 9b 4c 00 00:52:36.600 READ DMA
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
=======
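For reference, the LBA printed on the "Error: UNC" lines can be recomputed from the register dump. The sketch below is only an illustration, assuming the standard 28-bit ATA layout (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27); the function name decode_lba28 is made up here and is not part of smartctl. It also checks the "wraps after 49.710 days" figure, under the assumption that Powered_Up_Time is a 32-bit millisecond counter.

```python
def decode_lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumed layout: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
    low nibble of DH = bits 24-27 (the high nibble of DH holds flags).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the "After command completion" row:
# ER ST SC SN CL CH DH = 40 51 08 4a 9d 9b ec
lba = decode_lba28(sn=0x4A, cl=0x9D, ch=0x9B, dh=0xEC)
print(hex(lba), lba)  # 0xc9b9d4a 211524938

# Assumption: Powered_Up_Time is a 32-bit millisecond counter,
# so it wraps after 2**32 ms expressed in days.
wrap_days = 2**32 / 1000 / 86400
print(f"{wrap_days:.3f}")  # 49.710
```

All five logged errors (8 to 12) decode to the same LBA, 211524938, so the log shows one sector range failing repeatedly within a few seconds rather than errors scattered over the disk.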
I.e., I am still uncertain whether sda has to be replaced with a new disk,
or whether the reported errors were temporary and have been corrected by raid1.
thanks
francesco
---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Tue, Mar 3, 2009 at 11:21 AM
Subject: Re: raid1 issue, somewhat related to recent "debian on big machines"
To: Ron Johnson <ron.l.johnson@cox.net>
On Tue, Mar 3, 2009 at 10:08 AM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 03/03/2009 02:53 AM, Francesco Pietra wrote:
>>
>> Lupus in fabula: a follow-up to my short comment about raid1 on
>> my machine in the thread "Debian on big systems".
>>
>> System: supermicro H8QC8 m.board, two WD Raptor SATA 150GB, Debian
>> amd64 lenny, raid1
>>
>> While running an electronic molecular calculation (estimated to take
>> four days), I noticed by chance on the screen (it is not in the output
>> file of the calculation) that there was a disk problem. I took some
>> scattered notes from the screen:
>>
>> RAID1 conf printout
>>
>> wd: 1 rd:2
>
> [snip]
>
> What you are looking for should be in syslog, not your application's log.
OK, but /var/log/syslog
tells nothing more than what I noted from the screen: sda sector
0 problematic, disk failure, continuing on one disk. My question is:
what does the lshw -disk output mean (SCSI vs SATA, as I have shown),
and does one disk have to be replaced with a new one? If so, to identify
which is which, can I detach the SATA connection to the disks and see
which one works?
thanks
francesco
>
> --
> Ron Johnson, Jr.
> Jefferson LA USA
>
> The feeling of disgust at seeing a human female in a Relationship
> with a chimp male is Homininphobia, and you should be ashamed of
> yourself.