Fwd: raid1 issue, somewhat related to recent "debian on big machines"

To: amd64 Debian <debian-amd64@lists.debian.org>
Subject: Fwd: raid1 issue, somewhat related to recent "debian on big machines"
From: Francesco Pietra <chiendarret@gmail.com>
Date: Tue, 3 Mar 2009 17:02:21 +0100
Message-id: <b87c422a0903030802u6c63c0eev2ae25ee8c27208a9@mail.gmail.com>
In-reply-to: <b87c422a0903030221s7fbd3fcdq51609c77ec244e85@mail.gmail.com>
References: <b87c422a0903030053j5f468626ge3ee70818d277c77@mail.gmail.com> <49ACF375.6050808@cox.net> <b87c422a0903030221s7fbd3fcdq51609c77ec244e85@mail.gmail.com>

How do I run a SMART self-test on a disk that is part of a raid1 array?

smartctl -t /dev/sda (or sda1, sda2, ..)

is incorrect. I don't remember what has to be done with mdadm first.
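
For reference, a minimal sketch of how a self-test is usually started.
It assumes that -t needs a test type (short, long, ...), that the test
is addressed to the whole disk rather than to a partition or to the md
device, and that, as far as I know, nothing has to be done with mdadm
beforehand because the test runs in the background on the drive. The
DRY_RUN switch and the helper names run and selftest_steps are made up
for this example; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set
# (root is needed to really run them).
DRY_RUN=${DRY_RUN:-1}
DISK=${DISK:-/dev/sda}   # whole disk, not a partition such as /dev/sda1

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

selftest_steps() {
    run cat /proc/mdstat            # current state of the md arrays
    run smartctl -t short "$1"      # start a short self-test on the drive
    run smartctl -l selftest "$1"   # read the self-test log after it finishes
}

selftest_steps "$DISK"
```

With DRY_RUN=0 the same three commands are executed for real; the last
one is only meaningful once the drive reports the test as completed.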

I would like to run the test because

smartctl --all /dev/sd#

reports errors only for sda, as follows:

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4a 9d 9b ec  Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 9d 9b 4c 00      00:52:49.600  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:49.600  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:49.600  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:49.600  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:49.600  READ DMA

Error 11 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4a 9d 9b ec  Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 9d 9b 4c 00      00:52:46.400  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:46.400  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:46.400  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:46.400  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:46.400  READ DMA

Error 10 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4a 9d 9b ec  Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 9d 9b 4c 00      00:52:42.950  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:42.950  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:42.950  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:42.950  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:42.950  READ DMA

Error 9 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4a 9d 9b ec  Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 9d 9b 4c 00      00:52:39.800  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:39.800  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:39.800  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:39.800  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:39.800  READ DMA

Error 8 occurred at disk power-on lifetime: 1940 hours (80 days + 20 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4a 9d 9b ec  Error: UNC 8 sectors at LBA = 0x0c9b9d4a = 211524938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 9d 9b 4c 00      00:52:36.600  READ DMA
  ec 00 08 4a 9d 9b 00 00      00:52:36.600  IDENTIFY DEVICE
  c8 00 08 47 9d 9b 4c 00      00:52:36.600  READ DMA
  c8 00 08 3f 9d 9b 4c 00      00:52:36.600  READ DMA
  c8 00 08 37 9d 9b 4c 00      00:52:36.600  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
=======

I.e., I am still uncertain whether sda has to be replaced with a new
disk, or whether the reported errors were temporary and have since been
corrected by raid1.
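
As a cross-check of the log above, the LBA printed in each error entry
can be recomputed from the register bytes. This is a small sketch that
assumes 28-bit LBA addressing, where the low nibble of DH holds the top
four bits, followed by CH, CL and SN. The values are copied from the
"Error 12" entry.

```shell
#!/bin/sh
# Register bytes from the "Error 12" entry (hex, without the 0x prefix).
SN=4a; CL=9d; CH=9b; DH=ec

# 28-bit LBA: low nibble of DH is bits 27..24, then CH, CL, SN.
lba=$(( ((0x$DH & 0x0f) << 24) | (0x$CH << 16) | (0x$CL << 8) | 0x$SN ))

printf 'LBA = 0x%08x = %d\n' "$lba" "$lba"
```

All five logged errors carry the same register values, so they all
refer to the same 8 sectors starting at this LBA.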

thanks

francesco


---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Tue, Mar 3, 2009 at 11:21 AM
Subject: Re: raid1 issue, somewhat related to recent "debian on big machines"
To: Ron Johnson <ron.l.johnson@cox.net>


On Tue, Mar 3, 2009 at 10:08 AM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 03/03/2009 02:53 AM, Francesco Pietra wrote:
>>
>> Lupus in fabula, as a follow-up to my short contribution about raid1
>> on my machine in the thread "Debian on big systems".
>>
>> System: supermicro H8QC8 m.board, two WD Raptor SATA 150GB, Debian
>> amd64 lenny, raid1
>>
>> While running an electronic molecular calculation - estimated to take
>> four days - I noticed by chance on the screen (it is not in the output
>> file of the calculation) that there was a disk problem. I took some
>> scattered notes from the screen:
>>
>> RAID1 conf printout
>>
>> wd: 1 rd:2
>
> [snip]
>
> What you are looking for should be in syslog, not your application's log.


OK, but /var/log/syslog

tells me nothing more than what I noted from the screen: sda sector
0 problematic, disk failure, continuing on one disk. My question is
what the lshw -disk output means (SCSI vs SATA, as I have shown),
and whether one disk has to be replaced with a new one. If so, to
identify which is which, can I detach the SATA cable from each disk in
turn and see which one works?
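
One way to tell the disks apart without unplugging anything might be to
compare serial numbers with the labels on the drives. The sketch below
only prints the commands by default. /dev/md0 and /dev/sdb are
assumptions (the real names are in /proc/mdstat), and the DRY_RUN
switch and the helpers run and identify_steps are made up for this
example.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
MD=${MD:-/dev/md0}   # assumed array name; check /proc/mdstat for the real one

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

identify_steps() {
    run mdadm --detail "$MD"     # lists member devices and their state
    run smartctl -i /dev/sda     # model and serial number of sda
    run smartctl -i /dev/sdb     # same for the (assumed) second disk
    run ls -l /dev/disk/by-id/   # udev names that embed model and serial
}

identify_steps
```

The serial number reported for sda can then be matched against the
sticker on the physical drive.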

thanks
francesco


>
> --
> Ron Johnson, Jr.
> Jefferson LA  USA
>
> The feeling of disgust at seeing a human female in a Relationship
> with a chimp male is Homininphobia, and you should be ashamed of
> yourself.
