
problem with SATA disk, difference between standard kernel and Debian kernel



Hi,

what's the difference between a standard kernel and a kernel that
comes as a Debian package?

I'm using a standard kernel, but I'm having problems with one of my
disks (see below). The disk "gets lost" every now and then, i.e. it
currently takes a couple of days or weeks before it happens (with the
old board I've seen it take as long as about two months). The disk
remains unavailable until I turn the power off and back on. Once the
disk is back, I can re-add its partitions to the md devices; they
rebuild just fine, and everything works for some time until the disk
"gets lost" again.
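
Between such events, a dropped member shows up in /proc/mdstat as an
underscore in the array's member map. What follows is only a minimal
sketch in Python of how that could be checked, assuming the usual mdstat
layout with a "[n/m] [U_]" status line per array. The sample text and
the function name degraded_arrays are made up for illustration and were
not captured from this machine.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not real output
# from the machine described in this post).
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1](F)
      146480128 blocks [2/1] [U_]

md0 : active raid1 sda1[0] sdb1[1]
      73248256 blocks [2/2] [UU]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member map contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line like "md1 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[total/active] [UU]"; '_' marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system the text would come from reading /proc/mdstat instead
of SAMPLE; the re-add itself is still done by hand with mdadm.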

This problem isn't new; it has been there with another board/CPU/RAM,
other cables and another power supply, ever since I bought the two SATA
disks new. It's been there with every standard kernel I tried over the
years on i386, and now it's the same on amd64. I used to think it was
a problem with the board I had, but since it also occurs with another
board etc., it must be either the disk itself or the SATA driver.

Googling revealed that this isn't a rare problem. There are people
reporting it with all kinds of different disks, boards and
distributions. Some suggest that it's a problem with the PSU or the
SATA cables, but imho that's unlikely, since I have already replaced
both. Interestingly, the problem seems to show up more often in RAID
setups.

Also interestingly, mdadm did *not* detect the disk failure for
/dev/md2, which is mounted read-only.

And even more interestingly, the problem is and has always been with
/dev/sdb, never with /dev/sda. I can't tell whether the disks were
swapped when I connected them to the new board, though. But I'd rule
out a problem with the firmware of the disk as well, since both disks
use the same firmware version.

So is there a difference between Debian and standard kernels such that
I might not have this problem if I used a Debian kernel? Has this
problem been solved in some way yet?

I might get another two disks, but I'm afraid that the same problem
would come up with other disks as well ...


Info:

cat:/home/lee# uname -a
Linux cat 2.6.27.7-cat-smp #4 SMP Thu Dec 4 16:03:29 CST 2008 x86_64 GNU/Linux
cat:/home/lee# smartctl -i /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor MaXLine III family (SATA/300)
Device Model:     Maxtor 7V300F0
Serial Number:    V604E3FG
Firmware Version: VA111630
User Capacity:    300,090,728,448 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Dec 10 15:00:04 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

cat:/home/lee# smartctl -i /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor MaXLine III family (SATA/300)
Device Model:     Maxtor 7V300F0
Serial Number:    V601T7VG
Firmware Version: VA111630
User Capacity:    300,090,728,448 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Dec 10 15:00:42 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

cat:/home/lee# lspci
[...]
00:1f.2 SATA controller: Intel Corporation 82801IB (ICH9) 4 port SATA AHCI Controller (rev 02)


syslog:


Dec 10 00:09:10 cat kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 10 00:09:10 cat kernel: ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 10 00:09:10 cat kernel:          res 40/00:00:00:4f:c2/00:00:00:c2:00/00 Emask 0x4 (timeout)
Dec 10 00:09:10 cat kernel: ata5.00: status: { DRDY }
Dec 10 00:09:10 cat kernel: ata5: hard resetting link
Dec 10 00:09:10 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 10 00:09:15 cat kernel: ata5: hard resetting link
Dec 10 00:09:16 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 10 00:09:21 cat kernel: ata5: hard resetting link
Dec 10 00:09:21 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 10 00:09:21 cat kernel: ata5.00: disabled
Dec 10 00:09:21 cat kernel: end_request: I/O error, dev sdb, sector 478543967
Dec 10 00:09:21 cat kernel: md: super_written gets error=-5, uptodate=0
Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb2, disabling device.
Dec 10 00:09:21 cat kernel: raid1: Operation continuing on 1 devices.
Dec 10 00:09:21 cat kernel: ata5: EH complete
Dec 10 00:09:21 cat kernel: ata5.00: detaching (SCSI 4:0:0:0)
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Stopping disk
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] START_STOP FAILED
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda2
Dec 10 00:09:21 cat kernel:  disk 1, wo:1, o:0, dev:sdb2
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda2
Dec 10 00:09:21 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
Dec 10 00:09:21 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
Dec 10 00:09:21 cat kernel: end_request: I/O error, dev sdb, sector 146496512
Dec 10 00:09:21 cat kernel: md: super_written gets error=-5, uptodate=0
Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb1, disabling device.
Dec 10 00:09:21 cat kernel: raid1: Operation continuing on 1 devices.
Dec 10 00:09:21 cat mdadm[1995]: Fail event detected on md device /dev/md1, component device /dev/sdb2
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda1
Dec 10 00:09:21 cat kernel:  disk 1, wo:1, o:0, dev:sdb1
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda1
Dec 10 00:10:21 cat mdadm[1995]: Fail event detected on md device /dev/md0
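
To pull the relevant events out of a longer syslog, something like the
following could be used. It is only a sketch in Python: the two patterns
are copied from the kernel messages quoted above, while the function
name summarize and the idea of collecting ports and members into lists
are my own assumptions, not an existing tool.

```python
import re

# Patterns taken from the kernel messages in the syslog excerpt above.
LINK_DOWN = re.compile(r"kernel: (ata\d+): SATA link down")
MD_FAIL = re.compile(r"kernel: raid1: Disk failure on (\w+), disabling device")

def summarize(lines):
    """Collect ATA ports that reported 'link down' and md members marked failed."""
    ports, members = [], []
    for line in lines:
        m = LINK_DOWN.search(line)
        if m and m.group(1) not in ports:
            ports.append(m.group(1))
        m = MD_FAIL.search(line)
        if m and m.group(1) not in members:
            members.append(m.group(1))
    return ports, members

if __name__ == "__main__":
    sample = [
        "Dec 10 00:09:10 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)",
        "Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb2, disabling device.",
        "Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb1, disabling device.",
    ]
    print(summarize(sample))
```

Run against the excerpt above, this would report the port ata5 and the
members sdb2 and sdb1; it says nothing about md2, which matches the
missing failure event for that device.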


-- 
"Don't let them, daddy. Don't let the stars run down."
http://adin.dyndns.org/adin/TheLastQ.htm

