
Re: problem with SATA disk, difference between standard kernel and Debian kernel



An update on this (see long quote below) --- maybe it helps someone
with a similar problem:

After a total disk failure seemed imminent --- the disconnected disk
didn't come back after turning the computer off and back on --- I got
two new disks to replace the Maxtor 7V300F0 disks. I made a software
RAID-1 from the new disks and managed to copy all my data to the new
disks.

That allowed me to play around with the Maxtor disks. I found out that
there is a firmware update available for them[1] which is supposed to
solve the problem with the disks disconnecting. I updated the firmware
today.
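If you want to check from Linux which firmware a drive reports before
and after the update, smartctl -i prints it on the "Firmware Version:"
line (see the smartctl output quoted below). This is a minimal sketch.
It assumes smartmontools is installed and takes VA111680 (the version
the updater should report afterwards) as the target. The function
names are made up for illustration:

```shell
#!/bin/sh
# Sketch: read the firmware revision a drive reports, using the
# "Firmware Version:" line printed by `smartctl -i` (smartmontools).

# Print the firmware string found in smartctl -i output read from stdin.
firmware_from_smartctl() {
    # Split on ":" and trim whitespace around the value.
    awk -F: '/^Firmware Version:/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# Succeed if device $1 reports firmware $2 (default: VA111680, the
# version the post says the updater should show after the update).
# Needs root to query the drive.
has_firmware() {
    dev=$1
    want=${2:-VA111680}
    got=$(smartctl -i "$dev" | firmware_from_smartctl)
    [ "$got" = "$want" ]
}
```

For example, running `has_firmware /dev/sdb && echo updated` as root
prints "updated" only if the drive already reports VA111680.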

The disks seem to be working so far. I'm using them for backups, and a
backup made on one of the disks before the firmware update is still
readable after the update. I didn't check the other disk, but mdadm
still recognized it as a member of a RAID-1 array and started an md
device for it. That suggests the data on that disk survived the update
as well, though strictly speaking it only shows that the RAID
superblock is readable. --- Time will tell whether the disconnecting
problem is finally solved.
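To actually check that the other disk is readable, rather than relying
on mdadm finding its superblock, one option is to read the whole
device once and see whether any read fails. This is a rough sketch. It
assumes dd is available and that you have root access for raw
devices. /dev/sdX is a placeholder:

```shell
#!/bin/sh
# Sketch: read a whole device (or file) once, discarding the data,
# and report whether dd hit a read error along the way.
read_check() {
    target=$1
    # bs given in bytes (1 MiB) so it works with GNU and BSD dd alike.
    if dd if="$target" of=/dev/null bs=1048576 2>/dev/null; then
        echo "$target: readable"
    else
        echo "$target: read errors" >&2
        return 1
    fi
}
```

For example, `read_check /dev/sdX`. A full read of a 300 GB disk takes
a while. `badblocks -sv /dev/sdX`, which is read-only by default, is a
similar alternative that also lists the failing blocks.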

For reference on the firmware, see [2].


If you need to boot some DOS version from a USB stick or a USB disk:
Install the unetbootin package. Download a FreeDOS 2.88MB floppy
image[3] (1.44MB is too small) and the firmware archive. Unzip the
firmware archive. Mount the floppy image file as a loop device
(mount -o loop imagefile mountpoint) and copy the files from the
firmware archive into the image file. Unmount the image file. Use
unetbootin to make a bootable USB stick from the floppy image (select
"Floppy Image" instead of "ISO" in unetbootin when selecting the file
to write onto the stick).

Disconnect all hard disks and DVD/CD drives except the Maxtor disk
whose firmware you're going to update. Turn off AHCI mode in the
BIOS. Boot from the USB stick, but press F8 while booting (after the
boot manager) and do NOT load highmem, emm386, and especially not
pciusb.sys (or whatever it was called). Run dload.exe, choose "no
power control", "first disk found", and an option called something
like "transfer in one part"; then select the firmware file. The update
takes a few seconds, and the update program will tell you whether it
succeeded. If it did, exit the update program and start it again to
verify the firmware version; it should now show VA111680. --- It
worked for me, but YMMV, so take all precautions, like making backups
before you start ...
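Here are the image preparation steps above, written out as a small
script. This is a sketch under assumptions. The file names
fdos288.img and firmware.zip, the scratch directory fw and the mount
point /mnt/floppy are placeholders, and mounting needs root. With
DRY_RUN=1 it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch of the floppy-image preparation (placeholder names, see above).
# Mounting needs root; set DRY_RUN=1 to print the commands instead.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_image() {
    image=$1              # FreeDOS floppy image (2.88MB; 1.44MB is too small)
    archive=$2            # firmware archive as downloaded
    workdir=${3:-fw}      # scratch directory for the unzipped files
    mnt=${4:-/mnt/floppy} # temporary mount point

    run mkdir -p "$workdir" "$mnt" || return 1
    # Unzip the firmware archive into the scratch directory.
    run unzip -o "$archive" -d "$workdir" || return 1
    # Mount the image file through a loop device and copy the files in.
    run mount -o loop "$image" "$mnt" || return 1
    run cp -r "$workdir"/. "$mnt"/ || { run umount "$mnt"; return 1; }
    # Unmount so the image is complete before unetbootin writes it.
    run umount "$mnt"
}
```

For example, `DRY_RUN=1 prepare_image fdos288.img firmware.zip` lets
you review the commands first. Then run it as root without DRY_RUN.
The unetbootin, BIOS and dload.exe steps remain manual.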

On a side note, it took me awfully long to figure out how to make a
"DOS bootable USB stick under Linux". Try googling for that; you just
don't find it ... Your BIOS must be able to boot from such devices,
but if it can, it seems you could even use a card reader (with a card
inserted, of course) instead of a stick, and it doesn't matter whether
the stick says it supports booting or not. Unetbootin is awesome
... This one might also be interesting: http://www.ultimatebootcd.com/



Using a Debian kernel (2.6.24) --- one of the things I tried --- did
not solve the problem with disconnecting.


[1]: http://www.eserviceinfo.com/downloadsm/24514/_.html
[2]: http://forums.storagereview.net/index.php?showtopic=22435&st=0
[3]: http://www.fdos.org/bootdisks/ --- I think I used another one,
     but I don't remember where I got it. If you need the image file I
     used, let me know and I can mail it to you.


On Wed, Dec 10, 2008 at 03:15:56PM -0600, lee wrote:
> Hi,
> 
> what's the difference between a standard kernel and a kernel that
> comes as a Debian package?
> 
> I'm using a standard kernel, but I'm having problems with one of my
> disks (see below). The disk "gets lost" every now and then, i.e. it
> seems to take a couple of days or weeks now (I've seen it take as long
> as about two months with the old board) before it happens. The disk
> remains unavailable until I turn the power off and back on. Once the
> disk is back, I can re-add the partitions on the failed disk to the md
> devices, and they are being rebuilt just fine, and it works for some
> time until the disk "gets lost" again.
> 
> This problem isn't new; it has been there with another board/CPU/RAM,
> cables and power supply ever since I got the two SATA disks new. It's
> been there with every standard kernel I tried over the years, with
> i386, and now it's the same with amd64. I've been thinking it was a
> problem with the board I had, but as it's there with another board etc.,
> it must be either the disk itself or the SATA driver.
> 
> Googling revealed that this isn't a rare problem. There are people
> reporting it with all kinds of different disks and boards and
> different distributions. Some suggest that it's a problem with the PSU
> or the SATA cables, but imho that's unlikely. Interestingly, it seems
> to be more common for this problem to show up in RAID setups.
> 
> Also interestingly, mdadm did *not* detect the disk failure for
> /dev/md2 which is mounted read only.
> 
> And even more interestingly, the problem is and has always been with
> /dev/sdb, never with /dev/sda. I can't tell whether the disks were
> swapped when I connected them to the new board, though. But I'd rule
> out a problem with the firmware of the disk as well since both disks
> use the same firmware version.
> 
> So is there a difference between Debian and standard kernels so that I
> might not have this problem if I used a Debian kernel? Has this
> problem been solved in some way yet?
> 
> I might get another two disks, but I'm afraid that the same problem
> would come up with other disks as well ...
> 
> 
> Info:
> 
> cat:/home/lee# uname -a
> Linux cat 2.6.27.7-cat-smp #4 SMP Thu Dec 4 16:03:29 CST 2008 x86_64 GNU/Linux
> cat:/home/lee# smartctl -i /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Maxtor MaXLine III family (SATA/300)
> Device Model:     Maxtor 7V300F0
> Serial Number:    V604E3FG
> Firmware Version: VA111630
> User Capacity:    300,090,728,448 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
> Local Time is:    Wed Dec 10 15:00:04 2008 CST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> cat:/home/lee# smartctl -i /dev/sdb
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Maxtor MaXLine III family (SATA/300)
> Device Model:     Maxtor 7V300F0
> Serial Number:    V601T7VG
> Firmware Version: VA111630
> User Capacity:    300,090,728,448 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
> Local Time is:    Wed Dec 10 15:00:42 2008 CST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> cat:/home/lee# lspci
> [...]
> 00:1f.2 SATA controller: Intel Corporation 82801IB (ICH9) 4 port SATA AHCI Controller (rev 02)
> 
> 
> syslog:
> 
> 
> Dec 10 00:09:10 cat kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> Dec 10 00:09:10 cat kernel: ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> Dec 10 00:09:10 cat kernel:          res 40/00:00:00:4f:c2/00:00:00:c2:00/00 Emask 0x4 (timeout)
> Dec 10 00:09:10 cat kernel: ata5.00: status: { DRDY }
> Dec 10 00:09:10 cat kernel: ata5: hard resetting link
> Dec 10 00:09:10 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
> Dec 10 00:09:15 cat kernel: ata5: hard resetting link
> Dec 10 00:09:16 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
> Dec 10 00:09:21 cat kernel: ata5: hard resetting link
> Dec 10 00:09:21 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
> Dec 10 00:09:21 cat kernel: ata5.00: disabled
> Dec 10 00:09:21 cat kernel: end_request: I/O error, dev sdb, sector 478543967
> Dec 10 00:09:21 cat kernel: md: super_written gets error=-5, uptodate=0
> Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb2, disabling device.
> Dec 10 00:09:21 cat kernel: raid1: Operation continuing on 1 devices.
> Dec 10 00:09:21 cat kernel: ata5: EH complete
> Dec 10 00:09:21 cat kernel: ata5.00: detaching (SCSI 4:0:0:0)
> Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
> Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Stopping disk
> Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] START_STOP FAILED
> Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
> Dec 10 00:09:21 cat kernel: RAID1 conf printout:
> Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
> Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda2
> Dec 10 00:09:21 cat kernel:  disk 1, wo:1, o:0, dev:sdb2
> Dec 10 00:09:21 cat kernel: RAID1 conf printout:
> Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
> Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda2
> Dec 10 00:09:21 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
> Dec 10 00:09:21 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
> Dec 10 00:09:21 cat kernel: end_request: I/O error, dev sdb, sector 146496512
> Dec 10 00:09:21 cat kernel: md: super_written gets error=-5, uptodate=0
> Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb1, disabling device.
> Dec 10 00:09:21 cat kernel: raid1: Operation continuing on 1 devices.
> Dec 10 00:09:21 cat mdadm[1995]: Fail event detected on md device /dev/md1, component device /dev/sdb2
> Dec 10 00:09:21 cat kernel: RAID1 conf printout:
> Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
> Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda1
> Dec 10 00:09:21 cat kernel:  disk 1, wo:1, o:0, dev:sdb1
> Dec 10 00:09:21 cat kernel: RAID1 conf printout:
> Dec 10 00:09:21 cat kernel:  --- wd:1 rd:2
> Dec 10 00:09:21 cat kernel:  disk 0, wo:0, o:1, dev:sda1
> Dec 10 00:10:21 cat mdadm[1995]: Fail event detected on md device /dev/md0
> 
> 
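For anyone seeing the same symptoms: the syslog excerpt quoted above
pairs repeated "SATA link down" resets on one port with raid1 member
failures. Below is a small sketch that summarizes both from a log
file. It assumes the kernel message wording shown above, which may
differ in other kernel versions. The function name is hypothetical:

```shell
#!/bin/sh
# Sketch: summarize SATA link-down resets and raid1 member failures
# from kernel log lines on stdin (wording as in the 2.6.27 log above).
summarize_sata_failures() {
    awk '
        # e.g. "... kernel: ata5: SATA link down (SStatus 0 SControl 300)"
        / SATA link down / {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ata[0-9]+:$/) {
                    port = $i; sub(/:$/, "", port); down[port]++; break
                }
        }
        # e.g. "... kernel: raid1: Disk failure on sdb2, disabling device."
        /raid1: Disk failure on / {
            for (i = 1; i < NF; i++)
                if ($i == "on") {
                    dev = $(i + 1); sub(/,$/, "", dev)
                    print "failed md member: " dev
                    break
                }
        }
        END { for (p in down) printf "%s: SATA link down %d time(s)\n", p, down[p] }
    '
}
```

For example, run `summarize_sata_failures < /var/log/kern.log`. Once
the disk is back after a power cycle, each listed member can be
re-added as described in the quoted message. A typical command is
`mdadm /dev/md1 --re-add /dev/sdb2`, using the md device that member
belonged to.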

