Bug#625922: SATA devices get reset without real hardware failure
This bug (in general, not just the report on this page) has been present in
GNU/Linux for a long time, across various disks, mainboards, SATA controllers,
distros and kernels (maybe since changes after 2.6.24).
In https://bugzilla.redhat.com/show_bug.cgi?id=684599 David Zeuthen says
"it's most probably caused by this commit":
http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e
Possible workarounds I have read about for this bug:
-1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some
devices may not support 16-byte ATA commands) (
https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
-2: (Same as 1) Add "options libata atapi_passthru16=0" to
/etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" to
/etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )
-3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the
kernel in /etc.grub.conf seems to have solved the problem for me" (
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ). Raman
Gupta and Andreas M. Kirchwitz say in other forums that adding 'acpi=off'
doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-4: (Similar to 3) Completely disable ACPI in the mainboard BIOS. (
http://lists.debian.org/debian-user/2010/01/msg00023.html )
-5: Gaetan Cambier says "add the option line to grub to disable ncq :
'libata.force=noncq' for me, with this, i have no froze". (
https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it
doesn't work for them. PsYcHoK9 says it works for him, but John Doe replies
that it does not work for him (
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).
-6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off ?
For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did
not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-7: A. Mani says "For the SB600 controller, the right thing to do is to
restrict all drives to 1.5Gbps by jumpers or with a boot option." Raman Gupta
replies "I also tried this -- but with this setting all drives attached to my
Marvell controller could not even be started by the kernel -- permanent
"failed to IDENTIFY" errors." (
https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-8: DjznBR (djzn-br) says he tried several things WITHOUT success and finally
found one that works. Doesn't work: TURNED HDPARM OFF, CHANGED CABLE,
EXPERIMENTED AHCI & RAID MODES, DISABLED NCQ, COMPILED KERNEL WITH
CONFIG_SATA_PMP DISABLED, TRYING NOW LIBATA.FORCE=1.5GBPS, changed the cables
to different routes... SATA1 -> SATA2 SATA2 -> SATA3 ---- Works (but still
gives "softreset failed (device not ready)" messages in dmesg, after which it
recovers without data loss): Added option for kernel in grub configuration
"libata.noacpi=1". He also says "libata.force=norst ... prevents soft and hard
link resettings. If you have that switch on, when this bug comes up, there is
a system lock down (because obviously the kernel prevented the soft & hard
resetting)." ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 )
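Before trying any of the options above it can help to know which of them are
already active. The following is only a minimal sketch in Python, under stated
assumptions: the helper name active_workarounds is hypothetical, the option
list is simply copied from the workarounds quoted above, and it only looks at
options given on the kernel command line (readable in /proc/cmdline on Linux),
not at options set through modprobe.d as in workaround 2.

```python
# Hypothetical helper: report which workaround options from the list above
# appear on a kernel command line. The option names are copied from the
# reports quoted above; this is not an exhaustive or authoritative list.
import os

WORKAROUND_OPTIONS = [
    "libata.atapi_passthru16=0",
    "acpi=off",
    "noapic",
    "libata.force=noncq",
    "pcie_aspm=off",
    "libata.force=1.5Gbps",
    "libata.noacpi=1",
    "libata.force=norst",
]


def active_workarounds(cmdline):
    """Return the known workaround options present in a command line string.

    Only exact, whitespace-separated tokens are matched, so a combined value
    such as "libata.force=noncq,1.5Gbps" is NOT detected by this sketch.
    """
    tokens = cmdline.split()
    return [opt for opt in WORKAROUND_OPTIONS if opt in tokens]


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running Linux kernel.
    path = "/proc/cmdline"
    if os.path.exists(path):
        with open(path) as f:
            found = active_workarounds(f.read())
        print("Active workaround options:", ", ".join(found) or "none")
    else:
        print(path, "not found; this sketch only works on Linux")
```

Since the reports above disagree on which option helps, a check like this is
mainly useful for recording the exact configuration when reporting results.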
I see the same problem on my old PC/server (Pentium II MMX) running Debian
6.0.3 (stable) with kernel 2.6.32-5-686 and libata version 3.00, on an
"IBM-DTLA-305010" 10 GB IDE disk (configured by Debian as sda) on an old
mainboard. No RAID is used there, and I only get soft resets, no hard resets,
so I don't lose data. I could send logs, but I don't think they would give any
more info.
I also see the same problem on my desktop PC every 2 or 3 months, on Debian
testing with kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64,
2.6.39-amd64, 2.6.38-2-amd64, 2.6.38-amd64 and maybe other older ones, and
libata 3.00, on two Seagate 7200.11 "ST3500320AS" 500 GB SATA2 disks (with the
latest firmware) that are part of a RAID10. Fortunately the other two Western
Digital "WDC WD1002FAEX-00Z3A0" 1 TB SATA3 disks don't fail, but I have to
reboot and re-add the disk to rebuild the RAID. I could send logs, but I don't
think they would give any more info.
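A rough way to compare workarounds over time is to count how often the reset
message shows up in a saved kernel log. This is only a sketch under stated
assumptions: the helper name count_reset_lines and the file name in the usage
comment are hypothetical, and the only message matched is the "softreset
failed (device not ready)" text quoted in workaround 8; other reset messages
are not counted.

```python
# Hypothetical helper: count kernel log lines that mention the soft reset
# failure quoted in workaround 8. Only the message text comes from the
# reports above; the log file name used below is an assumption.
import sys

RESET_MESSAGE = "softreset failed (device not ready)"


def count_reset_lines(log_text):
    """Return how many lines of log_text contain RESET_MESSAGE."""
    return sum(1 for line in log_text.splitlines() if RESET_MESSAGE in line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example use (file names are hypothetical):
    #   dmesg > dmesg.txt
    #   python3 count_resets.py dmesg.txt
    with open(sys.argv[1], errors="replace") as f:
        print(count_reset_lines(f.read()))
```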
Possibly these are the same bug: #539059, #603061, #524876
Same bug in other distros and kernels:
-Archlinux with udev-165 and udev-166:
https://bbs.archlinux.org/viewtopic.php?pid=895404
-Fedora with kernel 2.6.38-0.rc8.git0.1.fc15.x86_64 and udev-166 in a DVD
reader: https://bugzilla.redhat.com/show_bug.cgi?id=684599
-Fedora 13 with kernel 2.6.33.8-149.fc13.i686.PAE or Fedora 13 64bit on a Mac
Mini
-Fedora 14 with kernels 2.6.31.6-166.fc12@x86_64, 2.6.32.11-99.fc12.x86_64,
2.6.35.9-64.fc14.x86_64, 2.6.35.10-72.fc14.i686, 2.6.35.10-74.fc14.x86_64,
2.6.35.11-83.fc14.x86_64 and 2.6.35.14-95.fc14.x86_64:
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Fedora 15 (updated from Fedora 14):
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Centos5.5-x64 with kernel 2.6.18-194-x64:
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-RHEL5 with vanilla kernel 2.6.37.3:
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Ubuntu since 8.10 64bit with kernels 2.6.27-7, 2.6.28-15-generic,
2.6.31-14-generic, 2.6.31-15-generic (on a Macbook2), 2.6.38-7-generic
(kernel-ppa):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892
-Ubuntu 10.04: https://bugzilla.redhat.com/show_bug.cgi?id=549981
--
Bye: Javier Ortega Conde (Malkavian)
________________________________________________________________________
The Malkavian's webpage: Many things http://malkavian.dyndns.org
Member of LinUxers Group from Bizkaia (GLUB) http://glub.biz
Member of GoBi Go Club, Eghost, Itsas, Aske, Guardianes del Túmulo...
________________________________________________________________________
Microsoft is to operating systems and security what McDonald's to gourmet food
and healthy nutrition. (Javier Ortega Conde (Malkavian))