
Bug#625922: SATA devices get reset without real hardware failure



This bug (in general, not just the instance reported on this page) has been 
present in GNU/Linux for a long time, across various disks, mainboards, SATA 
controllers, distros and kernels (maybe since changes made after 2.6.24).

In https://bugzilla.redhat.com/show_bug.cgi?id=684599  David Zeuthen says 
"it's most probably caused by this commit 
http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e 
"

Possible workarounds reported for this bug: 
-1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some 
devices may not support 16-byte ATA commands) ( 
https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
-2: (Same as 1) Add the line "options libata atapi_passthru16=0" to 
/etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" to 
/etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )
-3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the 
kernel in /etc/grub.conf seems to have solved the problem for me"  ( 
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ).  Raman 
Gupta and Andreas M. Kirchwitz say in another bug report that adding 
'acpi=off' doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-4: (Similar to 3) Completely disable ACPI in mainboard BIOS. ( 
http://lists.debian.org/debian-user/2010/01/msg00023.html )
-5: Gaetan Cambier says "add the option line to grub to disable ncq : 
'libata.force=noncq' for me, with this, i have no froze". ( 
https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it 
doesn't work for them. PsYcHoK9 says it works for him, but John Doe replies 
that it does not work for him ( 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).
-6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off ? 
For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did 
not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-7: A. Mani says "For the SB600 controller, the right thing to do is to 
restrict all drives to 1.5Gbps by jumpers or with a boot option."  Raman Gupta 
replies "I also tried this -- but with this setting all drives attached to my 
Marvell controller could not even be started by the kernel -- permanent 
"failed to IDENTIFY" errors." ( 
https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-8: DjznBR (djzn-br) says he tried several things WITHOUT success and finally 
found one that works. What didn't work: TURNED HDPARM OFF, CHANGED CABLE, 
EXPERIMENTED AHCI & RAID MODES, DISABLED NCQ, COMPILED KERNEL WITH 
CONFIG_SATA_PMP DISABLED, TRYING NOW LIBATA.FORCE=1.5GBPS, changed the cables 
to different routes... SATA1 -> SATA2 SATA2 -> SATA3. What works (although 
dmesg still shows "softreset failed (device not ready)" messages, after which 
the link recovers without data loss): adding the kernel option 
"libata.noacpi=1" in the grub configuration. He also says "libata.force=norst 
... prevents soft and hard link resettings. If you have that switch on, when 
this bug comes up, there is a system lock down (because obviously the kernel 
prevented the soft & hard resetting)." ( 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 )


I have the same problem on my old PC/server (Pentium II MMX) running Debian 
6.0.3 (stable) with kernel 2.6.32-5-686 and libata version 3.00, on an 
"IBM-DTLA-305010" 10 GB IDE disk (configured by Debian as sda) on an old 
mainboard. No RAID is used there, and I only get soft resets, never hard 
resets, so I don't lose data. I could send logs, but I don't think they would 
give any more info.

I have the same problem on my desktop PC every 2 or 3 months, on Debian 
testing with kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 
2.6.39-amd64, 2.6.38-2-amd64, 2.6.38-amd64 and maybe older ones, and libata 
3.00, on two Seagate 7200.11 "ST3500320AS" 500 GB SATA2 disks (with the 
latest firmware) that belong to a RAID10. Fortunately the other two disks, 
Western Digital "WDC WD1002FAEX-00Z3A0" 1 TB SATA3, don't fail, but I have to 
reboot and re-add the failed disk to rebuild the RAID. I could send logs, but 
I don't think they would give any more info.
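
For anyone who wants to compare logs across machines, a rough way to 
summarise them is to count the reset-related messages per port. The sketch 
below is Python and only illustrative: the function name count_resets is made 
up, the two message fragments are the ones quoted in the reports above, and 
it assumes that libata prefixes its messages with a port name such as "ata3:" 
or "ata3.00:".

```python
import re
from collections import Counter

# Message fragments quoted in the reports above.
PATTERNS = ["softreset failed", "failed to IDENTIFY"]

# Assumption: each libata message starts with a port name such as "ata3:"
# or a device name such as "ata3.00:". Only the port part is kept.
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def count_resets(lines):
    """Count matching messages per (port, fragment) in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for fragment in PATTERNS:
            if fragment in line:
                match = PORT_RE.search(line)
                port = match.group(1) if match else "unknown"
                counts[(port, fragment)] += 1
    return counts
```

The function takes any iterable of text lines, for example the saved output 
of dmesg or an open kernel log file, and lines that mention neither fragment 
are ignored.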

Possibly these are the same bug: #539059, #603061, #524876

Same bug in other distros and kernels:
-Archlinux with udev-165 and udev-166: 
https://bbs.archlinux.org/viewtopic.php?pid=895404
-Fedora with kernel 2.6.38-0.rc8.git0.1.fc15.x86_64 and udev-166 in a DVD 
reader: https://bugzilla.redhat.com/show_bug.cgi?id=684599
-Fedora 13 with kernel 2.6.33.8-149.fc13.i686.PAE or Fedora 13 64bit on a Mac 
Mini
-Fedora 14 with kernels 2.6.31.6-166.fc12@x86_64, 2.6.32.11-99.fc12.x86_64, 
2.6.35.9-64.fc14.x86_64, 2.6.35.10-72.fc14.i686 and 2.6.35.10-74.fc14.x86_64 
and 2.6.35.11-83.fc14.x86_64 and 2.6.35.14-95.fc14.x86_64: 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Fedora 15 (updated from Fedora 14): 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Centos5.5-x64 with kernel 2.6.18-194-x64: 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-RHEL5 with vanilla kernel 2.6.37.3: 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Ubuntu since 8.10 64bit with kernels 2.6.27-7, 2.6.28-15-generic, 
2.6.31-14-generic, 2.6.31-15-generic (on a Macbook2), 2.6.38-7-generic 
(kernel-ppa): https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892
-Ubuntu 10.04: https://bugzilla.redhat.com/show_bug.cgi?id=549981


--
        Bye: Javier Ortega Conde (Malkavian)
________________________________________________________________________
The Malkavian's webpage: Many things	 http://malkavian.dyndns.org
Member of LinUxers Group from Bizkaia (GLUB)            http://glub.biz
Member of GoBi Go Club, Eghost, Itsas, Aske, Guardianes del Túmulo...
________________________________________________________________________
Microsoft is to operating systems and security what McDonald's to gourmet food 
and healthy nutrition. (Javier Ortega Conde (Malkavian))



