
Re: Bug#625922: SATA devices get reset without real hardware failure



Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
This bug (in general, not just the report on this site) has been in GNU/Linux
for a long time, across various disks, mainboards, SATA controllers, distros
and kernels (maybe since changes made after 2.6.24).

I'm using kernel 2.6.37.6, and the bug is still present there.
Has it been fixed in any more recent kernel version?
IMO it deserves the highest priority and should be fixed ASAP.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892
"
Raj B (bigwoof) wrote on 2011-01-03:
...
I've lost data because of this as well. my entire /var/lib/mysql directory
was blown away and recovered into lost+found. other directories are there as well.
...
"

I had a similar disaster yesterday... :-(

From my syslog:

Oct 18 12:11:16 c12 kernel: [   35.340954] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   35.342052] ata1.00: irq_stat 0x40000001
Oct 18 12:11:16 c12 kernel: [   35.343141] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   35.344230] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   35.344232]          res 51/01:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x1 (device error)
Oct 18 12:11:16 c12 kernel: [   35.346497] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   35.351588] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   35.352760] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   36.374319] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   36.375516] ata1.00: irq_stat 0x40000001
Oct 18 12:11:16 c12 kernel: [   36.376722] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   36.377913] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   36.377915]          res 51/01:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x1 (device error)
Oct 18 12:11:16 c12 kernel: [   36.380393] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   36.385574] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   36.386828] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   37.407698] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   37.409013] ata1.00: irq_stat 0x40000001
Oct 18 12:11:16 c12 kernel: [   37.410317] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   37.411638] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   37.411639]          res 51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [   37.414381] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   37.415758] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [   37.421076] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   37.422466] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   38.449412] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   38.450847] ata1.00: irq_stat 0x40000000
Oct 18 12:11:16 c12 kernel: [   38.452285] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   38.453718] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   38.453720]          res 51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [   38.456694] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   38.458190] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [   38.463615] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   38.465135] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   39.491124] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   39.492692] ata1.00: irq_stat 0x40000001
Oct 18 12:11:16 c12 kernel: [   39.494253] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   39.495829] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   39.495831]          res 51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [   39.499081] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   39.500710] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [   39.506254] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   39.507867] ata1: EH complete
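
As a rough aid for reading the "cmd" lines above: a minimal Python sketch
that decodes the taskfile of a READ DMA (0xc8, LBA28) command into a start
sector and a sector count. It assumes the usual libata layout
command/feature:nsect:lbal:lbam:lbah/<five HOB bytes>/device; the names
CMD_RE and decode_lba28 are made up for this illustration.

```python
import re

# Assumed libata layout of the "cmd" line:
#   command/feature:nsect:lbal:lbam:lbah/hob bytes (5)/device
HEX = r"[0-9a-f]{2}"
CMD_RE = re.compile(
    rf"cmd (?P<command>{HEX})/(?P<feature>{HEX}):(?P<nsect>{HEX}):"
    rf"(?P<lbal>{HEX}):(?P<lbam>{HEX}):(?P<lbah>{HEX})"
    rf"/{HEX}:{HEX}:{HEX}:{HEX}:{HEX}/(?P<device>{HEX})"
)


def decode_lba28(line):
    """Return (start_lba, sector_count) for an LBA28 command line, or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    f = {name: int(value, 16) for name, value in m.groupdict().items()}
    # LBA28: bits 27..24 are the low nibble of the device register.
    lba = ((f["device"] & 0x0F) << 24) | (f["lbah"] << 16) | (f["lbam"] << 8) | f["lbal"]
    # In LBA28 commands a sector count of 0 means 256 sectors.
    count = f["nsect"] or 256
    return lba, count


if __name__ == "__main__":
    sample = "ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in"
    print(decode_lba28(sample))
```

Under these assumptions every failed command in the block above decodes to
the same 8-sector read starting at sector 32834687, which agrees with the
"dma 4096" in the log if the sectors are 512 bytes.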

...

Oct 18 14:05:41 c12 kernel: [   71.786231] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
Oct 18 14:05:41 c12 kernel: [   71.786236] ata1.00: irq_stat 0x08400000, interface fatal error, PHY RDY changed
Oct 18 14:05:41 c12 kernel: [   71.786240] ata1: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
Oct 18 14:05:41 c12 kernel: [   71.786243] ata1.00: failed command: READ DMA
Oct 18 14:05:41 c12 kernel: [   71.786250] ata1.00: cmd c8/00:20:27:03:9d/00:00:00:00:00/e1 tag 0 dma 16384 in
Oct 18 14:05:41 c12 kernel: [   71.786251]          res 50/00:00:c6:02:9d/00:00:00:00:00/e1 Emask 0x10 (ATA bus error)
Oct 18 14:05:41 c12 kernel: [   71.786254] ata1.00: status: { DRDY }
Oct 18 14:05:41 c12 kernel: [   71.786260] ata1: hard resetting link
Oct 18 14:05:44 c12 kernel: [   74.524018] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 18 14:05:44 c12 kernel: [   74.527866] ata1.00: configured for UDMA/133
Oct 18 14:05:44 c12 kernel: [   74.527874] ata1: EH complete
Oct 18 14:05:44 c12 kernel: [   74.533628] ata1: limiting SATA link speed to 1.5 Gbps
Oct 18 14:05:44 c12 kernel: [   74.533633] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
Oct 18 14:05:44 c12 kernel: [   74.533636] ata1.00: irq_stat 0x08400000, interface fatal error, PHY RDY changed
Oct 18 14:05:44 c12 kernel: [   74.533639] ata1: SError: { PHYRdyChg LinkSeq TrStaTrns }
Oct 18 14:05:44 c12 kernel: [   74.533642] ata1.00: failed command: READ DMA
Oct 18 14:05:44 c12 kernel: [   74.533648] ata1.00: cmd c8/00:38:47:03:9d/00:00:00:00:00/e1 tag 0 dma 28672 in
Oct 18 14:05:44 c12 kernel: [   74.533650]          res 50/00:00:f6:e8:12/00:00:00:00:00/e2 Emask 0x10 (ATA bus error)
Oct 18 14:05:44 c12 kernel: [   74.533653] ata1.00: status: { DRDY }
Oct 18 14:05:44 c12 kernel: [   74.533658] ata1: hard resetting link
Oct 18 14:05:46 c12 kernel: [   77.272018] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 18 14:05:46 c12 kernel: [   77.275862] ata1.00: configured for UDMA/133
Oct 18 14:05:46 c12 kernel: [   77.275871] ata1: EH complete
Oct 18 14:05:46 c12 kernel: [   77.280770] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
Oct 18 14:05:46 c12 kernel: [   77.280773] ata1.00: irq_stat 0x00400000, PHY RDY changed
Oct 18 14:05:46 c12 kernel: [   77.280776] ata1: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
Oct 18 14:05:46 c12 kernel: [   77.280779] ata1.00: failed command: READ DMA
Oct 18 14:05:46 c12 kernel: [   77.280785] ata1.00: cmd c8/00:38:47:03:9d/00:00:00:00:00/e1 tag 0 dma 28672 in
Oct 18 14:05:46 c12 kernel: [   77.280787]          res 50/00:00:af:9e:a1/00:00:12:00:00/e0 Emask 0x10 (ATA bus error)
Oct 18 14:05:46 c12 kernel: [   77.280790] ata1.00: status: { DRDY }
Oct 18 14:05:46 c12 kernel: [   77.280794] ata1: hard resetting link
Oct 18 14:05:49 c12 kernel: [   80.020014] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 18 14:05:49 c12 kernel: [   80.023855] ata1.00: configured for UDMA/133
Oct 18 14:05:49 c12 kernel: [   80.023860] ata1: EH complete
Oct 18 14:05:49 c12 kernel: [   80.032760] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
Oct 18 14:05:49 c12 kernel: [   80.032762] ata1.00: irq_stat 0x00400000, PHY RDY changed
Oct 18 14:05:49 c12 kernel: [   80.032765] ata1: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
Oct 18 14:05:49 c12 kernel: [   80.032768] ata1.00: failed command: READ DMA
Oct 18 14:05:49 c12 kernel: [   80.032774] ata1.00: cmd c8/00:38:47:03:9d/00:00:00:00:00/e1 tag 0 dma 28672 in
Oct 18 14:05:49 c12 kernel: [   80.032775]          res 50/00:00:af:9e:a1/00:00:12:00:00/e0 Emask 0x10 (ATA bus error)
Oct 18 14:05:49 c12 kernel: [   80.032778] ata1.00: status: { DRDY }
Oct 18 14:05:49 c12 kernel: [   80.032782] ata1: hard resetting link
Oct 18 14:05:52 c12 kernel: [   82.772016] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 18 14:05:52 c12 kernel: [   82.775858] ata1.00: configured for UDMA/133
Oct 18 14:05:52 c12 kernel: [   82.775864] ata1: EH complete


Essential components such as the HD driver must be fixed ASAP.

And I wonder why ext3's journalling doesn't help here. What good is it
if the filesystem still ends up corrupted, with the same inode used
for more than one file, etc.?
Unbelievably idiotic behaviour and situation...

