
Invalid GART PTE entry during table walk / ata4.00: failed command: WRITE DMA EXT



Hello Debian community,

I hope someone can give me a tip on the following problem.

Recently, some of my servers have started losing a hard disk while running. fdisk -l can no longer read any data from the disk, and it is no longer accessible at all. I suspect the kernel is disabling or resetting a SATA port.

Interestingly, my /dev/sdb turns into /dev/sdc while the system is running:
lsscsi
[2:0:0:0]    disk    ATA      WDC WD7501AALS-7 05.0  /dev/sda
[3:0:0:0]    disk    ATA      WDC WD7501AALS-7 05.0  /dev/sdc

So something causes the SATA port to reset (or something similar), with the result that the disk disappears from the system and is detected again as a new device, /dev/sdc.
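In the lsscsi output the SCSI address 3:0:0:0 stays the same even though the node name changes, so a script could look the disk up by address instead of by its sdX name. A minimal Python sketch, assuming lsscsi output in exactly the format quoted above; the function name `map_scsi_to_node` is made up for illustration:

```python
import re

# Sample taken from the lsscsi output quoted above.
LSSCSI_OUTPUT = """\
[2:0:0:0]    disk    ATA      WDC WD7501AALS-7 05.0  /dev/sda
[3:0:0:0]    disk    ATA      WDC WD7501AALS-7 05.0  /dev/sdc
"""

# [host:channel:target:lun]  type  ...vendor/model/revision...  /dev/node
LINE_RE = re.compile(r"^\[(\d+:\d+:\d+:\d+)\]\s+(\S+)\s+.*?(/dev/\S+)\s*$")


def map_scsi_to_node(text):
    """Return {scsi_address: device_node} for every parsable lsscsi line."""
    mapping = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            address, _dev_type, node = match.groups()
            mapping[address] = node
    return mapping


if __name__ == "__main__":
    for address, node in map_scsi_to_node(LSSCSI_OUTPUT).items():
        print(address, node)
```

This only reads the mapping; it does not rename anything.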

Does anyone have an idea how to turn /dev/sdc back into /dev/sdb while the system is running?

I'm looking forward to your ideas.

After a reboot, the servers run fine for a while.

Then I see the following in the logs or on the console:

kernel:[187057.332113] Northbridge Error, node 0
kernel:[187057.332132] Invalid GART PTE entry during table walk.

I don't think the hard disks or cables are defective: after a reboot, a SMART test reports no errors.

Shortly before the disk is thrown out of the system (or rather, disappears), the following messages are logged. I suspect that the wrong driver is being used or that an option in the BIOS needs to be switched on or off.

Jul 10 04:52:00 lvzs102b kernel: [5336760.973031]  Northbridge Error, node 0
Jul 10 04:52:00 lvzs102b kernel: [5336760.973057] Invalid GART PTE entry during table walk.

Jul 15 04:49:56 lvzs102b kernel: [192165.804058] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 04:49:56 lvzs102b kernel: [192165.804087] ata4.00: failed command: WRITE DMA EXT
Jul 15 04:49:56 lvzs102b kernel: [192165.804101] ata4.00: cmd 35/00:00:2e:d4:c7/00:04:4d:00:00/e0 tag 0 dma 524288 out
Jul 15 04:49:56 lvzs102b kernel: [192165.804104]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 15 04:49:56 lvzs102b kernel: [192165.804124] ata4.00: status: { DRDY }
Jul 15 04:49:56 lvzs102b kernel: [192165.804137] ata4: hard resetting link
Jul 15 04:49:56 lvzs102b kernel: [192165.804140] ata4: nv: skipping hardreset on occupied port
Jul 15 04:50:02 lvzs102b kernel: [192171.313020] ata4: link is slow to respond, please be patient (ready=0)
Jul 15 04:50:06 lvzs102b kernel: [192175.849021] ata4: SRST failed (errno=-16)
Jul 15 04:50:06 lvzs102b kernel: [192175.849039] ata4: hard resetting link
Jul 15 04:50:06 lvzs102b kernel: [192175.849044] ata4: nv: skipping hardreset on occupied port
Jul 15 04:50:12 lvzs102b kernel: [192181.357022] ata4: link is slow to respond, please be patient (ready=0)
Jul 15 04:50:16 lvzs102b kernel: [192185.893018] ata4: SRST failed (errno=-16)
Jul 15 04:50:16 lvzs102b kernel: [192185.893035] ata4: hard resetting link
Jul 15 04:50:16 lvzs102b kernel: [192185.893039] ata4: nv: skipping hardreset on occupied port
Jul 15 04:50:22 lvzs102b kernel: [192191.401053] ata4: link is slow to respond, please be patient (ready=0)
Jul 15 04:50:51 lvzs102b kernel: [192220.913023] ata4: SRST failed (errno=-16)
Jul 15 04:50:51 lvzs102b kernel: [192220.913040] ata4: limiting SATA link speed to 1.5 Gbps
Jul 15 04:50:51 lvzs102b kernel: [192220.913046] ata4: hard resetting link
Jul 15 04:50:51 lvzs102b kernel: [192220.913050] ata4: nv: skipping hardreset on occupied port
Jul 15 04:50:56 lvzs102b kernel: [192225.916021] ata4: SRST failed (errno=-16)
Jul 15 04:50:56 lvzs102b kernel: [192225.916039] ata4: reset failed, giving up
Jul 15 04:50:56 lvzs102b kernel: [192225.916048] ata4.00: disabled
Jul 15 04:50:56 lvzs102b kernel: [192225.916071] ata4.00: device reported invalid CHS sector 0
Jul 15 04:50:56 lvzs102b kernel: [192225.916086] ata4: EH complete
Jul 15 04:50:56 lvzs102b kernel: [192225.916121] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.916125] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.916130] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 4d c7 d4 2e 00 04 00 00
Jul 15 04:50:56 lvzs102b kernel: [192225.916139] end_request: I/O error, dev sdb, sector 1304941614
Jul 15 04:50:56 lvzs102b kernel: [192225.916152] raid1: Disk failure on sdb6, disabling device.
Jul 15 04:50:56 lvzs102b kernel: [192225.916154] raid1: Operation continuing on 1 devices.
Jul 15 04:50:56 lvzs102b kernel: [192225.916488] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.916491] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.916496] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 4d c7 d8 2e 00 04 00 00
Jul 15 04:50:56 lvzs102b kernel: [192225.916504] end_request: I/O error, dev sdb, sector 1304942638
Jul 15 04:50:56 lvzs102b kernel: [192225.916798] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.916801] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.916805] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 4d c7 dc 2e 00 04 00 00
Jul 15 04:50:56 lvzs102b kernel: [192225.916813] end_request: I/O error, dev sdb, sector 1304943662
Jul 15 04:50:56 lvzs102b kernel: [192225.917085] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.917088] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.917093] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 4d c7 e0 2e 00 04 00 00
Jul 15 04:50:56 lvzs102b kernel: [192225.917100] end_request: I/O error, dev sdb, sector 1304944686
Jul 15 04:50:56 lvzs102b kernel: [192225.917374] end_request: I/O error, dev sdb, sector 2048
Jul 15 04:50:56 lvzs102b kernel: [192225.917427] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.917432] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.917438] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 43 a5 7f 8e 00 02 00 00
Jul 15 04:50:56 lvzs102b kernel: [192225.917447] end_request: I/O error, dev sdb, sector 1134919566
Jul 15 04:50:56 lvzs102b kernel: [192225.917463] raid1: sdb6: rescheduling sector 389774560
Jul 15 04:50:56 lvzs102b kernel: [192225.917474] raid1: sdb6: rescheduling sector 389774568
Jul 15 04:50:56 lvzs102b kernel: [192225.917484] raid1: sdb6: rescheduling sector 389774576
Jul 15 04:50:56 lvzs102b kernel: [192225.917494] raid1: sdb6: rescheduling sector 389774584
Jul 15 04:50:56 lvzs102b kernel: [192225.917504] raid1: sdb6: rescheduling sector 389774592
Jul 15 04:50:56 lvzs102b kernel: [192225.917514] raid1: sdb6: rescheduling sector 389774600
Jul 15 04:50:56 lvzs102b kernel: [192225.917524] raid1: sdb6: rescheduling sector 389774608
Jul 15 04:50:56 lvzs102b kernel: [192225.917798] raid1: sdb6: rescheduling sector 389774616
Jul 15 04:50:56 lvzs102b kernel: [192225.918059] raid1: sdb6: rescheduling sector 389774624
Jul 15 04:50:56 lvzs102b kernel: [192225.918509] raid1: sdb6: rescheduling sector 389774632
Jul 15 04:50:56 lvzs102b kernel: [192225.919472] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.919475] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.919480] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 4c 92 fe 66 00 00 10 00
Jul 15 04:50:56 lvzs102b kernel: [192225.919488] end_request: I/O error, dev sdb, sector 1284701798
Jul 15 04:50:56 lvzs102b kernel: [192225.920370] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.920373] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.920377] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 01 7f a2 24 00 00 02 00
Jul 15 04:50:56 lvzs102b kernel: [192225.920385] end_request: I/O error, dev sdb, sector 25141796
Jul 15 04:50:56 lvzs102b kernel: [192225.921267] end_request: I/O error, dev sdb, sector 25141796
Jul 15 04:50:56 lvzs102b kernel: [192225.922131] md: super_written gets error=-5, uptodate=0
Jul 15 04:50:56 lvzs102b kernel: [192225.922136] raid1: Disk failure on sdb5, disabling device.
Jul 15 04:50:56 lvzs102b kernel: [192225.922139] raid1: Operation continuing on 1 devices.
Jul 15 04:50:56 lvzs102b kernel: [192225.923882] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.923885] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.923890] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 01 1f c9 1e 00 00 02 00
Jul 15 04:50:56 lvzs102b kernel: [192225.923897] end_request: I/O error, dev sdb, sector 18860318
Jul 15 04:50:56 lvzs102b kernel: [192225.924739] end_request: I/O error, dev sdb, sector 18860318
Jul 15 04:50:56 lvzs102b kernel: [192225.925255] md: super_written gets error=-5, uptodate=0
Jul 15 04:50:56 lvzs102b kernel: [192225.925255] raid1: Disk failure on sdb3, disabling device.
Jul 15 04:50:56 lvzs102b kernel: [192225.925255] raid1: Operation continuing on 1 devices.
Jul 15 04:50:56 lvzs102b kernel: [192225.927217] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.927221] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.927226] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 01 20 10 ce 00 00 08 00
Jul 15 04:50:56 lvzs102b kernel: [192225.927233] end_request: I/O error, dev sdb, sector 18878670
Jul 15 04:50:56 lvzs102b kernel: [192225.932032] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.932035] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.932040] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 4c 4b 9e ae 00 00 10 00
Jul 15 04:50:56 lvzs102b kernel: [192225.932048] end_request: I/O error, dev sdb, sector 1280024238
Jul 15 04:50:56 lvzs102b kernel: [192225.932312] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.932315] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.932320] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 4c 4b 9e ce 00 00 18 00
Jul 15 04:50:56 lvzs102b kernel: [192225.932327] end_request: I/O error, dev sdb, sector 1280024270
Jul 15 04:50:56 lvzs102b kernel: [192225.941034] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:56 lvzs102b kernel: [192225.941037] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:56 lvzs102b kernel: [192225.941042] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 41 b5 83 e5 00 00 2f 00
Jul 15 04:50:56 lvzs102b kernel: [192225.941050] end_request: I/O error, dev sdb, sector 1102414821
Jul 15 04:50:57 lvzs102b kernel: [192226.009162] RAID1 conf printout:
Jul 15 04:50:57 lvzs102b kernel: [192226.009165]  --- wd:1 rd:2
Jul 15 04:50:57 lvzs102b kernel: [192226.009168]  disk 0, wo:0, o:1, dev:sda6
Jul 15 04:50:57 lvzs102b kernel: [192226.009172]  disk 1, wo:1, o:0, dev:sdb6
Jul 15 04:50:57 lvzs102b kernel: [192226.033677] RAID1 conf printout:
Jul 15 04:50:57 lvzs102b kernel: [192226.033680]  --- wd:1 rd:2
Jul 15 04:50:57 lvzs102b kernel: [192226.033683]  disk 0, wo:1, o:0, dev:sdb5
Jul 15 04:50:57 lvzs102b kernel: [192226.033686]  disk 1, wo:0, o:1, dev:sda5
Jul 15 04:50:57 lvzs102b kernel: [192226.041017] RAID1 conf printout:
Jul 15 04:50:57 lvzs102b kernel: [192226.041020]  --- wd:1 rd:2
Jul 15 04:50:57 lvzs102b kernel: [192226.041023]  disk 0, wo:0, o:1, dev:sda6
Jul 15 04:50:57 lvzs102b kernel: [192226.044419] RAID1 conf printout:
Jul 15 04:50:57 lvzs102b kernel: [192226.044422]  --- wd:1 rd:2
Jul 15 04:50:57 lvzs102b kernel: [192226.044425]  disk 0, wo:0, o:1, dev:sda3
Jul 15 04:50:57 lvzs102b kernel: [192226.044428]  disk 1, wo:1, o:0, dev:sdb3
Jul 15 04:50:57 lvzs102b kernel: [192226.053018] RAID1 conf printout:
Jul 15 04:50:57 lvzs102b kernel: [192226.053021]  --- wd:1 rd:2
Jul 15 04:50:57 lvzs102b kernel: [192226.053024]  disk 1, wo:0, o:1, dev:sda5
Jul 15 04:50:57 lvzs102b kernel: [192226.069051] RAID1 conf printout:
Jul 15 04:50:57 lvzs102b kernel: [192226.069054]  --- wd:1 rd:2
Jul 15 04:50:57 lvzs102b kernel: [192226.069057]  disk 0, wo:0, o:1, dev:sda3
Jul 15 04:50:57 lvzs102b kernel: [192226.069367] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:57 lvzs102b kernel: [192226.069370] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:57 lvzs102b kernel: [192226.069375] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 00 03 2e 87 00 00 18 00
Jul 15 04:50:57 lvzs102b kernel: [192226.069384] end_request: I/O error, dev sdb, sector 208519
Jul 15 04:50:57 lvzs102b kernel: [192226.142709] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:57 lvzs102b kernel: [192226.142715] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:57 lvzs102b kernel: [192226.142721] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 00 03 2e 87 00 00 08 00
Jul 15 04:50:57 lvzs102b kernel: [192226.142730] end_request: I/O error, dev sdb, sector 208519
Jul 15 04:50:57 lvzs102b kernel: [192226.212994] sd 3:0:0:0: [sdb] Unhandled error code
Jul 15 04:50:57 lvzs102b kernel: [192226.213022] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 04:50:57 lvzs102b kernel: [192226.213028] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 00 03 2e 87 00 00 08 00
Jul 15 04:50:57 lvzs102b kernel: [192226.213037] end_request: I/O error, dev sdb, sector 208519
Jul 15 04:50:57 lvzs102b kernel: [192226.213398] raid1: Disk failure on sdb1, disabling device.
Jul 15 04:50:57 lvzs102b kernel: [192226.213400] raid1: Operation continuing on 1 devices.
Jul 15 04:50:58 lvzs102b kernel: [192227.119401] RAID1 conf printout:
Jul 15 04:50:58 lvzs102b kernel: [192227.119406]  --- wd:1 rd:2
Jul 15 04:50:58 lvzs102b kernel: [192227.119411]  disk 0, wo:0, o:1, dev:sda1
Jul 15 04:50:58 lvzs102b kernel: [192227.119415]  disk 1, wo:1, o:0, dev:sdb1
Jul 15 04:50:58 lvzs102b kernel: [192227.129018] RAID1 conf printout:
Jul 15 04:50:58 lvzs102b kernel: [192227.129021]  --- wd:1 rd:2
Jul 15 04:50:58 lvzs102b kernel: [192227.129024]  disk 0, wo:0, o:1, dev:sda1
Jul 15 04:51:02 lvzs102b kernel: [192231.247866] ata4: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
Jul 15 04:51:02 lvzs102b kernel: [192231.248147] ata4: SError: { PHYRdyChg LinkSeq TrStaTrns }
Jul 15 04:51:02 lvzs102b kernel: [192231.248393] ata4: hard resetting link
Jul 15 04:51:07 lvzs102b kernel: [192236.833289] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 15 04:51:07 lvzs102b kernel: [192236.874589] ata4.00: ATA-8: WDC WD7501AALS-75J7B0, 05.00K05, max UDMA/133
Jul 15 04:51:07 lvzs102b kernel: [192236.874594] ata4.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 0/32)
Jul 15 04:51:07 lvzs102b kernel: [192236.890099] ata4.00: configured for UDMA/133
Jul 15 04:51:07 lvzs102b kernel: [192236.890115] ata4: EH complete
Jul 15 04:51:07 lvzs102b kernel: [192236.890126] ata4.00: detaching (SCSI 3:0:0:0)
Jul 15 04:51:07 lvzs102b kernel: [192236.904181] sd 3:0:0:0: [sdb] Synchronizing SCSI cache
Jul 15 04:51:07 lvzs102b kernel: [192236.904425] sd 3:0:0:0: [sdb] Stopping disk
Jul 15 04:51:08 lvzs102b kernel: [192237.326797] scsi 3:0:0:0: Direct-Access     ATA      WDC WD7501AALS-7 05.0 PQ: 0 ANSI: 5
Jul 15 04:51:08 lvzs102b kernel: [192237.333260] sd 3:0:0:0: [sdc] 1465149168 512-byte logical blocks: (750 GB/698 GiB)
Jul 15 04:51:08 lvzs102b kernel: [192237.333314] sd 3:0:0:0: [sdc] Write Protect is off
Jul 15 04:51:08 lvzs102b kernel: [192237.333319] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jul 15 04:51:08 lvzs102b kernel: [192237.333342] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 15 04:51:09 lvzs102b kernel: [192237.333692]  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 >
Jul 15 04:51:09 lvzs102b kernel: [192238.953443] sd 3:0:0:0: [sdc] Attached SCSI disk
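
As a cross-check on the log above, the CDB bytes of the failed requests can be decoded by hand: in a 10-byte Read(10)/Write(10) command, bytes 2 to 5 hold the logical block address and bytes 7 to 8 the transfer length in blocks (both big-endian). A minimal Python sketch, assuming the 512-byte logical block size the kernel reports for this disk; the function name `decode_rw10_cdb` is made up for illustration:

```python
BLOCK_SIZE = 512  # from "512-byte logical blocks" in the log above


def decode_rw10_cdb(cdb_hex):
    """Decode a SCSI Read(10)/Write(10) CDB given as space-separated hex bytes.

    Returns (opcode_name, lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a 10-byte Read(10)/Write(10) CDB")
    opcode = "Read(10)" if cdb[0] == 0x28 else "Write(10)"
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return opcode, lba, blocks


if __name__ == "__main__":
    # First failed write from the log above.
    opcode, lba, blocks = decode_rw10_cdb("2a 00 4d c7 d4 2e 00 04 00 00")
    print(opcode, "sector", lba, "bytes", blocks * BLOCK_SIZE)
```

For the first failed write this gives sector 1304941614 and 1024 blocks, i.e. 524288 bytes, which matches both the "I/O error, dev sdb, sector 1304941614" line and the "dma 524288 out" of the timed-out WRITE DMA EXT. So the I/O errors reported by sd are the same request that ata4 timed out on, not a separate fault.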


