
Re: zfs pool degraded



Hi Piviul,
it looks like the disk should be replaced; perhaps someone with more experience could confirm the diagnosis.


On 12/06/20 14:44, Piviul wrote:
Hi Alessandro...

Alessandro Baggi wrote on 12/06/20 at 11:09:
[...]
Do the THRESH values in the disk's SMART data show anything odd?
I can't read them from either iLO4 or smartctl; I don't know where to find the information to pass to smartctl's --device option :(

Did the system logs report anything about the disk, such as I/O errors (including dmesg at the time of the error)?
No, I hadn't looked, but I have now; here is what they report:
Jun 11 08:23:34 pve02 kernel: [63208.585107] sd 2:0:2:0: [sdc] tag#128 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 11 08:23:34 pve02 kernel: [63208.585126] sd 2:0:2:0: [sdc] tag#128 Sense Key : Illegal Request [current]
Jun 11 08:23:34 pve02 kernel: [63208.585132] sd 2:0:2:0: [sdc] tag#128 Add. Sense: Logical block address out of range
Jun 11 08:23:34 pve02 kernel: [63208.585139] sd 2:0:2:0: [sdc] tag#128 CDB: Write(10) 2a 00 b7 e8 13 10 00 05 28 00
Jun 11 08:23:34 pve02 kernel: [63208.585146] blk_update_request: critical target error, dev sdc, sector 3085439760 op 0x1:(WRITE) flags 0x700 phys_seg 12 prio class 0
Jun 11 08:23:34 pve02 kernel: [63208.585252] zio pool=zfspool vdev=/dev/sdc1 error=121 type=2 offset=1579744108544 size=675840 flags=40080c80
Jun 11 08:23:48 pve02 kernel: [63223.328731] sd 2:0:2:0: [sdc] tag#162 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 11 08:23:48 pve02 kernel: [63223.328751] sd 2:0:2:0: [sdc] tag#162 Sense Key : Illegal Request [current]
Jun 11 08:23:48 pve02 kernel: [63223.328757] sd 2:0:2:0: [sdc] tag#162 Add. Sense: Logical block address out of range
Jun 11 08:23:48 pve02 kernel: [63223.328764] sd 2:0:2:0: [sdc] tag#162 CDB: Write(10) 2a 00 c3 39 fa 18 00 02 60 00
Jun 11 08:23:48 pve02 kernel: [63223.328771] blk_update_request: critical target error, dev sdc, sector 3275356696 op 0x1:(WRITE) flags 0x700 phys_seg 6 prio class 0
Jun 11 08:23:48 pve02 kernel: [63223.328878] zio pool=zfspool vdev=/dev/sdc1 error=121 type=2 offset=1676981579776 size=311296 flags=40080c80
Jun 11 08:24:28 pve02 kernel: [63263.323315] sd 2:0:2:0: [sdc] tag#141 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 11 08:24:28 pve02 kernel: [63263.323334] sd 2:0:2:0: [sdc] tag#141 Sense Key : Medium Error [current]
Jun 11 08:24:28 pve02 kernel: [63263.323340] sd 2:0:2:0: [sdc] tag#141 Add. Sense: Unrecovered read error
Jun 11 08:24:28 pve02 kernel: [63263.323347] sd 2:0:2:0: [sdc] tag#141 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 08:24:28 pve02 kernel: [63263.323353] blk_update_request: critical medium error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 08:25:37 pve02 kernel: [63331.416500] sd 2:0:2:0: [sdc] tag#143 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 11 08:25:37 pve02 kernel: [63331.416507] sd 2:0:2:0: [sdc] tag#143 Sense Key : Medium Error [current]
Jun 11 08:25:37 pve02 kernel: [63331.416510] sd 2:0:2:0: [sdc] tag#143 Add. Sense: Unrecovered read error
Jun 11 08:25:37 pve02 kernel: [63331.416516] sd 2:0:2:0: [sdc] tag#143 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 08:25:37 pve02 kernel: [63331.416522] blk_update_request: critical medium error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.560537] sd 2:0:2:0: [sdc] Unaligned partial completion (resid=4056, sector_sz=512)
Jun 11 11:44:58 pve02 kernel: [75292.560549] sd 2:0:2:0: [sdc] tag#769 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.560558] sd 2:0:2:0: [sdc] tag#769 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Jun 11 11:44:58 pve02 kernel: [75292.560562] sd 2:0:2:0: [sdc] tag#769 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.560567] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.560694] sd 2:0:2:0: [sdc] tag#770 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Jun 11 11:44:58 pve02 kernel: [75292.560697] sd 2:0:2:0: [sdc] tag#770 CDB: Read(10) 28 00 00 00 08 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.560700] blk_update_request: I/O error, dev sdc, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.688448] sd 2:0:2:0: [sdc] tag#799 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_SENSE
Jun 11 11:44:58 pve02 kernel: [75292.688456] sd 2:0:2:0: [sdc] tag#799 Sense Key : Illegal Request [current]
Jun 11 11:44:58 pve02 kernel: [75292.688460] sd 2:0:2:0: [sdc] tag#799 Add. Sense: Logical unit not supported
Jun 11 11:44:58 pve02 kernel: [75292.688466] sd 2:0:2:0: [sdc] tag#799 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.688471] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.690077] sd 2:0:2:0: [sdc] tag#800 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 11 11:44:58 pve02 kernel: [75292.690081] sd 2:0:2:0: [sdc] tag#800 CDB: Read(10) 28 00 00 00 08 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.690085] blk_update_request: I/O error, dev sdc, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.743075] sd 2:0:2:0: [sdc] tag#826 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 11 11:44:58 pve02 kernel: [75292.743081] sd 2:0:2:0: [sdc] tag#826 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.743085] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.744109] sd 2:0:2:0: [sdc] tag#828 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 11 11:44:58 pve02 kernel: [75292.744116] sd 2:0:2:0: [sdc] tag#828 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.744120] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:58 pve02 kernel: [75292.744644] sd 2:0:2:0: [sdc] tag#831 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 11 11:44:58 pve02 kernel: [75292.744650] sd 2:0:2:0: [sdc] tag#831 CDB: Read(10) 28 00 00 00 08 00 00 01 00 00
Jun 11 11:44:58 pve02 kernel: [75292.744655] blk_update_request: I/O error, dev sdc, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:59 pve02 kernel: [75292.937689] sd 2:0:2:0: [sdc] tag#817 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 11 11:44:59 pve02 kernel: [75292.937696] sd 2:0:2:0: [sdc] tag#817 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jun 11 11:44:59 pve02 kernel: [75292.937701] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 11:44:59 pve02 kernel: [75292.939169] sd 2:0:2:0: [sdc] tag#818 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 11 11:44:59 pve02 kernel: [75292.939173] sd 2:0:2:0: [sdc] tag#818 CDB: Read(10) 28 00 00 00 08 00 00 01 00 00
Jun 11 11:44:59 pve02 kernel: [75292.939179] blk_update_request: I/O error, dev sdc, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jun 11 12:37:23 pve02 kernel: [78436.735452] sd 2:0:5:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Jun 11 12:37:23 pve02 kernel: [78436.736724] sd 2:0:5:0: [sdc] Write Protect is off
Jun 11 12:37:23 pve02 kernel: [78436.736729] sd 2:0:5:0: [sdc] Mode Sense: 46 00 10 08
Jun 11 12:37:23 pve02 kernel: [78436.739187] sd 2:0:5:0: [sdc] Write cache: disabled, read cache: enabled, supports DPO and FUA
Jun 11 12:37:23 pve02 kernel: [78436.809893]  sdc: sdc1
these really do look like disk errors...
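
As a cross-check on the log above, the CDB bytes of the two failed Write(10) commands can be decoded by hand: bytes 2-5 hold the logical block address and bytes 7-8 the transfer length in blocks. The sketch below does that arithmetic in plain sh. The function names are made up for illustration, and the partition start of 2048 sectors for sdc1 is an assumption (it does not appear in the log, although the numbers line up with the zio offsets when it is used).

```shell
#!/bin/sh
# Sketch: decode a 10-byte SCSI Read(10)/Write(10) CDB as printed by the kernel.
# SECTOR_SIZE comes from the log ("512-byte logical blocks").
# PART_START is an ASSUMPTION: sdc1 is taken to start at sector 2048 (1 MiB).
SECTOR_SIZE=512
PART_START=2048

# Print "<lba> <blocks>" for a CDB given as ten space-separated hex bytes.
decode_cdb10() {
    # Intentional word splitting: the ten bytes become $1..$10.
    set -- $1
    # Bytes 2-5 (here $3..$6) are the big-endian logical block address.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    # Bytes 7-8 (here $8 and $9) are the big-endian transfer length in blocks.
    blocks=$(( (0x$8 << 8) | 0x$9 ))
    echo "$lba $blocks"
}

# Print the LBA, the request size in bytes, and the byte offset inside sdc1.
describe_cdb10() {
    set -- $(decode_cdb10 "$1")
    echo "lba=$1 blocks=$2 bytes=$(( $2 * SECTOR_SIZE )) vdev_offset=$(( ($1 - PART_START) * SECTOR_SIZE ))"
}

describe_cdb10 '2a 00 b7 e8 13 10 00 05 28 00'
# lba=3085439760 blocks=1320 bytes=675840 vdev_offset=1579744108544
describe_cdb10 '2a 00 c3 39 fa 18 00 02 60 00'
# lba=3275356696 blocks=608 bytes=311296 vdev_offset=1676981579776
```

The decoded values match the `sector`, `size` and `offset` fields in the corresponding `blk_update_request` and `zio` lines. Both LBAs are also below the 3907029168 blocks the kernel reports for sdc at 12:37:23, so the "Logical block address out of range" sense data does not seem to come from a request that was genuinely past the end of the disk.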

in fact I did manage to read the SMART data, and there are quite a few errors:
# smartctl -l error -d cciss,1 /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.41-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 123 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 123 occurred at disk power-on lifetime: 59475 hours (2478 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 d0 00 00 00 00      20:29:24.713  READ LOG EXT
  2f 00 01 d0 00 00 00 00      20:29:24.702  READ LOG EXT
  2f 00 01 d0 00 00 00 00      20:29:24.692  READ LOG EXT
  ec 00 00 00 00 00 00 00      20:29:24.691  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:29:24.690  IDENTIFY DEVICE

Error 122 occurred at disk power-on lifetime: 59475 hours (2478 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 d0 00 00 00 00      20:29:24.702  READ LOG EXT
  2f 00 01 d0 00 00 00 00      20:29:24.692  READ LOG EXT
  ec 00 00 00 00 00 00 00      20:29:24.691  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:29:24.690  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:29:24.689  IDENTIFY DEVICE

Error 121 occurred at disk power-on lifetime: 59475 hours (2478 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 d0 00 00 00 00      20:29:24.692  READ LOG EXT
  ec 00 00 00 00 00 00 00      20:29:24.691  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:29:24.690  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:29:24.689  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:29:24.688  IDENTIFY DEVICE

Error 120 occurred at disk power-on lifetime: 59474 hours (2478 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 d0 00 00 00 00      20:13:40.125  READ LOG EXT
  2f 00 01 d0 00 00 00 00      20:13:40.113  READ LOG EXT
  2f 00 01 d0 00 00 00 00      20:13:40.104  READ LOG EXT
  ec 00 00 00 00 00 00 00      20:13:40.103  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:13:40.102  IDENTIFY DEVICE

Error 119 occurred at disk power-on lifetime: 59474 hours (2478 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 d0 00 00 00 00      20:13:40.113  READ LOG EXT
  2f 00 01 d0 00 00 00 00      20:13:40.104  READ LOG EXT
  ec 00 00 00 00 00 00 00      20:13:40.103  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:13:40.102  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      20:13:40.101  IDENTIFY DEVICE
...right now I'm running a few SMART self-tests, but I'm afraid I really will have to replace it...
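
For anyone following along, a self-test round behind the same controller could look roughly like the sketch below. It only prints the commands rather than running them. The helper name `selftest_commands` is hypothetical; `cciss,1` and `/dev/sdc` are simply the values from the smartctl call earlier in this message and will differ on other machines.

```shell
#!/bin/sh
# Sketch: print (do not run) the smartctl commands for one self-test round.
# selftest_commands is a hypothetical helper, not part of smartmontools.
selftest_commands() {
    dev_type=$1            # e.g. cciss,1 as used earlier in this thread
    dev_node=$2            # e.g. /dev/sdc
    test_kind=${3:-short}  # "short" or "long"
    # Start the self-test; the drive runs it in the background.
    printf 'smartctl -t %s -d %s %s\n' "$test_kind" "$dev_type" "$dev_node"
    # Once it has finished, read the self-test log.
    printf 'smartctl -l selftest -d %s %s\n' "$dev_type" "$dev_node"
    # Overall health assessment as reported by the drive.
    printf 'smartctl -H -d %s %s\n' "$dev_type" "$dev_node"
}

selftest_commands cciss,1 /dev/sdc short
```

Printing the commands first makes it easy to check the `-d` value before anything is sent to the drive; the lines can then be run by hand as root.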

Many thanks!

Piviul


