On 29/05/2015 5:08 PM, "Petter Adsen" <petter@synth.no> wrote:
>
> When I woke up this morning, one of my boxen had spewed out a ton of
> errors from one of my SSDs (the root drive), remounted read-only, and
> went into a kernel panic.
>
> After rebooting everything seems fine, though. I've run a SMART long
> test, but as I found out, the SMART error log is not supported on this
> drive. Nor do I have a log of what happened, since / was
> remounted ro.
>
> I've included the output of "smartctl --all /dev/sdc", but I can't see
> anything that stands out.
>
> Yesterday, I had another kernel panic (that seemed related to systemd),
> so I suspect the (manually built) kernel to be at fault here. The RAM
> in this machine is all brand new, and I ran memtest less than two weeks
> ago, so that should be fine.
>
> Can anyone look at this log and tell me if there is anything to worry
> about? Which of the attributes should I look at, so that I know in the
> future?
>
> (And I did a full backup as recently as yesterday that was tested OK
> at the time, so data loss is not a concern. Everything important is on
> other drives anyway.)
>
> ---<snip>---
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.19.0-18-generic] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family: SandForce Driven SSDs
> Device Model: KINGSTON SV300S37A120G
> Serial Number: <snip>
> LU WWN Device Id: 5 0026b7 74703dbf1
> Firmware Version: 525ABBF0
> User Capacity: 120 034 123 776 bytes [120 GB]
> Sector Size: 512 bytes logical/physical
> Rotation Rate: Solid State Device
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3
> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is: Fri May 29 08:50:31 2015 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x02) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                (    0) seconds.
> Offline data collection
> capabilities:                    (0x79) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  36) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x0025) SCT Status supported.
>                                         SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x0033   095   095   050    Pre-fail  Always       -       0/6132927
>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
>   9 Power_On_Hours_and_Msec 0x0032   096   096   000    Old_age   Always       -       4237h+54m+09.420s
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
> 171 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
> 172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       65
> 177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
> 181 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
> 182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
> 189 Airflow_Temperature_Cel 0x0000   024   036   000    Old_age   Offline      -       24 (Min/Max 15/36)
> 194 Temperature_Celsius     0x0022   024   036   000    Old_age   Always       -       24 (Min/Max 15/36)
> 195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
Reallocated_Event_Count is 0, meaning the drive has never had to remap a bad block (Retired_Block_Count, ID 5, is 0 as well). I have a failing drive at the moment, and on that one this number keeps slowly climbing.
> 201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
> 231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
> 233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2063
> 234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2767
> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2767
> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       2177
>
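On which attributes to watch in the future: here is a rough Python sketch (my own illustration, not anything shipped with smartmontools) that reads the attribute table as printed by "smartctl -A" and flags two things: a normalized VALUE at or below a non-zero THRESH, and a non-zero raw count on a handful of error counters. The WATCH set is only my assumption of what is worth tracking on this SandForce drive, and the function names are made up for the example. The sample rows are copied from your output.

```python
import re

# Counters worth a look when their raw value is non-zero.
# This set is an assumption for illustration, not an official list.
WATCH = {
    "Retired_Block_Count",
    "Reallocated_Event_Count",
    "Reported_Uncorrect",
    "Program_Fail_Count",
    "Erase_Fail_Count",
}

# One row of the attribute table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+(\S+)\s+(.*)$"
)

SAMPLE = """\
> 5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
> 196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0
> 231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
"""


def parse_attributes(text):
    """Return one dict per attribute row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.lstrip("> "))  # tolerate mail quoting
        if not m:
            continue
        ident, name, value, worst, thresh, failed, raw = m.groups()
        rows.append({
            "id": int(ident),
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "when_failed": failed,
            "raw": raw.strip(),
        })
    return rows


def find_warnings(rows):
    """Return human-readable warnings for rows that look suspicious."""
    out = []
    for r in rows:
        # A threshold of 0 means the attribute is informational only.
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            out.append(f"{r['name']}: value {r['value']} <= threshold {r['thresh']}")
        # Raw values may carry extra text, so only the leading integer is used.
        lead = re.match(r"\d+", r["raw"])
        if r["name"] in WATCH and lead and int(lead.group()) > 0:
            out.append(f"{r['name']}: raw count is {lead.group()}")
    return out


if __name__ == "__main__":
    for warning in find_warnings(parse_attributes(SAMPLE)):
        print(warning)
```

To try it on a live drive you could paste the real table in place of SAMPLE. What matters more than any single reading is whether those counters grow between runs.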
> SMART Error Log not supported
>
> SMART Self-test Log not supported
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> ---<snip>---
>
> Petter
>
> --
> "I'm ionized"
> "Are you sure?"
> "I'm positive."