Re: Bad blocks and powernowd



2009/1/16 Johannes Wiedersich <johannes@physik.blm.tu-muenchen.de>:
> Davide Mancusi wrote:
>> The hard disk of my 4-year-old laptop is starting to fail. I ran
>> fsck.ext3 -c on my root partition yesterday and a few blocks were
>> marked as damaged. The blocks contained some XFCE4 theme files, so I
>> thought that reinstalling the relevant package should be enough. Now,
>> however, the machine hangs every time I start powernowd. Kernel
>> emergency key presses (Alt+SysRq+?) don't work and the usual log files
>> don't contain any relevant information. I have tried uninstalling and
>> reinstalling the powernowd package, but it didn't help; note also that
>> fsck did not signal any damaged files belonging to powernowd.
>>
>> Can anyone help me sort this out? Could it be that fsck -c did not
>> mark some blocks as damaged because I ran it with the root partition
>> mounted read-only (as opposed to unmounted)?
>
> If your disk is dying this could mean about anything.
>
> Try smartctl from smartmontools package. What does it report about the
> health status of your disk (after some testing)?
>
> Try e2fsck again to see, if it detects 'new' errors on your file system.
>
> I hope you have good back ups. You could try diff -r against your backup
> (mounted ro). However, if your disk is damaged and loads and runs
> garbled kernel stuff, you risk hosing your backup. Therefore it might be
> safer to investigate by booting a rescue system from CD or usb-disk. YMMV.

Thanks for your response, Johannes.

Now I'm confused. I installed smartmontools, I ran
# smartctl -t long /dev/hda
and I detected two bad sectors. I followed the HOWTO at [1] and
reallocated the first one. (I had no idea one could recover bad
sectors. I thought they were as good as gone.) Then I ran the test
again to get the LBA address of the second bad block. Surprise,
surprise, the test completed without problems.
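
For readers following along: one common way to map an LBA reported by
smartctl to a file system block is the arithmetic given in the
smartmontools Bad block HOWTO (which may or may not be the document
linked at [1]): b = (int)((L - S) * 512 / B), where L is the LBA, S is
the partition's starting sector and B is the file system block size.
The sketch below only illustrates that formula. The starting sector
(63) and block size (4096) are hypothetical values, not taken from this
disk; the real ones would have to come from fdisk -lu and tune2fs -l.

```python
def lba_to_fs_block(lba, part_start, block_size=4096, sector_size=512):
    """Map a disk LBA to a block number inside the partition's file system.

    lba         -- sector address reported by smartctl (LBA_of_first_error)
    part_start  -- first sector of the partition (hypothetical value below)
    block_size  -- file system block size in bytes (hypothetical value below)
    """
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    # Integer division mirrors the (int) truncation in the HOWTO formula.
    return (lba - part_start) * sector_size // block_size


if __name__ == "__main__":
    # 95245863 is the LBA from the failed self-test; 63 and 4096 are assumptions.
    print(lba_to_fs_block(95245863, part_start=63, block_size=4096))
```

The resulting block number is what debugfs would be asked about to find
the affected file, so a wrong starting sector or block size silently
points at the wrong block.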

I also tried booting off a live CD and running e2fsck -c -c on all
ext2/3 partitions. No bad blocks were detected, but one of the inode
tables was heavily modified. However, even though no files related to
powernowd were touched, powernowd now works again.

From the live CD I again ran
# smartctl -t long /dev/hda
[waited one hour]
# smartctl -l selftest /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6264         -
# 2  Extended offline    Completed without error       00%      6262         -
# 3  Extended offline    Completed without error       00%      6259         -
# 4  Short offline       Completed without error       00%      6258         -
# 5  Extended offline    Completed: read failure       30%      6257         95245863

You can see that the most recent tests (#1 to #4) completed without
errors. However:

# smartctl -A /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   105   105   040    Pre-fail  Offline      -       5874
  3 Spin_Up_Time            0x0007   200   200   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   096   096   000    Old_age   Always       -       6796
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   120   120   040    Pre-fail  Offline      -       36
  9 Power_On_Hours          0x0012   086   086   000    Old_age   Always       -       6268
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1167
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65
193 Load_Cycle_Count        0x0012   065   065   000    Old_age   Always       -       359932
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       42 (Lifetime Min/Max 11/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

I still have Current_Pending_Sector==1, and smartd e-mails me a
complaint about it at every reboot. What should I do?
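
To keep an eye on just the sector-related counters between tests, the
smartctl -A output can be filtered with a short script. This is a
minimal sketch, not part of smartmontools: the function name and the
choice of attribute IDs (5, 196, 197, 198) are assumptions made for
illustration, and it expects each attribute on a single line, as
smartctl prints it. The sample rows are copied from the output above.

```python
import re

# Attribute IDs related to bad or remapped sectors (an illustrative choice).
SECTOR_ATTRS = {5, 196, 197, 198}

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the leading integer of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def sector_counters(smartctl_output):
    """Return {attribute_name: raw_value} for the sector-related attributes."""
    counters = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line)
        if match and int(match.group(1)) in SECTOR_ATTRS:
            counters[match.group(2)] = int(match.group(3))
    return counters


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       42 (Lifetime Min/Max 11/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for name, raw in sector_counters(SAMPLE).items():
        print(f"{name}: {raw}")
```

In practice the text would come from running smartctl -A /dev/hda as
root and passing its standard output to sector_counters().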

Davide

[1] http://tinyurl.com/83g265