Re: Bad blocks and powernowd
2009/1/16 Johannes Wiedersich <johannes@physik.blm.tu-muenchen.de>:
> Davide Mancusi wrote:
>> The hard disk of my 4-year-old laptop is starting to fail. I ran
>> fsck.ext3 -c on my root partition yesterday and a few blocks were
>> marked as damaged. The blocks contained some XFCE4 theme files, so I
>> thought that reinstalling the relevant package should be enough. Now,
>> however, the machine hangs every time I start powernowd. Kernel
>> emergency key presses (Alt+SysRq+?) don't work and the usual log files
>> don't contain any relevant information. I have tried uninstalling and
>> reinstalling the powernowd package, but it didn't help; note also that
>> fsck did not signal any damaged files belonging to powernowd.
>>
>> Can anyone help me sort this out? Could it be that fsck -c did not
>> mark some blocks as damaged because I ran it with the root partition
>> mounted read-only (as opposed to unmounted)?
>
> If your disk is dying, this could mean just about anything.
>
> Try smartctl from smartmontools package. What does it report about the
> health status of your disk (after some testing)?
>
> Try e2fsck again to see if it detects 'new' errors on your file system.
>
> I hope you have good backups. You could try diff -r against your backup
> (mounted ro). However, if your disk is damaged and loads and runs
> garbled kernel stuff, you risk hosing your backup. Therefore it might be
> safer to investigate by booting a rescue system from CD or usb-disk. YMMV.
Thanks for your response, Johannes.
Now I'm confused. I installed smartmontools and ran
# smartctl -t long /dev/hda
and found two bad sectors. I followed the HOWTO at [1] and
reallocated the first one. (I had no idea bad sectors could be
recovered; I thought they were as good as gone.) Then I ran the test
again to get the LBA of the second bad sector. Surprise,
surprise: the test completed without problems.
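For anyone following along: bad-sector HOWTOs such as the one from the smartmontools project map the failing LBA from the self-test log to a filesystem block before doing anything to it, using block = (LBA - partition start sector) * 512 / filesystem block size. Here is a minimal Python sketch of that arithmetic. Only the LBA 95245863 comes from the log in this message; the partition start sector (63) and the 4096-byte ext3 block size are assumptions for illustration, so check your own values with fdisk -lu and tune2fs -l.

```python
SECTOR_SIZE = 512  # bytes per LBA sector on a drive like this one


def lba_to_fs_block(lba: int, partition_start: int, block_size: int) -> int:
    """Map an absolute LBA to a block number inside an ext2/3 partition."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    # Byte offset into the partition, divided by the filesystem block size.
    return (lba - partition_start) * SECTOR_SIZE // block_size


if __name__ == "__main__":
    failing_lba = 95245863  # LBA_of_first_error from the self-test log
    start = 63              # ASSUMED partition start sector (see fdisk -lu)
    bs = 4096               # ASSUMED ext3 block size (see tune2fs -l)
    print(lba_to_fs_block(failing_lba, start, bs))  # 11905725
```

The resulting block number is what debugfs's icheck command takes to find the inode that owns the block; ncheck then turns that inode into a file name.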
I also tried booting off a live CD and running e2fsck -c -c on all
ext2/3 partitions. No bad blocks were detected, but e2fsck heavily
modified one of the inode tables. Even though no files related to
powernowd were touched, powernowd now works again.
From the live CD, I ran the long test again:
# smartctl -t long /dev/hda
[waited one hour]
# smartctl -l selftest /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6264         -
# 2  Extended offline    Completed without error       00%      6262         -
# 3  Extended offline    Completed without error       00%      6259         -
# 4  Short offline       Completed without error       00%      6258         -
# 5  Extended offline    Completed: read failure       30%      6257         95245863
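If you'd rather check the self-test log without eyeballing it, a small Python sketch like this one splits each entry on runs of two or more spaces. It assumes the column layout shown above (smartctl's exact spacing may differ between versions) and relies on smartctl listing the most recent test as #1.

```python
import re
from typing import List, NamedTuple, Optional

# The self-test log quoted in this message, one entry per line.
LOG = """\
# 1  Extended offline    Completed without error       00%      6264         -
# 2  Extended offline    Completed without error       00%      6262         -
# 3  Extended offline    Completed without error       00%      6259         -
# 4  Short offline       Completed without error       00%      6258         -
# 5  Extended offline    Completed: read failure       30%      6257         95245863
"""


class SelfTest(NamedTuple):
    num: int            # 1 = most recent test
    description: str
    status: str
    remaining: str      # share of the test left when it stopped, e.g. "30%"
    hours: int          # power-on hours when the test ran
    lba: Optional[int]  # first failing LBA, or None where smartctl printed "-"


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Parse `smartctl -l selftest` entries, assuming 2+ spaces between columns."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("#"):
            continue  # skip banners and the column header
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 6:
            continue  # unexpected layout: ignore rather than guess
        num, desc, status, remaining, hours, lba = fields
        entries.append(SelfTest(int(num.lstrip("# ")), desc, status, remaining,
                                int(hours), None if lba == "-" else int(lba)))
    return entries


if __name__ == "__main__":
    tests = parse_selftest_log(LOG)
    latest = min(tests, key=lambda t: t.num)
    print("latest test:", latest.status)
    for t in tests:
        if t.lba is not None:
            print(f"test #{t.num} at {t.hours} h: {t.status}, LBA {t.lba}")
```

Run on the log above, it reports that the latest test passed and that only test #5 recorded a failing LBA.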
As you can see, the most recent test completed without errors. However:
# smartctl -A /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   105   105   040    Pre-fail  Offline      -       5874
  3 Spin_Up_Time            0x0007   200   200   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   096   096   000    Old_age   Always       -       6796
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   120   120   040    Pre-fail  Offline      -       36
  9 Power_On_Hours          0x0012   086   086   000    Old_age   Always       -       6268
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1167
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65
193 Load_Cycle_Count        0x0012   065   065   000    Old_age   Always       -       359932
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       42 (Lifetime Min/Max 11/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
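The raw counters in the attribute table above, including the Current_Pending_Sector value smartd is complaining about, can also be pulled out with a short script. This Python sketch assumes the ten-column layout shown above and only reads the leading integer of RAW_VALUE (so 42 for the temperature line); it is an illustration, not a replacement for smartd's own checks.

```python
import re
from typing import Dict

# Part of the `smartctl -A` output quoted in this message, one attribute per line.
ATTRS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       42 (Lifetime Min/Max 11/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

# ID, name, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(text: str) -> Dict[str, int]:
    """Map attribute name to the leading integer of its RAW_VALUE column."""
    values = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(3))
    return values


if __name__ == "__main__":
    raw = raw_values(ATTRS)
    for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector",
                 "Offline_Uncorrectable"):
        print(f"{name}: {raw[name]}")
    if raw["Current_Pending_Sector"] > 0:
        print("at least one sector is still pending reallocation")
```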
I still have Current_Pending_Sector == 1, and smartd e-mails me a
complaint about it at every reboot. What should I do?
Davide
[1] http://tinyurl.com/83g265