Bug#567204: linux-image-686-bigmem: serious filesystem corruption with bigmem kernels
On Wed, Feb 03, 2010 at 11:22:06AM +0100, Cesare Leonardi wrote:
> M. Dietrich wrote:
> > my system had serious filesystem corruption with several -bigmem
> > kernels in the past (from 2.6.28 to 2.6.32).
>
> Does this mean that with the normal 686 or 486 kernels the corruption
> doesn't happen?
yes.
>
> However, many years ago i experienced frequent filesystem
> corruption and couldn't figure out why. Eventually i discovered
> it was caused by some hdparm settings...
> It was very hard to find, so i hope this helps you. ;-)
no special settings are applied with hdparm:
/dev/sda:
 multcount     =  0 (off)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 30401/255/63, sectors = 488397168, start = 0
> > for sure i can't guarantee that this isn't related to some hardware
> > fault like broken ram or the like but i checked ram with memtest86+.
>
> If i were you, i would also install smartmontools and try something
> like: smartctl -a /dev/yourdisk. I'd pay particular attention to the
> "Vendor Specific SMART Attributes with Thresholds" table and look for
> anything strange.
it's already installed, this is the output:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   085   069   034    Pre-fail  Always       -       98867399
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   001   001   020    Old_age   Always   FAILING_NOW 248712
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       40211526
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       269350284038985
 10 Spin_Retry_Count        0x0013   100   100   034    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       448
184 End-to-End_Error        0x0032   100   253   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x003a   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x0022   100   100   045    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   071   052   000    Old_age   Always       -       29 (Lifetime Min/Max 10/48)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0022   062   062   000    Old_age   Always       -       77434
193 Load_Cycle_Count        0x001a   001   001   000    Old_age   Always       -       320283
194 Temperature_Celsius     0x0012   029   048   000    Old_age   Always       -       29 (0 10 0 0)
195 Hardware_ECC_Recovered  0x0010   070   061   000    Old_age   Offline      -       98881899
196 Reallocated_Event_Count 0x003e   096   096   000    Old_age   Always       -       3645 (28548, 0)
197 Current_Pending_Sector  0x0000   100   100   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0000   200   200   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0000   100   253   000    Old_age   Offline      -       0
i wonder how to interpret that. Start_Stop_Count has FAILING_NOW, maybe because
hdaps is stopping the device often? why is that bad? hm.
but everything else looks fine, right?
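
for reference, a minimal sketch (Python, not part of the smartctl output) of how a row ends up marked. as far as i understand smartmontools, an attribute is reported as failing when its normalized VALUE is at or below THRESH, and a THRESH of 000 means the attribute can never fail. the function names parse_row and at_or_below_threshold are made up for illustration, and SAMPLE holds only four rows copied from the table above.

```python
# Sketch: flag SMART attributes whose normalized VALUE has reached THRESH.
# Assumption: rows follow the smartctl attribute-table column order
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.

SAMPLE = """\
  1 Raw_Read_Error_Rate 0x000f 085 069 034 Pre-fail Always - 98867399
  4 Start_Stop_Count 0x0032 001 001 020 Old_age Always FAILING_NOW 248712
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
193 Load_Cycle_Count 0x001a 001 001 000 Old_age Always - 320283
"""


def parse_row(line):
    """Split one attribute row into a dict; return None for non-data lines."""
    # maxsplit=9 keeps a RAW_VALUE such as "29 (0 10 0 0)" in one field.
    fields = line.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),    # normalized value, e.g. "085" -> 85
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],          # Pre-fail or Old_age
        "when_failed": fields[8],
        "raw": fields[9].strip(),
    }


def at_or_below_threshold(rows):
    """Return rows with a non-zero threshold that VALUE has reached."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


rows = [r for r in map(parse_row, SAMPLE.splitlines()) if r is not None]
failing = at_or_below_threshold(rows)

for r in failing:
    print(r["name"], r["type"], "value", r["value"], "thresh", r["thresh"])
```

on the sample this flags only Start_Stop_Count (001 against a threshold of 020). Load_Cycle_Count also sits at 001, but its threshold is 000, so it is not flagged. according to smartctl(8), an Old_age attribute at or below its threshold indicates wear from normal usage, whereas a Pre-fail attribute in that state would indicate pending failure.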
> And listen for any suspicious noise from the disk.
it doesn't make any - silent as a sleeping baby.
>
> If you have even a slight suspicion about the ram, try to temporarily
> remove a module, if you have more than one, or replace it completely
> if you can. In the past i've seen at least two cases where memtest ran
> ok for about a day but the system had sporadic freezes and BSODs
> (Windows PCs). When i replaced the ram the problems disappeared.
>
removing a module would reduce the memory size and make the bigmem kernel
unnecessary. replacing isn't possible right now. point is: i never had strange
behaviour related to memory, like kernel freezes or program core dumps, and i
use the system quite a lot with big (cross-)compiles and everything that uses
a lot of memory...
thing is that i discovered the fs corruption by accident - git complained
about a corrupt repo. then i forced a fsck run at boot and that failed.
maybe all bigmem users should force a fsck and see if they already
suffer from similar corruption. if they don't, this bug should be closed,
because then it seems to be hardware related. but i don't know how or
where to search, especially because this computer is the tool i do my work on.
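
for anyone who does run such a check: the exit status of fsck(8) is a bit mask, so a "failed" run can mean several things. a minimal sketch (Python) that decodes it is below. the bit meanings are taken from the fsck man page, while the helper names decode_fsck_status and has_uncorrected_errors are made up for illustration.

```python
# Sketch: decode the exit status of fsck(8), which is a sum of bit flags.
# The helper names below are hypothetical, not part of any tool.

FSCK_FLAGS = {
    1: "errors corrected",
    2: "system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_FLAGS.items() if status & bit]


def has_uncorrected_errors(status):
    """True if the filesystem still has errors after the run (bit 4 set)."""
    return bool(status & 4)


for code in (0, 1, 4, 5):
    print(code, decode_fsck_status(code), has_uncorrected_errors(code))
```

a read-only check of an unmounted ext filesystem (for example e2fsck -n -f on the partition in question, device name to be filled in) fixes nothing, so bit 4 in its exit status is what would show corruption.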
best regards,
michael