
Re: Trapping Errant DMA Disk Ops



On Wed, Oct 11, 2006 at 11:30:45AM +0200, David Baron wrote:
> log entry: kernel: hda: dma_timer_expiry: dma status == 0x61
> 
> They may be due to a failing disk, MB, or cable problems but they can stop the 
> system in its tracks. If the disk is mounted or otherwise being accessed, big 
> red switch time. Otherwise, that wonderful click-clack of WD disks is the 
> warning. Smartmon does not issue anything in time, it seems.
> 
> How might one configure smartmon to trap this "sooner"? What I would want is 
> to kill anything accessing the disk and then stop itself from doing so as 
> well, since smartmon is the likely accessing process. Try to save the system 
> from paralysis!

Readers of this list will begin to think that this is my solution to
every problem... well, lately it has been! Check the power supply.
Apparently, after hard disks, power supplies are the most
failure-prone part of a system. And they don't generally fail
catastrophically, but slowly slide out of spec, causing all kinds of
hard-to-diagnose errors. I've had three machines lose a power supply in
the last 6 months or so, and they all manifested different symptoms. The
first would lock up hard without warning. The second would just shut
down. The third would start throwing DMA errors like yours, followed by
a hard lockup or, as the problem got worse, spontaneous reboots. In each
case, a new power supply solved the problem.
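
On trapping it sooner: a minimal sketch, assuming your kernel messages
land in a text log and look like the entry you quoted, is to count
those lines per device and raise a flag once they repeat. The function
names and the threshold of 3 are made up for illustration and are not
part of smartmontools; the pattern is modelled only on your one log
line, so other kernels may word it differently.

```python
import re
from collections import Counter

# Modelled on the quoted entry:
#   kernel: hda: dma_timer_expiry: dma status == 0x61
# The device-name part (hda, hdb, ...) is an assumption about IDE naming.
DMA_TIMEOUT = re.compile(
    r"(?P<dev>hd[a-z]): dma_timer_expiry: dma status == (?P<status>0x[0-9a-fA-F]+)"
)


def count_dma_timeouts(lines):
    """Return a Counter mapping device name to number of matching lines."""
    hits = Counter()
    for line in lines:
        match = DMA_TIMEOUT.search(line)
        if match:
            hits[match.group("dev")] += 1
    return hits


def devices_over_threshold(lines, threshold=3):
    """Return a sorted list of devices with at least `threshold` timeouts."""
    counts = count_dma_timeouts(lines)
    return sorted(dev for dev, n in counts.items() if n >= threshold)
```

Feed it the lines of whatever file your syslog writes kernel messages
to (often /var/log/kern.log on Debian, but that is a guess about your
setup), for example devices_over_threshold(open(path)). It only reports;
what to do with a flagged device is left to you.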

Oh, and get some backups done quick :)

A
