
Re: Trapping Errant DMA Disk Ops



Andrew Sackville-West wrote:
On Wed, Oct 11, 2006 at 11:30:45AM +0200, David Baron wrote:
log entry: kernel: hda: dma_timer_expiry: dma status == 0x61

They may be due to a failing disk, motherboard, or cable problems, but they can stop the system in its tracks. If the disk is mounted or otherwise being accessed, it is big red switch time. Otherwise, that wonderful click-clack of WD disks is the warning. Smartmon does not issue anything in time, it seems.

How might one configure smartmon to trap this "sooner"? What I would want is to kill anything accessing the disk, and then have smartmon stop accessing it as well, since smartmon is the likely accessing process. Try to save the system from paralysis!
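
One way to notice these events early is to watch the kernel log for the message itself rather than waiting on SMART. The following is only a rough sketch in Python, not something smartmon provides: the line format is inferred from the single log entry quoted above, the function names and the threshold of 3 are made up for illustration, and where the kernel log lives on a given system is left to the caller. It only detects and counts the timeouts; it does not kill anything.

```python
import re

# Pattern inferred from the one log line quoted in this thread:
#   kernel: hda: dma_timer_expiry: dma status == 0x61
# Other kernels or drivers may word this differently (assumption).
DMA_TIMEOUT = re.compile(
    r"kernel: (?P<dev>[a-z]+): dma_timer_expiry: "
    r"dma status == (?P<status>0x[0-9a-fA-F]+)"
)


def find_dma_timeouts(lines):
    """Return a (device, status) pair for every DMA timeout line."""
    hits = []
    for line in lines:
        match = DMA_TIMEOUT.search(line)
        if match:
            hits.append((match.group("dev"), int(match.group("status"), 16)))
    return hits


def devices_over_threshold(lines, threshold=3):
    """Return the sorted devices with at least `threshold` timeouts.

    The default threshold is arbitrary (hypothetical), not a known-good value.
    """
    counts = {}
    for dev, _status in find_dma_timeouts(lines):
        counts[dev] = counts.get(dev, 0) + 1
    return sorted(dev for dev, n in counts.items() if n >= threshold)
```

Fed the lines of the kernel log, `devices_over_threshold` gives a list of disks that could then be handed to whatever action is wanted (an alert, unmounting, and so on); that part is deliberately left out here.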

Readers of this list will begin to think that this is my solution to
every problem... well, lately it has been! Check the power supply.
Apparently, after HDs, power supplies are the most failure-prone part
of a system. And they don't generally fail catastrophically, but
slowly slide out of spec, causing all kinds of hard-to-diagnose
errors. I've had three machines lose a power supply in the last 6
months or so and they all manifested different symptoms. One machine
would lock up hard without warning. The second one would just shut
down. The third one would start throwing DMA errors like yours,
followed by a hard lockup or, as the problem got worse, spontaneous
reboots. In each case, it was a new power supply that solved the
problem.

I will go along with that. My workshop is full of discarded hard drives and power supplies waiting to be recycled. The HD is definitely the part most likely to fail, followed by power supplies, then CD-ROM drives, though these can often be cured by cleaning the lens. Lower down the list, RAM chips do go bad sometimes, and a couple of weeks ago I had to replace a video card for someone: blackouts and lockups at random times....


oh, and get some backups done quick :)

do them now, don't wait till tomorrow :)

A


--
Bill


