
Re: how to be told imm. when dma is turned off?



On Wednesday 19 October 2005 09:07, Hugo Vanwoerkom wrote:
> Hi,
>
> The other day, under 2.6.13-ck8 and Sarge, the kernel, bless 'm (her?),
> reset ide0 and turned off DMA on /dev/hdb, where I was running on
> partition #3. (See the end of this post.)
>
> I saw the effects while playing KUSC, but did not realize that DMA had
> been turned off and a reset had occurred.
>
> A little later the kernel remounted the fs read-only, and all hell broke
> loose, of course.
>
> How can I be told immediately when DMA is turned off on either disk and
> a reset has occurred, without having to go look someplace?
>
> These were the syslog messages:
> ...
> Oct 15 04:44:14 localhost kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Oct 15 04:44:14 localhost kernel: hdb: dma_intr: error=0x84 { DriveStatusError BadCRC }
> Oct 15 04:44:14 localhost kernel: ide: failed opcode was: unknown
> Oct 15 04:44:14 localhost kernel: end_request: I/O error, dev hdb, sector 32573730
> Oct 15 04:44:14 localhost kernel: Buffer I/O error on device hdb3, logical block 163905
> Oct 15 04:44:14 localhost kernel: lost page write due to I/O error on hdb3 ...
> Oct 15 04:44:14 localhost kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Oct 15 04:44:14 localhost kernel: hdb: dma_intr: error=0x84 { DriveStatusError BadCRC }
> Oct 15 04:44:14 localhost kernel: ide: failed opcode was: unknown
> Oct 15 04:44:14 localhost kernel: ide0: reset: success
> ...
> Oct 15 07:00:01 localhost /USR/SBIN/CRON[25263]: (root) CMD (test -x /usr/sbin/anacron || run-parts --report /etc/cron.daily)
> ...
> Oct 15 07:00:02 localhost kernel: attempt to access beyond end of device
> Oct 15 07:00:02 localhost kernel: hdb3: rw=0, want=269866160, limit=15631245
> Oct 15 07:00:02 localhost kernel: attempt to access beyond end of device
> Oct 15 07:00:02 localhost kernel: hdb3: rw=0, want=269866160, limit=15631245
> Oct 15 07:00:02 localhost kernel: EXT2-fs error (device hdb3): ext2_readdir: bad page in #83883
> Oct 15 07:00:02 localhost kernel: Remounting filesystem read-only
> ...
>
> BTW, this is a 4-month-old SAMSUNG 80GB ATA disk.
>
> Thanks.
>
> H

Have a look at smartmontools. You can configure how often smartd polls the
drives, and have it email you under certain conditions. The drives have to
be SMART capable, but unless a drive is really old that's not a problem, and
since you mention yours is almost new, it will almost certainly be SMART
capable.
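If you want to confirm that a particular drive reports SMART support before
relying on smartd, the "SMART support is:" lines printed by `smartctl -i` can
be checked. The Python sketch below is one way to do that; it assumes
smartctl is installed and run as root, the function names are hypothetical,
and the exact wording of those lines may differ between smartmontools
versions.

```python
"""Check whether drives report SMART support, using `smartctl -i`.

Sketch only: assumes smartctl (from smartmontools) is installed and run as
root. The "SMART support is:" wording may vary between versions.
"""
import subprocess
import sys


def parse_smart_support(info_text):
    """Return (available, enabled) parsed from `smartctl -i` output."""
    available = False
    enabled = False
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("SMART support is:"):
            continue
        value = line.split(":", 1)[1].strip()
        # "Unavailable - ..." does not start with "Available", so it is ignored.
        if value.startswith("Available"):
            available = True
        elif value.startswith("Enabled"):
            available = True  # being enabled implies the capability exists
            enabled = True
    return available, enabled


def check_device(device):
    """Run `smartctl -i DEVICE` and parse its SMART support lines."""
    result = subprocess.run(
        ["smartctl", "-i", device],
        capture_output=True, text=True, timeout=30,
    )
    return parse_smart_support(result.stdout)


if __name__ == "__main__":
    # Devices are given on the command line, e.g. /dev/hda /dev/hdb.
    for dev in sys.argv[1:]:
        available, enabled = check_device(dev)
        print(f"{dev}: SMART available={available} enabled={enabled}")
```

If a drive reports the capability but not "Enabled", `smartctl -s on DEVICE`
turns SMART on.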

If you're adventurous, you can use the '-M exec PATH' directive in
smartd.conf to "perform useful tricks when a disk problem is detected
(beeping the console, shutting down the machine, broadcasting warnings to
all logged-in users, etc.) But please be careful. smartd will block until
the executable PATH returns, so if your executable hangs, then smartd will
also hang." The smartmontools package comes with some example scripts.
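As a rough illustration of the '-M exec' approach, here is a minimal Python
helper that broadcasts a warning to all logged-in users with wall(1). It is a
sketch under assumptions, not one of the shipped example scripts: the script
path in the docstring and the function names are hypothetical, and it relies
on smartd passing the mail body on stdin and setting SMARTD_DEVICE and
SMARTD_MESSAGE, as described in the smartd.conf man page (check the man page
for your version). Because smartd blocks until the script returns, the wall
call is bounded by a timeout.

```python
#!/usr/bin/env python3
"""Hypothetical alert script for smartd's `-M exec PATH` directive.

Sketch only. Assumes a smartd.conf line such as
    /dev/hdb -a -m root -M exec /usr/local/sbin/smartd-alert
(the script path is a made-up example). smartd runs PATH instead of the
normal mail command, feeds the mail body on stdin, and sets environment
variables such as SMARTD_DEVICE and SMARTD_MESSAGE.
"""
import os
import subprocess
import sys


def build_alert(env, body):
    """Compose a one-line broadcast from smartd's environment and stdin."""
    device = env.get("SMARTD_DEVICE", "unknown device")
    message = env.get("SMARTD_MESSAGE", "").strip()
    if not message:
        # Fall back to the first non-empty line of the mail body.
        message = next(
            (line.strip() for line in body.splitlines() if line.strip()),
            "no details",
        )
    return f"SMART warning for {device}: {message}"


def main():
    body = sys.stdin.read()
    alert = build_alert(os.environ, body)
    # smartd waits for this script, so never let wall(1) hang it. Output is
    # discarded because smartd treats script stdout/stderr as a problem sign.
    try:
        subprocess.run(
            ["wall"], input=alert, text=True, timeout=10,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        pass  # a failed broadcast should not make the helper crash or hang
    return 0


if __name__ == "__main__":
    # Only act when actually invoked by smartd, which sets SMARTD_DEVICE.
    if "SMARTD_DEVICE" in os.environ:
        sys.exit(main())
```

The same pattern can be extended to beep or shut down, but keep whatever the
script does short and bounded for the reason quoted above.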

Justin Guerin


