
Re: Disk performance deteriorated to unbearable levels



David Purton wrote:
> But I think DMA is enabled on the disk. From dmesg:
> [    1.808090] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

As far as I know, all SATA interfaces use DMA.

> Ha! I just found some disk related errors in syslog:
> 
> Nov  2 12:10:58 swires kernel: [33736.415350] sd 0:0:0:0: [sda] Unhandled error code
> Nov  2 12:10:58 swires kernel: [33736.415367] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Nov  2 12:10:58 swires kernel: [33736.415376] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 16 48 77 76 00 01 d0 00
> Nov  2 12:10:58 swires kernel: [33736.415395] end_request: I/O error, dev sda, sector 373847926
> Nov  2 12:10:58 swires kernel: [33736.415404] Buffer I/O error on device sda7, logical block 15133136
> Nov  2 12:10:58 swires kernel: [33736.415409] lost page write due to I/O error on sda7
> Nov  2 12:10:58 swires kernel: [33736.415415] Buffer I/O error on device sda7, logical block 15133137
> Nov  2 12:10:58 swires kernel: [33736.415420] lost page write due to I/O error on sda7
> Nov  2 12:10:58 swires kernel: [33736.415427] Buffer I/O error on device sda7, logical block 15133138
> 
> I'm guessing this is bad! :(

Yes.  That's bad!  I would make sure your backup is good and you have
a recovery plan.
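
As a cross-check on that log, the CDB line can be decoded by hand.  In
a SCSI WRITE(10) command, bytes 2-5 are the starting block address and
bytes 7-8 are the transfer length, both big-endian.  Here is a minimal
sketch in shell; the function name decode_write10 is made up for
illustration.

```shell
#!/bin/sh
# Decode the starting LBA and transfer length from a SCSI WRITE(10) CDB.
# Layout: byte 0 = opcode (0x2a), bytes 2-5 = LBA (big-endian),
# bytes 7-8 = transfer length in blocks (big-endian).
# decode_write10 is a hypothetical helper name, not part of any tool.
decode_write10() {
    # $1..$9 are the first nine CDB bytes as hex, without a 0x prefix.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    len=$(( (0x$8 << 8) | 0x$9 ))
    echo "lba=$lba blocks=$len"
}

# The CDB bytes from the syslog excerpt quoted above.
decode_write10 2a 00 16 48 77 76 00 01 d0 00
# prints: lba=373847926 blocks=464
```

The decoded address matches the sector in the end_request line, so the
write that timed out is the same one the kernel then reported as an
I/O error.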

For your next system I highly recommend setting it up with RAID.  It
makes problems like these so much easier to recover from.  [Of course,
because of the flooding in Thailand and the human reaction to it, the
cost of disk drives is soaring right now.  Unfortunate timing to lose a
drive.]

> >   smartctl -H /dev/sda
> SMART overall-health self-assessment test result: PASSED

Unfortunately SMART isn't a great health indicator: a PASSED result
doesn't rule out a failing drive.  But when it does report problems it
often confirms a failure.

> >   smartctl -l selftest /dev/sda
>
> # 1  Short offline       Completed without error       00%      3686 -

That part is good.

> >   smartctl -t long /dev/sda
> 
> Haven't done this yet.

I would guess from the I/O errors reported by the kernel that a long
selftest will also report errors.

> > > I do not want to reinstall if at all possible.
> > 
> > I am always an advocate of upgrades not re-installs.  :-)
> 
> I have a bad feeling about this one :(

If you RAID1 a system, even if it only has one disk, then replacing or
upgrading the disk is easy.  Just patch in a second disk and sync the
mirror.  Once the sync is complete, remove the original disk drive and
run from the replacement.  Disk upgrades are trivial that way.  Of
course system reliability is even better with both disks in the mirror
active.  :-)
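
A rough sketch of that procedure, assuming Linux software RAID managed
with mdadm.  The device names /dev/md0, /dev/sda1 and /dev/sdb1 are
placeholders, and the run wrapper and function names are made up for
illustration.  The script only prints the commands unless DRY_RUN=0 is
set.

```shell
#!/bin/sh
# Sketch only: single-disk RAID1 that is later migrated to a new disk.
# Assumes mdadm; all device names below are placeholders.
DRY_RUN=${DRY_RUN:-1}
MD=/dev/md0      # the mirror device
OLD=/dev/sda1    # partition on the original disk
NEW=/dev/sdb1    # partition on the replacement disk

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# At install time: build a two-way mirror with one member "missing".
# Do not run this on a partition that already holds data.
create_degraded_mirror() {
    run mdadm --create "$MD" --level=1 --raid-devices=2 "$OLD" missing
}

# Later: patch in the second disk and let the mirror sync.
# Progress can also be watched in /proc/mdstat.
add_replacement() {
    run mdadm "$MD" --add "$NEW"
    run mdadm --wait "$MD"
}

# After the sync: drop the original disk and run from the replacement.
retire_original() {
    run mdadm "$MD" --fail "$OLD"
    run mdadm "$MD" --remove "$OLD"
}

create_degraded_mirror
add_replacement
retire_original
```

Leaving out the last step keeps both disks active in the mirror, which
is the more reliable arrangement.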

Good luck!

Bob
