
Re: problem: SATA performance drop down



On Fri, Jul 06, 2007 at 02:32:12PM +0800, GUO Zhijun wrote:
> 
> It's a Tyan S2925 box with a single AMD X2 4800+, 4x WD2500YS and
> 4x 1G DDR2.  It runs etch (2.6.18-4-amd64) with software RAID5 and
> software RAID1.  It serves web requests for static files and
> NFS-exports its storage to other boxes.
> 
> One day its performance suddenly dropped and the load average climbed
> to 200~400.  I used hddtemp to check the temperatures.  sda and sdb
> were normal and hddtemp returned immediately, but when hddtemp was
> checking the other two disks it stalled for 3-5 seconds and reported
> they don't seem to have a sensor. -___-b
> 
> I also found the following log
> 
> Jul  3 15:00:22 jupiter kernel: ata3: port is slow to respond, please be patient
> Jul  3 15:00:22 jupiter kernel: ata3: soft resetting port
>
> Jul  3 15:00:23 jupiter kernel: ata4: soft resetting port
>
> jupiter:~# hdparm -t /dev/sd[a-d]
> /dev/sda:
> Timing buffered disk reads:   16 MB in  3.17 seconds =   5.05 MB/sec
> /dev/sdb:
> Timing buffered disk reads:   56 MB in  3.07 seconds =  18.24 MB/sec
> /dev/sdc:
> Timing buffered disk reads:   34 MB in  3.19 seconds =  10.65 MB/sec
> /dev/sdd:
> Timing buffered disk reads:   54 MB in  3.21 seconds =  16.83 MB/sec
> 
> Crying.... Could anyone help? Any hints?
> 
> md: md3 stopped.
> md: bind<sdb5>
> md: bind<sdd5>
> md: bind<sdc5>
> md: bind<sda5>
> md: kicking non-fresh sdc5 from array!
> md: unbind<sdc5>
> md: export_rdev(sdc5)
> raid5: device sda5 operational as raid disk 0
> raid5: device sdd5 operational as raid disk 3
> raid5: device sdb5 operational as raid disk 1
> raid5: allocated 4262kB for md3
> raid5: raid level 5 set md3 active with 3 out of 4 devices, algorithm 2

It looks like something, somewhere, is going wrong with sdc5.  With
sdc5 missing, md3 is running degraded: every read of data that lived on
sdc5 has to be reconstructed from the data and parity on the other
three raid5 disks, which costs CPU time and extra disk reads.  This
will seriously slow down the system.
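
To make the parity point concrete, here is a toy sketch with single
bytes and made-up values.  Real md works on whole chunks, and with
algorithm 2 (left-symmetric) the parity chunk rotates across the member
disks from stripe to stripe; this example just picks a stripe where
parity happens to sit on sdd5.  RAID5 parity is the XOR of the data
chunks in a stripe, so a missing chunk is the XOR of everything that
survives:

```shell
#!/bin/sh
# Toy illustration (hypothetical values): one byte from each data chunk
# of a single 4-disk RAID5 stripe whose parity lives on sdd5.
d0=$(( 0x5A ))   # byte on sda5
d1=$(( 0x3C ))   # byte on sdb5
d2=$(( 0xF0 ))   # byte on sdc5, the member that was kicked out

# Parity written to sdd5 when the stripe was last updated.
parity=$(( d0 ^ d1 ^ d2 ))

# With sdc5 gone, its byte must be rebuilt on every read by reading the
# surviving members and XOR-ing them together.
rebuilt=$(( d0 ^ d1 ^ parity ))

printf 'lost byte: 0x%02X  rebuilt byte: 0x%02X\n' "$d2" "$rebuilt"
```

It prints `lost byte: 0xF0  rebuilt byte: 0xF0`.  On md3 that
reconstruction, plus the extra reads it needs, happens for every read
that touches sdc5's chunks until the array is whole again.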

The question is, where is the problem?  Partition, partition table,
drive, cable, controller, motherboard?

Since sdc5 is already out of the array, what happens if you take it all
the way out, treat it as a scratch partition, and then add it back into
the array as a new member?  Follow syslog and see what errors pop up.
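
A minimal sketch of the add-it-back step with mdadm, assuming the array
and partition names from the log above (md3, sdc5).  The kernel
messages show sdc5 was already unbound, so no --fail/--remove is
needed; if it were still listed as a member you would first run
`mdadm /dev/md3 --fail /dev/sdc5 --remove /dev/sdc5`.  The `run` helper
is only an illustrative dry-run wrapper (not part of mdadm): it prints
each command and executes it only when RUN=1 is set.  Any scratch
testing of the partition belongs before the --add.

```shell
#!/bin/sh
# Sketch: wipe the stale md superblock on the kicked-out partition and
# add it back to the array as a fresh member.  ARRAY and PART default to
# the names in the log above; override them for your system.
ARRAY=${ARRAY:-/dev/md3}
PART=${PART:-/dev/sdc5}

# Print each command; only execute it when RUN=1 (run as root).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@" || exit 1
    fi
}

run mdadm --examine "$PART"          # show the old, non-fresh superblock
run mdadm --zero-superblock "$PART"  # drop its old array membership
run mdadm "$ARRAY" --add "$PART"     # rejoin as a new device; resync starts
run cat /proc/mdstat                 # check resync progress
```

Run it once as-is to review the commands, then again with RUN=1 as
root, keeping `tail -f /var/log/syslog` open in another terminal while
the resync runs.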

With it out of the array, you could put a filesystem on it and have
mkfs check for bad blocks while it does so.  See whether having every
block read and written makes the drive firmware remap any bad sectors.
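
A sketch of that scratch-partition test, again assuming /dev/sdc5 and
the same illustrative dry-run wrapper.  With mke2fs, giving -c twice
selects the slower read-write bad-block test instead of the fast
read-only one, which matches the idea of having every block read and
written.  It wipes the partition, which is acceptable only because
sdc5 no longer holds array data.

```shell
#!/bin/sh
# Sketch: turn the out-of-array partition into scratch space and let
# mke2fs run a read-write bad-block pass over it first.
# WARNING: destroys whatever is on $PART.  Commands are only printed
# unless RUN=1 is set in the environment.
PART=${PART:-/dev/sdc5}

# Print each command; only execute it when RUN=1 (run as root).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@" || exit 1
    fi
}

run mke2fs -j -c -c "$PART"   # -j: ext3 journal; -c -c: read-write badblocks test
run dumpe2fs -b "$PART"       # print any blocks recorded as bad
```

While it runs, follow syslog for new ata resets or I/O errors on sdc.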

Doug.


