
monitoring load average

I am involved with setting up NetSaint monitoring of a medium-sized network.

One problem I have is determining suitable ways of monitoring system load.  A 
machine with 100% usage of a resource by server processes will have request 
queues that grow indefinitely (and performance will suck).

So the load average doesn't seem particularly useful on its own.  If a machine 
has a sustained load average of 3.0 from CPU operations and it has two CPUs 
then that indicates a problem.  If it is from disk operations and there are 
four disks in a RAID-5 array then it's equal to the number of non-parity 
stripes and the load is probably at the limit of what it can handle.  If it's 
half from CPU and half from disk then it shouldn't be a problem at all.
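To illustrate the point, here is a minimal sketch of a load-average check that normalises by CPU count (the per-CPU warning/critical thresholds are invented example values, not anything NetSaint defines) -- it shows why 3.0 is alarming on two CPUs but would be fine on four:

```python
# Hypothetical sketch: a load-average check normalised by CPU count.
# Threshold values are arbitrary example choices.

def load_status(loadavg, ncpus, warn=1.0, crit=2.0):
    """Return a NetSaint-style exit status for a 1-minute load average.

    loadavg   : load average as read from /proc/loadavg
    ncpus     : number of CPUs in the machine
    warn/crit : per-CPU load thresholds (example values)
    """
    per_cpu = loadavg / ncpus
    if per_cpu >= crit:
        return 2   # CRITICAL
    if per_cpu >= warn:
        return 1   # WARNING
    return 0       # OK

# A sustained load of 3.0 looks different on two CPUs vs four:
print(load_status(3.0, 2))  # 1.5 per CPU -> 1 (WARNING)
print(load_status(3.0, 4))  # 0.75 per CPU -> 0 (OK)
```

Of course this still can't tell CPU load from disk load, which is the whole problem.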

I think that perhaps a better way would be to have one test measure the 
amount of CPU time used (the sum of the "user" and "system" percentages of 
the CPU usage as reported by top would do - nice time doesn't matter).
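Something like the following sketch could compute that figure from two snapshots of the kernel's CPU counters (the kind of jiffy counts top reads from /proc/stat); the four-field samples and the sampling mechanism are assumptions for illustration:

```python
# Sketch: combined "user" + "system" CPU percentage between two samples
# of jiffy counters, ignoring nice time as described above.

def cpu_busy_pct(sample1, sample2):
    """sample1/sample2: dicts of jiffy counters (user, nice, system, idle)."""
    busy = (sample2["user"] - sample1["user"]) + \
           (sample2["system"] - sample1["system"])
    total = sum(sample2[k] - sample1[k]
                for k in ("user", "nice", "system", "idle"))
    return 100.0 * busy / total

# Example counters (invented numbers):
s1 = {"user": 1000, "nice": 50, "system": 300, "idle": 8650}
s2 = {"user": 1800, "nice": 50, "system": 500, "idle": 8850}
print("%.1f%%" % cpu_busy_pct(s1, s2))  # 83.3%
```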

Then I could have another test measure the disk utilization in terms of the 
await, svctm, or %util fields as reported by iostat.
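For the disk side, a sketch of that test might look like this -- flag a device whose %util (as iostat -x reports it) stays high.  The thresholds and the idea of averaging several recent samples are my assumptions, not anything iostat or NetSaint prescribes:

```python
# Sketch: status check on a device's recent %util readings from iostat.
# Thresholds and sample averaging are example assumptions.

def disk_status(util_samples, warn=80.0, crit=95.0):
    """util_samples: list of recent %util readings for one device."""
    avg = sum(util_samples) / len(util_samples)
    if avg >= crit:
        return 2   # CRITICAL
    if avg >= warn:
        return 1   # WARNING
    return 0       # OK

print(disk_status([70.0, 85.0, 99.0]))  # averages ~84.7 -> 1 (WARNING)
```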

Any suggestions?

http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page
