monitoring load average

To: Debian ISP <debian-isp@lists.debian.org>
Cc: csmall@debian.org
Subject: monitoring load average
From: Russell Coker <russell@coker.com.au>
Date: Tue, 7 Jan 2003 17:49:48 +0100
Message-id: <200301071749.48123.russell@coker.com.au>
Reply-to: Russell Coker <russell@coker.com.au>

I am involved with setting up NetSaint monitoring of a medium-sized network.

One problem I have is determining suitable ways of monitoring system load.  A
machine with 100% usage of a resource by server processes will have request
queues that grow indefinitely (and performance will suck).

The load average on its own doesn't seem particularly useful for detecting
this.  If a machine has a sustained load average of 3.0 from CPU operations
and it has two CPUs then that indicates a problem.  If it is from disk
operations and there are four disks in a RAID-5 array then it's equal to the
number of non-parity stripes and the load is probably at the limit of what it
can handle.  If it's half from CPU and half from disk then it shouldn't be a
problem at all.
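
To make the arithmetic above concrete, here is a rough sketch in Python.  It
assumes the load could be split cleanly into a CPU part and a disk part (the
kernel's load average does not report that split), and the function name
saturation is only for illustration.

```python
def saturation(load, capacity):
    """Ratio of waiting-or-running work to the units able to serve it.

    A value above 1.0 means more work is queued than the resource can serve.
    """
    return load / capacity

cpus = 2
data_disks = 4 - 1  # four-disk RAID-5: one disk's worth of space holds parity

# Load average of 3.0 coming entirely from CPU: overloaded.
all_cpu = saturation(3.0, cpus)

# Load average of 3.0 coming entirely from disk: right at the limit.
all_disk = saturation(3.0, data_disks)

# Load average of 3.0 split evenly: neither resource is saturated.
half_cpu = saturation(1.5, cpus)
half_disk = saturation(1.5, data_disks)
```

The same load average of 3.0 maps to three different situations, which is why
a single threshold on it says little.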

I think that perhaps a better way would be to have one test measure the
amount of CPU time used (the sum of the "user" and "system" percentages of
the CPU usage as reported by top would do - nice time doesn't matter).
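
As a minimal sketch of such a test, the following Python takes two samples of
the aggregate "cpu" line from /proc/stat and reports user + system time as a
percentage of the interval.  The thresholds (80 and 95) and the function
names are assumptions for illustration; the return codes follow the usual
plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL.

```python
import time

def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate "cpu" line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep at most user, nice, system, idle, iowait, irq, softirq, steal.
    # Any later fields (guest time) are already counted in user/nice.
    return [int(x) for x in fields[1:9]]

def user_system_percent(before, after):
    """Percentage of CPU time spent in user + system between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 0 is user, index 2 is system; nice (index 1) is left out.
    return 100.0 * (deltas[0] + deltas[2]) / total

def status_for(percent, warn=80.0, crit=95.0):
    """Map a percentage to a plugin exit code (thresholds are assumptions)."""
    if percent >= crit:
        return 2
    if percent >= warn:
        return 1
    return 0

def check_cpu(interval=5.0, path="/proc/stat"):
    """Sample the cpu line twice, `interval` seconds apart, and classify it."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    percent = user_system_percent(before, after)
    return status_for(percent), percent
```

Calling check_cpu() on a Linux machine returns the exit code and the measured
percentage; a wrapper script would print the percentage and exit with the
code.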

Then I could have another test measure the disk utilization in terms of the 
await, svctm, or %util fields as reported by iostat.
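
A rough sketch of the disk side, in Python again.  It assumes a kernel that
provides /proc/diskstats with the usual 14-or-more-column layout (older
kernels expose the counters elsewhere), and it derives approximations of
%util and await from two snapshots.  This is my reading of what those fields
mean, not a copy of what iostat does internally, and the function names are
made up for the example.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats content into per-device counters."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip short or malformed lines
        stats[f[2]] = {
            "ios": int(f[3]) + int(f[7]),        # reads + writes completed
            "wait_ms": int(f[6]) + int(f[10]),   # ms spent reading + writing
            "busy_ms": int(f[12]),               # ms the device had I/O in flight
        }
    return stats

def disk_metrics(before, after, elapsed_ms):
    """Return (util_percent, await_ms) for one device between two snapshots."""
    d_ios = after["ios"] - before["ios"]
    d_wait = after["wait_ms"] - before["wait_ms"]
    d_busy = after["busy_ms"] - before["busy_ms"]
    if elapsed_ms > 0:
        util = min(100.0, 100.0 * d_busy / elapsed_ms)
    else:
        util = 0.0
    # Average time a request spent queued plus being serviced.
    await_ms = d_wait / d_ios if d_ios else 0.0
    return util, await_ms
```

To use it, read /proc/diskstats twice some seconds apart, pass both texts
through parse_diskstats, and call disk_metrics on the entries for one device
with the elapsed time in milliseconds.  A utilization near 100 means the
device was busy for almost the whole interval.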

Any suggestions?

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page
