
Re: OT: nice'ing jobs



on Thu, Nov 30, 2000 at 06:10:20PM -0500, Brian Stults (bs7452@csc.albany.edu) wrote:
> Sorry for the off-topic post, but I'd like to draw on the general
> computing expertise of this group.
> 
> Where I work, we have a unix server with 4 CPU's.  

I gather this is a proprietary Unix, not GNU/Linux?

> There is not a "nice" policy at our center, and I have been trying to
> make the case to the sysadmin that there should be.  Could someone
> please review my brief argument and tell me if I am incorrect in my
> thinking?  Here's an example...
> 
> Here is some truncated ps output:
> 
>     USER  PID %CPU %MEM    STIME NI COMM
>    user1 2573 24.6  0.3 10:52:45 20 EMMIX
>    user1 1067 24.3  1.0 09:33:22 20 EMMIX
>    user1 2636 24.1  0.9 10:58:42 20 EMMIX
>    user2 7153 20.4  0.2 17:35:39 20 SPSS
> 
> The first three jobs are CPU-intensive and will run for about 24 hours. 
> The fourth job is I/O intensive and will run for maybe 2 hours.  Since
> there are four processors, at this point the jobs are not going to
> interfere with each other.  However, if one more CPU-intensive job were
> added by user1, all jobs would be slowed proportionately.  

Not quite, as I understand it.  If the system doesn't do process
migration, two jobs will stack up on one CPU and each will see a 50%
performance hit, while the other three runnable processes are
unaffected.  But I could be wrong on this.  Otherwise, largely right.
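
To put numbers on it: with process migration, five runnable CPU-bound
jobs on four CPUs each average 4/5 = 80% of a processor, so everything
slows proportionately, as you say; without it, the two stacked jobs get
50% each while the other three keep 100%.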

> My argument is that nice'ing the CPU-intensive jobs would cause the
> I/O-intensive job to run faster without slowing the CPU-jobs at all.

Maybe, maybe not.  Some systems are tuned to favor CPU-bound work,
others to favor I/O -- your I/O-intensive job either preempts the
CPU-heavy jobs by default, or simply isn't grossly affected by them
(it's blocked on I/O, not CPU).

> The reason is that the I/O-intensive job doesn't use much CPU-time.
> So when it gets its turn on the CPU it doesn't use all of its allotted
> time.  However, it still has to wait an equal amount of time to get
> its turn at the CPU again.
> 
> Generally speaking, is this correct in theory?  It seems especially
> considerate to nice the CPU-intensive jobs, since that user gets more
> aggregate CPU time anyway, given that they're running multiple big jobs.

I'd make this argument.

First, use a queueing system.  Queueing systems are designed to handle
the issue of long-running jobs by one or more users.  I submit stuff to
batch all the time, even on my single-user systems -- it's easier to
deal with than a backgrounded or foregrounded process.  Options include
at, cron, and batch, as well as other more advanced systems.
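
A minimal sketch (the input and output filenames are invented):

    # Submit a long run to batch rather than backgrounding it.  batch(1)
    # reads commands from stdin and starts the job once the load average
    # falls below a threshold (0.8 by default with Debian's atd;
    # implementation-defined elsewhere).
    echo 'nice -n 19 EMMIX < model.in > model.out 2> model.err' | batch

    # at(1) takes the same input but runs at a fixed time instead:
    echo 'EMMIX < model.in > model.out' | at 2am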

nice your batched jobs.  Debian doesn't make automating this quite as
straightforward as, say, Solaris (the queuedefs file), but you can
launch your scheduler with options, including load-limiting factors.
I'm not sure how to work 'nice' into it directly -- possibly by nicing
the daemon itself, since children normally inherit a parent's nice
value?
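
One possible (untested) approach along those lines:

    # Start atd niced, so jobs it forks should inherit the nice value
    # (assuming the daemon doesn't reset its own priority).
    # -l sets the load average above which batch jobs won't be started;
    # -b sets the minimum interval in seconds between batch job starts.
    nice -n 10 atd -l 2.5 -b 120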

It's also helpful to be able to set a timeout (ulimit) on specific job
queues.  Scaling this so that the most time-limited queues have the
highest priority tends to take care of the load-balancing problem --
people will slot their work into the fastest queue which allows
sufficient runtime for their job.  These days the idea of shared,
heavily tasked batch environments is pretty foreign to most people,
especially the younger crowd, but some of us have been there, and this
actually does work pretty well when it's needed.
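
A hypothetical per-queue wrapper illustrating the idea (not any real
queueing system's interface):

    #!/bin/sh
    # "Fast" queue: short jobs at normal priority.  ulimit -t caps CPU
    # seconds; the kernel kills a job that exceeds the cap.  A "long"
    # queue wrapper would raise the cap and the nice value, e.g.
    # ulimit -t 86400 and nice -n 19.
    ulimit -t 600
    exec nice -n 0 "$@"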

Note also how 'nice' works -- jobs in the run queue at a nice level of
n are favored over jobs at nice n+1.  The scheduler doesn't starve the
higher-niced jobs outright, but under contention they're penalized in
proportion to their nice value, and a heavily niced job may see very
little CPU until the favored tasks block.  The run queue is (IIRC)
roughly what your system load average measures: the processes in a
runnable state for a given system timeslice.
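
A quick way to see the effect, on a machine with more runnable jobs
than CPUs:

    # Two identical CPU-bound loops, one of them niced; compare %CPU.
    sh -c 'while :; do :; done' &
    nice -n 19 sh -c 'while :; do :; done' &
    ps -o pid,ni,pcpu,comm -C sh    # GNU ps syntax; check the %CPU split
    kill %1 %2                      # clean up when done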


The reason for nicing long-running jobs is this:

 -  Short-run jobs, whether foreground or batch, generally have someone
    waiting on the results.  It makes sense to turn them around
    reasonably quickly.  At the same time, they don't consume heaps of
    system resources because they are time-limited.

 -  Long-run jobs are typically less time-sensitive -- five minutes'
    variance in a 20-hour run (5/1200) is a 0.4% deviation.  In a
    time-averaged environment, backgrounded processes _tend_ to get the
    resources they need.

Limited queues -- a queue with a limited number of run slots -- are a
good means for allocating and measuring resources.  At the point where
adding slots gets more jobs running concurrently but drops total
throughput, it's time to start looking at adding more firepower.
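
(For instance, with invented numbers: if a four-slot queue clears 40
jobs a day and widening it to six slots clears only 36, the box is
saturated and the extra slots are just adding contention.)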


While there are a slew of batching systems out there, a well-designed
system of queues, plus a mix of goading and system design (e.g.,
replacing the default command for a process with a wrapper which
automatically submits it to batch), can often provide benefits.
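
Something like this hypothetical wrapper, installed ahead of the real
binary in the users' PATH (the path is invented and the quoting naive;
a sketch only):

    #!/bin/sh
    # Users type "EMMIX args" as usual; the run is handed to batch at
    # low priority instead of starting immediately.
    echo "nice -n 19 /usr/local/real/EMMIX $*" | batch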


-- 
Karsten M. Self <kmself@ix.netcom.com>     http://www.netcom.com/~kmself
 Evangelist, Zelerate, Inc.                      http://www.zelerate.org
  What part of "Gestalt" don't you understand?      There is no K5 cabal
   http://gestalt-system.sourceforge.net/        http://www.kuro5hin.org
