
Anybody familiar with GNU Queue?



Hi All,

I'm trying to set up a machine farm using GNU Queue as the job scheduler,
but I'm having difficulty understanding exactly what I need to do.

The situation is...

I have a number of machines; let's call them MachineA, MachineB, MachineC,
etc.

So my qhostsfile looks like this:
---------------------------------------------
MachineA   
MachineB
MachineC
MachineD
MachineE
MachineF
MachineG
MachineH
MachineI
MachineJ
---------------------------------------------

Machines A-F are single-processor with 3GB of RAM
Machines G-J are dual-processor   with 2GB of RAM

I have a number of different types of job that I want to run, each of which
requires licences, of which I have a fixed number. For example, only 6 jobs
of Type 1 can run simultaneously, and only 8 of Type 2. All jobs are
expected to be run non-interactively.

JobType 1 requires a machine with 3GB of RAM, and so mustn't be allocated a
2GB box or share with other jobs.

JobType 2 can run on 2GB boxes and can share.

My plan is to make a queue for each different job type. 

# cd /var/lib/queue
# ls -l
drwxr-xr-x    3 root     root          512 Feb 27 12:00 JobType1
drwxr-xr-x    3 root     root          512 Feb 27 12:01 JobType2
drwxr-xr-x    3 root     root          512 Feb 27 11:50 now
drwxr-xr-x    3 root     root          512 Jan 16 16:01 wait


So now I'm coming to write the profiles for the queues and I'm getting
stuck (primarily because the documentation is little more than reference
material). I've copied what I've got so far below.

Do these look about right?
Is there any way to ensure jobs don't get started on machines that have only
a small amount of free memory?
Do the rlimit variables set limits (i.e. like the shell limit command)?
Is there any other setup I need to do?

I don't have access to the machine farm at the moment, so I'm trying to
set up as much as possible beforehand; hence I can't try any of this just
yet. Any insights anybody can give me will be useful.

Thanks

Paul

-------------------- Job Type 1 Profile ---------------------
exec on

mail /var/lib/queue/JobType1/mail_log
supervisor /var/lib/queue/JobType1/mail_log2

host MachineA pfactor 100
host MachineB pfactor 100
host MachineC pfactor 100
host MachineD pfactor 100
host MachineE pfactor 100
host MachineF pfactor 100
host MachineG pfactor   1
host MachineH pfactor   1
host MachineI pfactor   1
host MachineJ pfactor   1

maxexec 6
host MachineA  vmaxexec 1
host MachineB  vmaxexec 1
host MachineC  vmaxexec 1
host MachineD  vmaxexec 1
host MachineE  vmaxexec 1
host MachineF  vmaxexec 1
host MachineG  vmaxexec 0
host MachineH  vmaxexec 0
host MachineI  vmaxexec 0
host MachineJ  vmaxexec 0

-------------------- Job Type 2 Profile ---------------------
exec on

mail /var/lib/queue/JobType2/mail_log
supervisor /var/lib/queue/JobType2/mail_log2

host MachineA pfactor 1
host MachineB pfactor 1
host MachineC pfactor 1
host MachineD pfactor 1
host MachineE pfactor 1
host MachineF pfactor 1
host MachineG pfactor 200
host MachineH pfactor 200
host MachineI pfactor 200
host MachineJ pfactor 200

maxexec 8
host MachineA  vmaxexec 0
host MachineB  vmaxexec 0
host MachineC  vmaxexec 0
host MachineD  vmaxexec 0
host MachineE  vmaxexec 0
host MachineF  vmaxexec 0
host MachineG  vmaxexec 2
host MachineH  vmaxexec 2
host MachineI  vmaxexec 2
host MachineJ  vmaxexec 2
-------------------------------------------------------------
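
Since the two profiles differ only in a handful of numbers, they could be
generated from one host table instead of being typed by hand. The Python
below is a minimal sketch of that idea and is untested against a real
install. The names build_profile and total_slots are made up for this
example, and the directive lines are simply copied from the profiles above
(with single spaces between fields); nothing here checks that GNU Queue
accepts them or interprets them the way I intend.

```python
# Hypothetical generator for the two queue profiles shown above.
# Directive names (exec, mail, supervisor, host ... pfactor, maxexec,
# host ... vmaxexec) are copied from the post, not verified against GNU Queue.

BIG_RAM_HOSTS = ["Machine" + c for c in "ABCDEF"]   # single CPU, 3GB
DUAL_CPU_HOSTS = ["Machine" + c for c in "GHIJ"]    # dual CPU, 2GB


def build_profile(queue_dir, maxexec, per_host):
    """Return profile text. per_host maps host -> (pfactor, vmaxexec)."""
    lines = [
        "exec on",
        "",
        f"mail {queue_dir}/mail_log",
        f"supervisor {queue_dir}/mail_log2",
        "",
    ]
    lines += [f"host {h} pfactor {p}" for h, (p, _) in per_host.items()]
    lines += ["", f"maxexec {maxexec}"]
    lines += [f"host {h} vmaxexec {v}" for h, (_, v) in per_host.items()]
    return "\n".join(lines) + "\n"


def total_slots(per_host):
    """Sum of per-host vmaxexec values, to compare with the licence count."""
    return sum(v for _, v in per_host.values())


# Job Type 1: only the 3GB boxes, one job each.
job_type1 = {h: (100, 1) for h in BIG_RAM_HOSTS}
job_type1.update({h: (1, 0) for h in DUAL_CPU_HOSTS})

# Job Type 2: only the dual-processor boxes, two jobs each.
job_type2 = {h: (1, 0) for h in BIG_RAM_HOSTS}
job_type2.update({h: (200, 2) for h in DUAL_CPU_HOSTS})

if __name__ == "__main__":
    print(build_profile("/var/lib/queue/JobType1", 6, job_type1))
    print(build_profile("/var/lib/queue/JobType2", 8, job_type2))
    # Per-host slots should add up to the licence limits (6 and 8).
    print(total_slots(job_type1), total_slots(job_type2))
```

The total_slots check is just the arithmetic behind the numbers above:
6 machines x 1 job = 6 for Job Type 1, and 4 machines x 2 jobs = 8 for
Job Type 2, which match the maxexec values in each profile.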

-- 
Paul Sargent
mailto: Paul.Sargent@3Dlabs.com


