
Re: What can we learn from Scyld?

On Thu, Feb 22, 2001 at 12:31:34PM -0500, Camm Maguire wrote:
> Greetings!  Subject says it all.  Can we learn/implement any
> techniques from Scyld for Debian Beowulf clusters?

   From what I've read on the same mailing lists I understand that Scyld are
using either a modified PBS or OpenPBS.  We don't have packages of it.  At
the moment OpenPBS is very non-free, but has an expiration clause that
eliminates most or all of the non-free clauses in December (subject to a
debate on debian-legal about the advertising clause).
   What we have right now is DQS, which is slightly non-free (no commercial
distribution) and on the ropes development-wise (FSU no longer supports it
but does not wish to change the license).  I and a few other people are
considering setting up a SourceForge project to keep it going, but no one
really wants to put a lot of time into it without a free software license. 
I've let the author (who has been trying to get a license change for years
now) know that we'd like to have the commercial distribution clause dropped
so that DQS can go into main, but he hasn't had any luck yet and is out in
the real world now.  I'm not sure that DQS was ever suitable for very large
clusters, and it has migration difficulties between releases due to the
design of its internode communication protocol (not that I know PBS is any
better this way).

   Two DFSG-free alternatives that have some significant disadvantages are
GNQS (POSIX, orphaned for years upstream, and never suitable for distributed
parallel jobs) and GNU queue (non-POSIX, relocation may conflict with
conventional distributed parallel jobs, scalable?).

   Many programs implement their own private single-purpose queueing
systems.  It would be much better if we could provide several alternate
standard queueing systems and modify these programs to use the standard
system directly. For instance, you might run seti at low priority in the
queueing system (requeueing on completion), rip CDs locally but do the
encoding out on the cluster, perform daily system management (the slocate
scans for instance) and all other resource intensive tasks in a more
efficient serialized-per-node manner, rather than all fighting over the disk
head positions between competing little queueing systems trying to run their
pet tasks simultaneously. If we could hack together a simple, tiny, robust
single-node POSIX-compatible queueing system we could ship that as standard
(or even essential) and make all the other cluster queueing systems drop-in
replacements for it.
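
   As a rough illustration of the single-node idea above, here is a minimal
Python sketch. It is not taken from any existing queueing system, and it
makes no attempt at POSIX batch compatibility. The class name NodeQueue,
its methods, and the nice-style priority numbers are all hypothetical. It
runs one job at a time, lowest priority number first, and puts requeueing
jobs back on the queue after each pass.

```python
import heapq
import itertools
import subprocess


class NodeQueue:
    """Hypothetical single-node batch queue: jobs run one at a time,
    lowest priority number first (nice-style), FIFO within a priority."""

    def __init__(self, runner=None):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._seq = itertools.count()
        # The runner is injectable (an assumption made for testing);
        # by default it executes the command and returns its exit code.
        self._runner = runner or (lambda argv: subprocess.run(argv).returncode)

    def submit(self, argv, priority=0, requeue=False):
        """Queue a command given as an argv list."""
        heapq.heappush(self._heap, (priority, next(self._seq), list(argv), requeue))

    def run_pending(self):
        """Run every queued job serially, once each.

        Returns (argv, exit_code) pairs in execution order. Jobs submitted
        with requeue=True are put back only after the pass finishes, so a
        single call always terminates.
        """
        results = []
        to_requeue = []
        while self._heap:
            priority, _, argv, requeue = heapq.heappop(self._heap)
            results.append((argv, self._runner(argv)))
            if requeue:
                to_requeue.append((argv, priority))
        for argv, priority in to_requeue:
            self.submit(argv, priority, requeue=True)
        return results
```

   With something like this, the slocate scan, the CD encoder and a
requeueing seti job would each call submit() with an argv list and a
priority, and a single daemon would call run_pending() in a loop, so only
one of them touches the disk at a time.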

   Another area where we are weak is network filesystems, but then so is
everyone as far as I know, so we aren't really behind the 8-ball.  It
would be nice to be out in front with solid, well-documented, easily
configured DFS/AFS/Coda/InterMezzo/Mosix systems instead of just grotty old
NFS. Does Scyld have anything new there?
