Re: Getting Started

To: dan nedelko <dan@genuinemedia.com>
Cc: debian-beowulf@lists.debian.org
Subject: Re: Getting Started
From: "Jorge L. deLyra" <delyra@latt.if.usp.br>
Date: Sat, 9 Mar 2002 10:21:20 -0300 (BRT)
Message-id: <Pine.LNX.3.96.1020309092747.32562A-100000@latt.if.usp.br>
In-reply-to: <3C88F61E.3000005@genuinemedia.com>

Dan,

> I am very envious of your cluster, each node sounds quite powerful, much
> more powerful than my severely dated machines. However, we must all start
> somewhere and hopefully this will appeal to the 'powers that be' and
> also be useful for the undergrads for initial steps into learning
> parallel programming concepts.

This is more or less what happened here. We started with a small prototype
and the idea caught on. Before that we used a bunch of ten 500 MHz Alphas
(with Linux) in a more traditional setting, with disks and no remote boot.
At some point we took the disks off, because they gave us continuous
trouble, and started using NFSroot. That was a big improvement, but we
still booted from floppy (two, in fact: one for MILO, one for the kernel).
Recently we deactivated them here in the Department and passed them on to
the computer center of the Institute, because they gave us too much
administration work. It's still a good machine; they have the manpower to
handle it, and we don't.

So the process happened in several stages. The nodes I described are from
our main machine, which currently has 21 nodes. There are a few others in
this University using 800 MHz and 1000 MHz nodes. Currently we can buy one
of those nodes for a bit less than US$ 500, so our 21 nodes total about
US$ 10,000. If you add the 24-port 100 Mbps 3COM switch, US$ 2,500 for a
nice server and some more infrastructure items, the whole thing comes up
to something like US$ 15,000. This looks like a lot of money to a person
(well, to me it does |:-) but it is peanuts compared with what used to be
spent here for the purchase of considerably _less_ computer power. The
traditional approach in universities is big brand-name boxes, you know.
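
The figures above can be checked with a few lines of arithmetic. This is
only a sketch using the rounded numbers quoted in this message; the split
between the switch and the other infrastructure items is not given, so the
script only computes their combined remainder.

```python
# Rounded figures taken from the paragraph above (all in US$).
NODES = 21
NODE_PRICE_CEILING = 500   # each node costs "a bit less than" this
NODES_TOTAL = 10_000       # quoted total for the 21 nodes
SERVER = 2_500             # quoted price of the server
GRAND_TOTAL = 15_000       # "something like US$ 15,000"

# Upper bound on the node cost if each node cost the full US$ 500.
upper_bound = NODES * NODE_PRICE_CEILING

# Per-node price implied by the quoted US$ 10,000 total.
implied_node_price = NODES_TOTAL / NODES

# What is left for the 24-port switch plus other infrastructure items.
switch_and_extras = GRAND_TOTAL - NODES_TOTAL - SERVER

if __name__ == "__main__":
    print(f"node cost upper bound : US$ {upper_bound}")
    print(f"implied price per node: US$ {implied_node_price:.0f}")
    print(f"switch + other items  : US$ {switch_and_extras}")
```

The implied per-node price comes out a little under US$ 500, which agrees
with the "a bit less than US$ 500" quoted above.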

> I am in the process of creating my node kernel as we speak, I am using the
> make-kpkg --append_to_version diskless buildpackage
> 
> I have been doing a lot of digging, and I am forcing myself to do things
> the right way the first time. From what I have managed to dig up, the
> make-kpkg tool is the best for a Debian system. I know I will not be
> working on this machine forever and I want to make a new admin
> (hopefully an undergrad student) as comfortable as possible, with as
> much documentation as possible.
>
> I assume you are using Debian, and did you use this make-kpkg process? I
> know it's a one-time thing, and once set up is not necessary to modify,
> but if you recall what was done, it would be helpful to me.

I think this is the utility in the kernel-package package, right? Well, we
do use Debian exclusively here, but the one thing we do not use Debian for
is compiling the kernel. We always compile our kernels directly from the
original sources. It is easy, instructive, and fun! When we built our first
node we started by compiling a kernel for it on the server and booting it
from floppy. First we got the NFSroot part to work, then worried about
network booting. For that, we took our floppy kernel, ran the Etherboot
mknbi-linux on it and installed it in the tftpboot directory. At first we
encoded all the NFSroot boot parameter information into the NBI boot block
of the kernel; later we switched to using DHCP to do this, which is a much
better (centralized) way to manage the whole thing.
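
To illustrate what managing this centrally through DHCP might look like,
here is a minimal sketch that generates one host entry per node in the
syntax of the ISC DHCP server (dhcpd.conf). It is not the configuration
actually used on the cluster described here: the choice of ISC dhcpd, the
server address, the MAC addresses, the image name and the NFS root paths are
all hypothetical placeholders. The exact string a kernel expects in
root-path may also differ from one setup to another.

```python
# Hypothetical values, chosen only for illustration.
SERVER_IP = "192.168.1.1"              # assumed TFTP/NFS server address
BOOT_IMAGE = "/tftpboot/vmlinuz.nbi"   # assumed name of the tagged kernel
NFS_ROOT_BASE = "/export/nodes"        # assumed parent of per-node roots


def host_stanza(name, mac, ip):
    """Return one ISC dhcpd 'host' block for a diskless node."""
    lines = [
        f"host {name} {{",
        f"    hardware ethernet {mac};",
        f"    fixed-address {ip};",
        f"    next-server {SERVER_IP};",      # where to fetch the image
        f'    filename "{BOOT_IMAGE}";',      # image handed to the boot ROM
        f'    option root-path "{NFS_ROOT_BASE}/{name}";',  # NFS root
        "}",
    ]
    return "\n".join(lines)


def cluster_config(count):
    """Build host blocks for nodes node01..nodeNN with made-up MACs."""
    stanzas = []
    for i in range(1, count + 1):
        name = f"node{i:02d}"
        mac = f"00:00:00:00:00:{i:02x}"   # placeholder, not a real MAC
        ip = f"192.168.1.{100 + i}"       # placeholder address plan
        stanzas.append(host_stanza(name, mac, ip))
    return "\n\n".join(stanzas)


if __name__ == "__main__":
    print(cluster_config(21))
```

With everything generated from one place on the server, changing a node's
root directory or boot image means editing a single file rather than
rebuilding the boot image for that node.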

In fact, I recorded the whole experience in a howto which is available
online, but it has two problems, a small one and a big one. The small one
is that it is a bit dated, about a year old, and there are many
improvements we are already using which are not mentioned there; I've been
meaning to write a new version of it, but where is the time... The big one
is that it is in Portuguese, as it was meant for national consumption
here; I have considered translating it, but where is the time for that? I
even started writing a semi-automatic translation tool that could be used
for this, but again, no time to finish it. Anyhow, here is the address if
anyone out there knows enough Spanish to be able to make head or tail of
it:

http://latt.if.usp.br/pmc/

It is rather extensively documented, including libraries of configuration
files, scripts, diagrams, etc. I even have some snapshots of an early
version of the machines, but they are not yet online. I am willing to
explain our whole strategy and architecture if people are interested, but
if I just jump into this, these messages are bound to get unbearably long.
So I will give a general explanation of some of the basics in the answer
to the next message, and take it from there.
							Cheers,

----------------------------------------------------------------
        Jorge L. deLyra,  Associate Professor of Physics
            The University of Sao Paulo,  IFUSP-DFMA
       For more information: finger delyra@latt.if.usp.br
----------------------------------------------------------------
