
Re: Modifying Debian for Infrastructures--Step 1

At 02:43 PM 19-02-00 +0100, Marcelo E. Magallon wrote:
>>> "Bud P. Bruegger" <bud@sistema.it> writes:

I answer the following first:

> Perhaps if you explain exactly what you want to achieve (distributed
> computing, diskless workstations, a heterogeneous (from the hardware
> POV) cluster, a heterogeneous (from the OS POV) cluster, ???)

An infrastructure is a usually large, heterogeneous, and possibly
geographically distributed cluster of machines.  What I want to do is find
a way of managing them efficiently and centrally.  My starting point is the
LISA98 paper "Bootstrapping an Infrastructure" by Steve Traugott, Sterling
Software, NASA Ames Research Center -- stevegt@TerraLuna.Org and Joel
Huddleston, Level 3 Communications -- joelh@TerraLuna.Org.
(http://www.infrastructures.org/papers/bootstrap/bootstrap.html).  I am
trying to adapt this to a scenario where Debian tools can be used to make
life easier (as compared to manual installation, usually with a
non-standard file hierarchy layout).  I personally run only i386 Debian but
would like something that can later be extended to other platforms and base
operating systems (see earlier postings for the rationale).

Infrastructure hosts are usually not diskless but have at least what they
need for booting on a local disk.  One typically treats the whole
infrastructure as a single virtual machine with many heads, so a user
should see the same work environment from any machine.  Since it is not
feasible to have everything on every machine, the frequently used stuff is
copied locally and kept in sync with a "gold server", and the rarely used
stuff is network mounted.

> > For installing packages on a cluster of machines, we chose to
> > install to a globally visible filesystem.  There are directory
> > subtrees for different versions of the same package and for
> > different architectures.
> The different architectures I can understand but why the different
> versions?  In the best case, that's a really quick path leading to
> trouble.

In such an infrastructure with hundreds of users, a version upgrade is often
not as atomic as on a single machine.  For example, when a new version is
not backward compatible, the transition to it takes some time (seen over
the whole group of users).  Also, in an organization that develops its own
software, testing of new versions is usually limited to a smaller group of
people while others still use the old version.  All infrastructure
approaches that I'm aware of support multiple versions of a package.
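
To make the multiple-versions idea concrete, here is a minimal sketch of how
one host could select a version from a shared tree.  The layout
<store>/<package>/<version>/<arch>/ and the function name link_version are
assumptions made up for this illustration; this is not how stow or any Debian
tool is implemented.

```python
import os
import tempfile


def link_version(store, package, version, arch, farm):
    """Symlink every file of one package version into the farm directory.

    Assumed (hypothetical) layout: <store>/<package>/<version>/<arch>/...
    """
    src_root = os.path.join(store, package, version, arch)
    linked = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dest_dir = os.path.normpath(os.path.join(farm, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            dest = os.path.join(dest_dir, name)
            if os.path.lexists(dest):
                # Drop the link left behind by a previously selected version.
                os.remove(dest)
            os.symlink(os.path.join(dirpath, name), dest)
            linked.append(dest)
    return sorted(linked)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    store = os.path.join(root, "store")
    farm = os.path.join(root, "farm")
    # Two versions of a made-up package "foo" live side by side.
    for ver in ("1.0", "2.0"):
        bindir = os.path.join(store, "foo", ver, "i386", "bin")
        os.makedirs(bindir)
        with open(os.path.join(bindir, "foo"), "w") as fh:
            fh.write(ver)
    link_version(store, "foo", "1.0", "i386", farm)  # most users
    link_version(store, "foo", "2.0", "i386", farm)  # this host tests 2.0
    print(os.readlink(os.path.join(farm, "bin", "foo")))
```

Both versions stay installed in the shared tree; only the links on a given
host decide which one its users see, so a test group can move ahead while
everyone else stays on the old version.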

> > The individual machines use sym-link farms (created with slink or
> > stow) to run these packages.  In a first step we would like to
> > modify source packages such that the installation directories
> > become parametrized and choosable at build time as a command line
> > option.
> although somewhat desirable, it's not always easily achievable
> without major effort.  For example, I'm toying with the idea of
> packaging Cactus, but it has a build system which is really
> convenient for those who want it "working right now" (with whatever
> configuration and file layout upstream chose), but it's a real
> nightmare in the context of Debian Policy.

The more I think about this, the more I see that it is probably impossible
to have automatic solutions for it.  What I'd be interested in is how much
more effort you see in modifying, say, Cactus for a parametrized solution as
opposed to a (current) Debian Policy solution.  If the effort is comparable,
the former solution may become quite attractive...

> that's why we still have 'a job', to bend some authors' ideas of correct
> filesystem layout to Debian's Policy dictated layout.  Even with
> fully autoconf/automake packages this is troublesome, because GNU
> standards diverge from the FHS in some significant ways (/etc and
> /var are perhaps the most notorious)

My dream would be generally accepted guidelines and tools for original
authors, not to standardize the layout to use, but to make it easy for
others to change.  I see this as some kind of extension of
autoconf/automake... But that will take some time to become commonplace,
if it ever will...

> Perhaps if you explain exactly what you want to achieve (distributed
> computing, diskless workstations, a heterogeneous (from the hardware
> POV) cluster, a heterogeneous (from the OS POV) cluster, ???)
> Depending on what that is, dpkg --root=/foo might do the trick, in
> particular dpkg --root /usr/lib/pckg/<arch>.  

Raul Miller proposed something very similar (with dpkg --unpack).  See my
response there...

> A third approach
> (which I like better) is to spend a little extra money on small hard disks
> (something in the order of 2 GB being the smallest you find nowadays,
> you get them for US$80 or less), and deal with the problem of keeping
> the nodes in sync.  Some black magic with TFTP, ramdisks, DHCP/BOOTP,
> multicast and such is in order for a really effective solution here,
> and I'd love to see such a tool available in Debian.  Perhaps you'd
> like to redirect your efforts in this direction?

Well, I believe this is more or less what I'm up to.  The bootp/tftp stuff
comes in when adding a new machine to the infrastructure (I currently use
FAI, which includes this and other things).  After that, the machine can
boot on its own, and the problem becomes one of central management and
synchronization.

Multicast would not be a good solution, since you'll never get all "slave"
hosts to receive input from the master at the same time (push), and keeping
track of which hosts have received config changes and which haven't is a
nightmare.  That's why all workable infrastructure solutions use pull.
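
To illustrate the pull idea, here is a rough sketch in which each host
compares its own state with the gold server whenever it happens to be up, so
the server never has to remember who has received what.  The manifests
(plain dicts mapping a path to a checksum) and the names plan_pull and pull
are hypothetical; this is not taken from the paper or from FAI.

```python
def plan_pull(local, gold):
    """Return (paths to fetch, paths to delete) to make local match gold.

    Both arguments map a file path to a content checksum (assumed format).
    """
    fetch = sorted(p for p, digest in gold.items() if local.get(p) != digest)
    delete = sorted(p for p in local if p not in gold)
    return fetch, delete


def pull(local, gold):
    """Apply the plan and return the host's new manifest."""
    fetch, delete = plan_pull(local, gold)
    updated = dict(local)
    for path in fetch:
        updated[path] = gold[path]  # stands in for copying the real file
    for path in delete:
        del updated[path]
    return updated


if __name__ == "__main__":
    gold = {"/etc/motd": "c3", "/usr/bin/foo": "a9"}
    # One host is merely stale, the other was switched off for a long time.
    stale = {"/etc/motd": "c2", "/usr/bin/foo": "a9"}
    long_off = {"/usr/bin/old": "00"}
    print(plan_pull(stale, gold))
    print(pull(long_off, gold) == gold)
```

Hosts in very different states converge on the same result, and running the
pull again changes nothing, so a host that missed several changes simply
catches up the next time it pulls.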

Thanks a lot

| Bud P. Bruegger, Ph.D.  |  mailto:bud@sistema.it                       |
| Sistema                 |  http://www.sistema.it                       |
| Information Systems     |  voice general: +39-0564-418667              |
| Via U. Bassi, 54        |  voice direct:  +39-0564-418667 (internal 41)|
| 58100 Grosseto          |  fax:           +39-0564-426104              |
| Italy                   |  P.Iva:         01116600535                  |
