
Re: Would Hurd be able to become a distributed OS?

Jean-Philippe BOISSEAU wrote (Fri, 10 Nov 2000 14:00:41 +0100):
> A friend and I were wondering about relevancy of a distributed os.
> When we discovered the Mach concept, as implemented in the Hurd, we
> thought about an OS that would run on several machines, the way
> current OSes run on several CPUs of the same machine.
> Of course, there are latency problems, because of network speed and
> determinism.
Distributing the Hurd over several machines, each of which runs Mach,
is possible if certain assumptions are met. However, you won't be happy
with the resulting performance penalty at all.

Some variants of Mach include support for NORMA-IPC. Basically, port
names are extended with a host-id prefix, so that messages can be sent
to and received from Mach kernels (and therefore user-land programs)
running on other machines. On top of NORMA-IPC, one can provide
distributed memory by having pagers run on different hosts and letting
clients obtain memory objects from foreign pagers when needed. To ensure
consistency (e.g. one writer, many readers), the pagers must follow a
protocol of page allocation and page reclamation. One such protocol
is present in some Mach variants: EMM/XMM.
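To make the host-id-prefix idea concrete, here is a toy sketch in
Python. It models only the naming concept (a "global" port name is a
local name qualified by a host id, and the network routes by that
prefix); it is not the real NORMA wire protocol, and all class and
method names are invented for illustration:

```python
class Node:
    """Stands in for one Mach kernel with its local port name space."""
    def __init__(self, host_id):
        self.host_id = host_id
        self.ports = {}                  # local port name -> message queue

    def allocate_port(self, name):
        self.ports[name] = []
        return (self.host_id, name)      # global name = (host-id, local name)

class Network:
    """Routes messages between nodes using the host-id prefix."""
    def __init__(self):
        self.nodes = {}

    def add(self, node):
        self.nodes[node.host_id] = node

    def send(self, global_port, msg):
        host_id, name = global_port      # strip the host-id prefix
        self.nodes[host_id].ports[name].append(msg)

net = Network()
a, b = Node(1), Node(2)
net.add(a)
net.add(b)
p = b.allocate_port("pager")             # p is usable from any node
net.send(p, "page-request")              # routed to node 2 transparently
print(b.ports["pager"])                  # -> ['page-request']
```

The point is that a sender never needs to know whether a port is local
or remote; the prefix makes the routing decision.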

With NORMA-IPC and EMM/XMM, you can distribute the misc. servers of
a Hurd system to different nodes of a multicomputer in a transparent
manner. A network-wide name service would have to act as a rendezvous
point for the available servers. This could be done by extending the
proc server and having the proc servers synchronize their tables somehow.
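A hypothetical sketch of that rendezvous idea: each node runs a small
registry (standing in for an extended proc server), and the registries
push updates to their peers so a server published on one node can be
looked up from any other. The synchronization scheme and all names here
are invented, just to show the shape of the table-sync approach:

```python
class Registry:
    """One per node; mirrors entries to its peers on registration."""
    def __init__(self):
        self.table = {}                  # service name -> (host, port-name)
        self.peers = []

    def register(self, name, endpoint):
        self.table[name] = endpoint
        for peer in self.peers:          # naive push-based synchronization
            peer.table[name] = endpoint

    def lookup(self, name):
        return self.table.get(name)

r1, r2 = Registry(), Registry()
r1.peers.append(r2)
r2.peers.append(r1)
r1.register("ext2fs", ("node-1", 42))    # published on node 1
print(r2.lookup("ext2fs"))               # visible from node 2
```

A real implementation would need conflict resolution and failure
handling, which this sketch deliberately omits.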

Unfortunately, gnumach and oskit-mach were stripped of the NORMA/XMM
support present in older Mach variants (probably because the NORMA and
XMM implementations were pretty bogus and were being rewritten and
reengineered by the OSF/OpenGroup at the time), so it won't be easy to
achieve a distributed Hurd right now.

More bad news: the newer version of NORMA (NORMA-2) was never released
by the OpenGroup, so you'll have a lot of work to do.

A very interesting paper about distributed systems is:

  Load Distribution: Implementation for the Mach Microkernel
  Dejan S. Milojicic
  Vieweg Advanced Studies in Computer Science
  ISBN 3-528-05424-7

The PhD thesis describes how tasks can be moved from one machine to
another, and how load distribution can be achieved by transparently
migrating tasks to less-busy nodes.
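The policy side of that can be sketched in a few lines: when a node
holds more tasks than a threshold, migrate one to the least-busy node.
The threshold and data layout are invented; the hard part in practice
(and in Dejan's work) is moving the task's address space, which is
where the Mach/NORMA costs bite:

```python
THRESHOLD = 2   # invented: max tasks per node before we try to migrate

def balance(nodes):
    """nodes: dict host -> list of task names. Mutated in place."""
    for host, tasks in nodes.items():
        while len(tasks) > THRESHOLD:
            # pick the least-busy node as the migration target
            target = min(nodes, key=lambda h: len(nodes[h]))
            if target == host or len(nodes[target]) >= len(tasks) - 1:
                break                    # nothing to gain from moving
            nodes[target].append(tasks.pop())   # "migrate" one task

cluster = {"n1": ["a", "b", "c", "d"], "n2": [], "n3": ["e"]}
balance(cluster)
print(sorted(len(t) for t in cluster.values()))   # -> [1, 2, 2]
```

Note that this toy policy only counts tasks; a real system would weigh
actual CPU load and the cost of the move itself.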

I talked to Dejan about this work recently, and his impression was
that using Mach to achieve load distribution was doomed to fail.
The reasons, again, were NORMA-2: the resulting performance penalty,
and the instability when single nodes crash (mainly due to the VM
chains induced by incremental moving).

I did some distribution experiments with CMU-Mach, osfmach (from the
MkLinux project) and RT-Mach (from Keio/NTT), together with the Lites
single-server BSD OS personality. The results seem to confirm Dejan's
impression, though I didn't test that much. I finally came to the
conclusion that Mach itself was the real problem right now. That's why
I didn't pursue the tests on the Hurd.

Some people at l4-hurd@gnu.org are currently thinking about porting
the Hurd to L4. If this ever becomes reality, we could implement
distributed features for the Hurd on top of L4, either in a way that
somewhat resembles NORMA/XMM or in a completely different way. I hope
the results would be better than they were with Mach (L4 IPC is
synchronous in nature, as opposed to Mach's).

> Perhaps you'll also ask what one could use such an OS for. ;)
> I don't know yet, but it would probably be an interesting thing as
> people tend to have more than one computer at home, wouldn't it?
There are many scenarios for a distributed OS:

  1. distributing number-crunching applications
     (you have plenty of mostly idle CPU power in a workstation pool!)
  2. accessing remote filesystems without using NFS
     (with the Hurd, accessing an ext2fs server via NORMA-IPC may
      prove much faster than going indirectly through pfinet!)
  3. doing transparent load distribution, as described in
     Dejan's PhD thesis.
  4. realizing redundancy by having critical tasks run on many
     computers independently, each of them supervising the master
     task in some way [there are many redundancy algorithms out there]
     (you can turn off some computers in a network without having
      to worry about crashing a critical application).
  5. Being able to freeze a task and resume it on another node
     has the nice side-effect of being able to save it to disk
     and load it some time in the future (modulo broken connections).
     (This is useful for laptops that could be turned off without
      losing their state.)
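The freeze/resume idea in item 5 boils down to serializing a task's
state, writing it out, and restoring it later, possibly on another
node. A real implementation would have to capture threads, ports and
memory; in this invented sketch the "state" is just a picklable dict:

```python
import os
import pickle
import tempfile

def freeze(task_state, path):
    """Serialize the task's state to disk."""
    with open(path, "wb") as f:
        pickle.dump(task_state, f)

def resume(path):
    """Reconstruct the task's state, e.g. after a reboot or on
    another node."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Invented stand-in for a task's state.
task = {"pc": 4096, "regs": [0, 1, 2], "open_files": ["/tmp/log"]}
path = os.path.join(tempfile.gettempdir(), "task.ckpt")
freeze(task, path)
restored = resume(path)                  # could happen much later
print(restored == task)                  # -> True
```

As the parenthetical above notes, external resources (open network
connections, for instance) cannot be frozen this way and would have to
be re-established on resume.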



Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | farid.hajji@ob.kamp.net
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
Murphy's Law fails only when you try to demonstrate it, and thus succeeds.
