
Re: [OT]? Debian High Availability Cluster



On Wed, Jan 17, 2001 at 05:06:40PM -0700, Monte Milanuk wrote:
> 
> I've just been lurking on the list for a few weeks, watching for good
> ideas for when I get the time/equipment to make a small cluster of my
> own.  But I would like to suggest that maybe rather than having a
> debian-beowulf list _and_ a debian-ha list, perhaps one list
> 'debian-cluster' devoted to all clustering topics would address the
> needs of a slightly bigger section of the userbase, especially since as
> you say, a lot of the fundamentals are similar, like how to
> install/admin multiple nodes, etc.
> 
> Just an idea, 

    I think discussing HA issues on debian-beowulf is perfectly alright.  At
least one of us has a performance cluster with a central file server that he
wishes were more highly available.  The front and back ends of performance
clusters are excellent candidates for smaller high-availability clusters: it
would be an awful waste of money to have thousands of compute nodes with no
means of using any of them.  NIS and the like are common on both HA and HP
clusters.

   So, what experiences have clusterers had with distributed filesystems?
Are any of them more robust than a centralized, conventional NFS server?  My
experience with Coda about two years ago convinced me that it was still an
interesting experiment, but not something to commit 10, let alone 300, users
to.  Even if I had gotten it running well, I was still worried about users
being confronted with conflict resolution, and about its RAID1-or-nothing
style of duplicating data disks, rather than something RAID5-like.
   I've considered running RAID5 over NBD devices in order to create a
single huge filesystem that could survive single data-node failures
automatically and be switched over manually to a backup front-end if the
RAID5-combining node failed.  The problems with this are performance
(requiring a Coda/InterMezzo layer back out to the high-performance cluster),
and that journalled filesystems require strict write ordering, which is
difficult to achieve over a RAID device (especially in the event of a failure
and parity reconstruction).  The last time I tried, NBD also had some fairly
major problems (a system lockup when mounting an ext2 filesystem on the nbd
device).
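   For illustration, here is a rough, untested sketch of the layering I have
in mind, written as a small Python wrapper around nbd-client and mdadm.  The
hostnames, port, and device names are made up, and the exact invocations are
assumptions on my part; treat it as a picture of the idea (NBD exports
combined into one md RAID5 array, with one filesystem on top), not a working
recipe.

    import subprocess

    # Data nodes each exporting a local disk over NBD (names/port made up).
    DATA_NODES = ["data1", "data2", "data3", "data4"]
    NBD_PORT = "2000"

    def run(cmd):
        """Run a command, raising if it exits non-zero."""
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)

    def assemble_raid5(md_device="/dev/md0"):
        nbd_devices = []
        # Attach each remote export to a local /dev/nbdN device.
        for i, host in enumerate(DATA_NODES):
            dev = "/dev/nbd%d" % i
            run(["nbd-client", host, NBD_PORT, dev])
            nbd_devices.append(dev)
        # Combine the network block devices into one RAID5 array; the
        # array can then survive the loss of any single data node.
        run(["mdadm", "--create", md_device, "--level=5",
             "--raid-devices=%d" % len(nbd_devices)] + nbd_devices)
        return md_device

    if __name__ == "__main__":
        md = assemble_raid5()
        # One big filesystem on top; ext2 here, with the write-ordering
        # caveats for journalled filesystems noted above.
        run(["mkfs.ext2", md])

The manual front-end failover would then amount to re-attaching the NBD
devices on the backup node and re-assembling the array there (with mdadm
--assemble rather than --create).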
   Tux2 looks *really* promising on RAID devices, as it doesn't care what
order the data in each phase tree is written in, only that it all gets
written before the new rootblock is written.
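   The general pattern, as I understand it, is copy-on-write with an atomic
rootblock switch: write all of the new blocks in whatever order is
convenient, make sure they have hit disk, and only then write the one block
that points at the new tree.  Here is a toy sketch of that idea in Python,
using a pointer file in place of a rootblock; it has nothing to do with
Tux2's actual on-disk format.

    import os

    def commit_phase(dirname, blocks):
        """Toy atomic commit: write blocks in any order, then flip the root."""
        paths = []
        # 1. Write the new data blocks; ordering among them doesn't matter.
        for i, data in enumerate(blocks):
            path = os.path.join(dirname, "block.%d.new" % i)
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())    # every block must reach disk...
            paths.append(path)
        # 2. ...before the new "rootblock" (here a pointer file) is committed.
        tmp = os.path.join(dirname, "root.tmp")
        with open(tmp, "w") as f:
            f.write("\n".join(paths))
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, os.path.join(dirname, "root"))   # atomic switch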
   A huge dedicated FC RAID box would work well, but at 10X the price of IDE
disks it would only be large, not huge.  I haven't heard any stories of large
IDE-RAID cabinets that present a single SCSI or FC connection to the host
system.

-Drake


