Re: [Nbd] distributed, fault-tolerant network block device

To: nbd-general@lists.sourceforge.net
Subject: Re: [Nbd] distributed, fault-tolerant network block device
From: Goswin von Brederlow <goswin-v-b@...186...>
Date: Tue, 03 Apr 2012 16:06:33 +0200
Message-id: <87hax0kiza.fsf@...860...>
In-reply-to: <4F776C56.3060207@...1045...> (Corin Langosch's message of "Sat, 31 Mar 2012 22:43:02 +0200")
References: <4F776C56.3060207@...1045...>

Corin Langosch <info@...1045...> writes:

> Hi,
>
> I'm looking for a distributed, fault-tolerant network storage system which
> exposes block devices (not filesystems) on the clients. Matching my
> requirements I only found ceph's rbd, but it's still very experimental as far
> as I know. So I'm thinking about implementing the system myself using nbd and
> the design of moosefs, http://www.moosefs.org/, which is quite simple (the
> single point of failure master server is ok for me).

There is ransrid [1] which you can combine with nbd/aoe/iscsi to
distribute the physical disks and I'm writing my own project MAID that
intends to run with distributed nodes and uses the same idea as ransrid.

MAID (Massive Array of Independent Disks)

Features for 1.0 (current WIP):

- single control node that handles journaling and provides the
  management interface
- n:m redundancy - n data disks, m parity disks, m out of n+m disks may fail
- independent disks - only the disk you use needs to spin up
- distributed storage by connecting disks via nbd/aoe/iscsi to the
  control node
- supports up to n + m = 65536 disks
- disks are exported via NBD
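
To make the n:m redundancy item concrete, here is a minimal sketch in Python. It is not MAID code: the function names are hypothetical, and the XOR parity shown covers only the simplest case (m = 1). Tolerating m > 1 failures needs a more general erasure code, which this sketch does not attempt.

```python
from functools import reduce

MAX_DISKS = 65536  # the n + m limit from the feature list above


def layout(n, m, disk_bytes):
    """Return (usable_bytes, tolerated_failures) for n data + m parity disks."""
    if n < 1 or m < 0 or n + m > MAX_DISKS:
        raise ValueError("need n >= 1, m >= 0 and n + m <= 65536")
    # Only the n data disks contribute capacity; up to m disks may fail.
    return n * disk_bytes, m


def xor_parity(blocks):
    """XOR equal-sized blocks byte by byte (illustrative m = 1 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])
```

With three data blocks and one parity block, losing any single data block is recoverable: `rebuild([d0, d2], xor_parity([d0, d1, d2]))` gives back `d1`.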

Features for the future:

- resize and reshape support
- wake-on-lan and power-down when idle support for NBD (kernel + client patch)
- distributed storage to multiple servers
- clients connect to the storage server instead of the control node
- offload some computation to storage nodes
- storage node journal via the control/journaling node
- management interface pass-through from storage node to control node
- read-only access if the control node fails
- distributed computations
- broadcast/multicast support for distributed computations
- support InfiniBand RDMA
- distributed control node

I'm writing MAID in OCaml and using libaio (Linux async IO library) so
it will be Linux only. It still is very much a work in progress, no
alpha release or anything yet that I could share, and you might not have
the time to wait for something usable. But it's coming.

> But there's a question left: I read that nbd can easily deadlock when memory
> requests are not dealt with properly:
>
>     when the system is short of memory, it tries to write back dirty pages. So
>     the nbd client asks the nbd server to write back data, but as nbd-server is
>     a userland process, it may require creating dirty pages to fulfill the
>     request.
>
> I suppose this is still an issue? Should it be possible to work around those
> problems completely if my userland programs allocate all memory needed upfront
> and I mlock them? Or is there anything else to take care of?
>
> What do you think of my idea at all? Did I miss anything?
>
> Corin

That is an issue if you are running the server and client on the same
system. Since you are looking for something distributed I assume you
will have clients and servers separated.

Note: This can also be done by putting them in separate virtual
machines.

It also seems to be quite rare. I've been testing my MAID running server
and client on the same system and never had any deadlocks yet. And MAID
allocates a lot more memory than nbd-server would. It might even only
happen if you swap to NBD.
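
For the "allocate upfront and mlock" idea in the question, a minimal Linux-only sketch in Python follows. The helper name `lock_buffer` is hypothetical. It pins a single preallocated buffer with mlock(2) through ctypes and reports whether the call succeeded (it can fail if RLIMIT_MEMLOCK is too low). It says nothing about memory the kernel allocates on the process's behalf, so treat it as an illustration of the call, not as a guarantee against the deadlock.

```python
import ctypes
import mmap


def lock_buffer(size=mmap.PAGESIZE):
    """Allocate `size` bytes up front and try to pin them in RAM.

    Returns (buffer, locked); `locked` is False if mlock() was refused.
    """
    buf = mmap.mmap(-1, size)  # anonymous, page-aligned mapping
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    # Address of the first byte of the mapping.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    locked = libc.mlock(addr, size) == 0
    return buf, locked
```

A real server would more likely call mlockall(2) once at startup so that stack, heap and shared libraries are covered as well.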

MfG
        Goswin
--

1: http://www.mshopf.de/proj/ransrid.html
