
Re: [Nbd] nbd for large scale diskless linux



On Sat, Oct 08, 2005 at 01:36:52PM +0200, dsuchod wrote:
> Hi!!
> 
> > > ethernet link (99Mbits) and a nicely performing gigabit link (300Mbits). The 
> > > load on the server (Dual AMD64 2GHz, 2GByte Mem) was moderate (15%), the 
> > > load on the client (IBM X41 centrino laptop with 512MB) was no problem at all ...
> > 
> > Great! That would mean the kernel-space lockups have finally been
> > resolved -- there used to be issues under such high loads.
> > 
> > What kernel were you using on the client?
> 
> The server system is Debian 3.1 with the stock Debian 2.6.8 kernel; the 
> client was a SuSE 9.3 system with a 2.6.11 kernel.

Aha.

> Unfortunately SuSE does not provide an nbd package, so I didn't try it
> from a SuSE server system (same architecture, 2.6.11 amd64 kernel)
> yet ... but that could be done as one of the next steps ...

The server's kernel isn't all that important, really. Nbd-server does
use <linux/nbd.h>, but only to get some constants it needs for the
protocol -- if you copy that one file, you can theoretically build it
on any POSIX-compliant system (and I'm sure it has been built
successfully on at least The Hurd and on FreeBSD, since I've tried it
on those myself).
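
For the curious: the bits it needs from that header are just a few
protocol definitions, roughly like these (values as they appear in
<linux/nbd.h>; check the header itself for the authoritative list):

#define NBD_REQUEST_MAGIC 0x25609513	/* marks a request from the client */
#define NBD_REPLY_MAGIC   0x67446698	/* marks a reply from the server */

enum {
	NBD_CMD_READ  = 0,
	NBD_CMD_WRITE = 1,
	NBD_CMD_DISC  = 2	/* disconnect */
};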

> I was using the 2.7.3 client tools from the sourceforge repository and:
> 
> lp-srv02a:~# nbd-server --help
> This is nbd-server version 2.7.3
> 
> on the debian system.

Right.

> > > Now the question: how would nbd-server scale with 20 up to 100
> > > clients on one server (no big deal for kernel-nfs)?
> > 
> > I haven't tried, but...
> > 
> > nbd-server currently works with a fork-per-client multi-client scheme.
> > While this works, it's not the best option performance-wise. There
> > shouldn't be much of an issue if you're using "only" 50 clients (they
> > all keep their TCP connection open at all times, so there's no danger of
> > a "thundering herd" problem), but it might not scale all that well.
> 
> Okay, then I could try to alter my environment so that the clients boot 
> nbd instead of nfs and try that out. But I can only do so if no one else 
> is using our PC pools here (until I'm sure it works and I can switch over
> into production).

Understood.
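
(As an aside: the fork-per-client scheme I mentioned above isn't anything
exotic. Stripped of all nbd specifics, it is essentially the classic
pattern below -- a sketch of the general idea, not the actual nbd-server
source; names, setup and error handling are simplified.)

/* sketch: fork-per-client accept loop -- illustrative, not nbd-server code */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void serve_client(int sock)
{
	/* read requests and send replies until this client disconnects */
	close(sock);
}

int main(void)
{
	int listensock = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in addr;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(5000);	/* whatever port you export on */

	bind(listensock, (struct sockaddr *)&addr, sizeof(addr));
	listen(listensock, 16);
	signal(SIGCHLD, SIG_IGN);	/* don't accumulate zombie children */

	for (;;) {
		int client = accept(listensock, NULL, NULL);
		if (client < 0)
			continue;
		switch (fork()) {
		case 0:			/* child: serve this one client */
			close(listensock);
			serve_client(client);
			_exit(0);
		case -1:		/* fork failed: drop the client */
			close(client);
			break;
		default:		/* parent: go accept the next one */
			close(client);
			break;
		}
	}
}

Every connected client costs you one process; that's fine for a few dozen
clients, but it is exactly the part that doesn't scale indefinitely.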

> > I'm in the process of improving nbd-server to better cope with
> > scalability issues, but I have to say that I was a bit put off by the
> > fact that the kernel couldn't handle continued throughput until
> > (apparently) recently...
> 
> The client was still usable under the heavy load: I had to wait a bit 
> until the ssh connection to the server was opened, but nothing out of the 
> ordinary for a system with a disk under heavy load.

It is solved, then. It used to be the case that the client would
deadlock under heavy load.

> > > I had a lot of trouble with the late user-space nfsd consuming nearly
> > > 100% of CPU under heavy load, so I might have the same trouble serving
> > > 50+ clients with nbd-server!?
> > 
> > Not likely. An nfsd gets one request per file; every one of those would
> > seem to require about the same level of processing from nfsd as is the
> > case for opening an NBD connection.
> 
> For that reason, the aggregated network load on an nbd connection seems 
> to be smaller than on nfs.
> 
> > And once your NBD connection is open, all the server needs to do to
> > satisfy a request from a client is to
> > * Read in a TCP packet
> > * Check whether this is a read or a write
> > * Copy the data from the packet to disk, or from disk to a (new)
> >   network packet
> > * Send a packet back with confirmation that the write succeeded, or
> >   with the data in case of a read
> > 
> > Plus, of course, some error checking.
> 
> It is even easier in my environment - RO ...

No, the packets are the same in RO environments. The server will just
refuse to accept write commands...
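
To make that list a bit more concrete, serving one request conceptually
looks something like the sketch below. This is illustrative only: the
struct layouts here are simplified (the real wire format uses packed,
big-endian fields) and the real server does far more careful error
checking, but the shape of the loop is the point.

/* sketch: handling a single NBD request -- illustrative, not nbd-server code */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define NBD_REQUEST_MAGIC 0x25609513	/* as in <linux/nbd.h> */
#define NBD_REPLY_MAGIC   0x67446698
#define NBD_CMD_WRITE     1

struct request {		/* illustrative layout of a client request */
	uint32_t magic;		/* NBD_REQUEST_MAGIC */
	uint32_t type;		/* read, write, disconnect */
	char     handle[8];	/* echoed back so the client can match replies */
	uint64_t from;		/* offset into the exported file or device */
	uint32_t len;		/* number of bytes to read or write */
};

struct reply {			/* illustrative layout of a server reply */
	uint32_t magic;		/* NBD_REPLY_MAGIC */
	uint32_t error;		/* 0 on success */
	char     handle[8];
};

static char buf[128 * 1024];

int handle_one_request(int net, int exportfd, int readonly)
{
	struct request req;
	struct reply rep;

	/* read in a request packet */
	if (read(net, &req, sizeof(req)) != sizeof(req))
		return -1;
	/* plus, of course, some error checking */
	if (req.magic != NBD_REQUEST_MAGIC || req.len > sizeof(buf))
		return -1;

	rep.magic = NBD_REPLY_MAGIC;
	rep.error = 0;
	memcpy(rep.handle, req.handle, sizeof(rep.handle));

	/* check whether this is a read or a write */
	if (req.type == NBD_CMD_WRITE) {
		/* copy the data from the packet to disk (refused with -r) */
		read(net, buf, req.len);
		if (readonly || pwrite(exportfd, buf, req.len, req.from) < 0)
			rep.error = 1;
		/* send a packet back confirming the write (or the refusal) */
		write(net, &rep, sizeof(rep));
	} else {
		/* copy the data from disk into a new network packet */
		ssize_t n = pread(exportfd, buf, req.len, req.from);
		if (n < 0)
			rep.error = 1;
		/* send the reply, followed by the data for a read */
		write(net, &rep, sizeof(rep));
		if (n > 0)
			write(net, buf, n);
	}
	return 0;
}

Note how -r doesn't change the packets at all; in this sketch it only
flips the readonly flag so that writes come back with an error.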

> > > Has anyone on this list tried to export a blockdev RO to a larger number
> > > of clients successfully?
> > 
> > If your block device is small enough and your memory large enough this
> > should not be problematic at all -- then everything will just remain in
> > cache, and the nbd-servers will serve out rather fast.
> 
> I hoped for that :-) I have a system image size of 6.1GB by now (most of 
> which, of course, is not used very often) and a memory size of 2GB (but an 
> upgrade to 6GB is intended).

Right. That should work, then.

> > Note, however, that nbd-server will not enforce read-only policies --
> > you need to do that yourself, somehow.
> 
> I thought the -r option:
> 
> NBD_PORT[0]=5000
> NBD_FILE[0]=/dev/space/suse93
> NBD_SERVER_OPTS[0]=-r
> 
> ensures that only read operations are possible!?

Indeed, I'd forgotten about that one for a second. Whoops.

Sorry, my mistake :-)

> At the moment I mount ramfs slices over the parts of the fs I would like
> to alter on my clients (/etc, /var, /tmp, ... just the same as with nfs).
> In the future I'm thinking of exchanging these bind mounts for cowloop
> over the whole block device, so that I get a virtually writeable remote
> disk on every client ...

cowloop? Is that a specific kernel feature?

If it's the copy-on-write option in nbd you're talking about, then please
don't do it. It's ugly and dog-slow; I should've kicked it out ages ago.

> > > Next question of interest to me - poor man's high availability: Would it
> > > be possible to put two equal servers exporting just the same partition with
> > > the same content (dd image over network) into a raid1 group on the 
> > > clients, so that if one of the servers crashes the client is not doomed to
> > > die too? The task is not to sync writes (the clients will see their
> > > filesystems read-only), but just to get redundancy in case one crucial
> > > device fails ...
> > 
> > There've even been people using RAID1 root devices over NBD. There's a
> > link to that on the NBD home page.
> 
> I read something on that on the enbd site,

No, I said NBD, not ENBD :-)

> > enbd is something more complex and somewhat different from NBD.
> 
> Isn't the server process moved into the kernel domain there?

Could be. I'm not following ENBD all that closely.

-- 
The amount of time between slipping on the peel and landing on the
pavement is precisely one bananosecond


