
Re: [Nbd] nbd-server segfault on x86_64



On Thu, Feb 23, 2006 at 09:13:24PM +0100, JaniD++ wrote:
> ----- Original Message ----- 
> From: "Wouter Verhelst" <wouter@...3...>
> To: "JaniD++" <djani22@...60...>
> Cc: <nbd-general@lists.sourceforge.net>
> Sent: Tuesday, February 14, 2006 7:25 AM
> Subject: Re: [Nbd] nbd-server segfault on x86_64
> 
> 
> > On Tue, Feb 14, 2006 at 12:58:12AM +0100, JaniD++ wrote:
> > > > On Mon, Feb 13, 2006 at 10:53:32PM +0100, JaniD++ wrote:
> > > > [...]
> > > > > > > The system:
> > > > > > > P4 Dual core (64 bit), Fedora Core 4 X86_64, Kernel
> > > > > > > 2.6.16-rc1-git4, nbd 2.8.2, compiled on this system.
> > > > > >
> > > > > > I believe these problems have been fixed in 2.8.3, though I'm not
> > > > > > entirely sure.  Could you try with 2.8.3? If that does not work,
> > > > > > we'll need to debug a bit more.
> > > > >
> > > > > Not needed.
> > > > > It looks like it is fixed in 2.8.3.
> > > >
> > > > Right, I thought as much.
> >
> > Hmm. Forgot this: there's also a pretty nasty bug in nbd-server 2.8.3
> > involving the incorrect killing of child processes, which will fill up
> > your syslog in no time.
> >
> > It's fixed for the Debian packages, but I still need to do a release for
> > the source.
> >
> > I'll do that once I've checked whether the insanely huge devices work.
> 
> I'd like to ask: is there any news on the big devices?

Sorry, I forgot; I did some work on it, found out that my approach was
wrong, and had to leave; after that, I didn't look into it anymore.
I also don't have much time at the moment (with FOSDEM and all).

I'll make sure to have some reasonable answer by the weekend.

> I have almost run out of disk space, and need to grow the XFS filesystem...

Yes, that sounds rather... important.

> [...]
> 
> I have another problem at the moment.
> Some nodes get disconnected, rarely and at random.
> The nbd-client exits, and my big RAID crashes.
> 
> The dmesg messages look like this:
> nbd7: Attempted send on closed socket
> end_request: I/O error, dev nbd7, sector 0
> Buffer I/O error on device nbd7, logical block 0

You should be able to reconnect it at that point -- but I agree, this
disconnect shouldn't happen at all.

> And this causes another problem!
> If the traffic is high enough, these messages slow down or completely
> stop the system.
> 
> I tried to write a script that checks the nbd-client's PID, but the
> response time is too slow. :(
> Would it be too hard to add an option to nbd-client, like --nodaemon or
> something similar?
> I mean one that stays in the foreground and, with a -v option, prints
> useful verbose and debug information.

The way nbd-client is implemented, this is impossible: the only thing
the nbd-client process does is to perform a handshake with nbd-server,
and set up a socket. After that, it runs an ioctl(), which does not
return until the device is disconnected.
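
As a rough illustration of that sequence (a sketch only, not the actual
nbd-client source): the request numbers are the _IO(0xab, n) values from
<linux/nbd.h>, while the function name and the injectable "ioctl" parameter
are made up for this example. The handshake and the size/blocksize ioctls
are left out.

```python
import fcntl

# ioctl request numbers from <linux/nbd.h>, i.e. _IO(0xab, n)
NBD_SET_SOCK = 0xAB00
NBD_DO_IT = 0xAB03
NBD_CLEAR_SOCK = 0xAB04
NBD_CLEAR_QUE = 0xAB05


def run_until_disconnect(dev_fd, sock_fd, ioctl=fcntl.ioctl):
    """Hand an already-negotiated socket to the kernel and block.

    dev_fd  -- open file descriptor for the nbd device node
    sock_fd -- connected socket on which the handshake is already done
    ioctl   -- hypothetical hook so the sequence can be exercised
               without a real nbd device
    """
    # Tell the kernel which socket carries the requests for this device.
    ioctl(dev_fd, NBD_SET_SOCK, sock_fd)
    try:
        # This call does not return until the device is disconnected;
        # everything interesting happens in kernel space meanwhile.
        ioctl(dev_fd, NBD_DO_IT)
    finally:
        # Clean up the device state once the kernel hands control back.
        ioctl(dev_fd, NBD_CLEAR_QUE)
        ioctl(dev_fd, NBD_CLEAR_SOCK)
```

Between NBD_SET_SOCK and the return of NBD_DO_IT, userspace has nothing to
report, which is why a verbose option would have little to print.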

I could perhaps make it not fork() before doing that; but it's not as if
I can make it output any useful information without going into kernel
space.
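
If the client did stay in the foreground, a wrapper would not need to poll
for the PID at all: it could simply block until the process exits. A minimal
sketch, assuming such a non-forking client existed (it does not today); the
function name is invented for this example, and any command can stand in for
the client.

```python
import subprocess
import time


def wait_for_exit(argv):
    """Run argv as a child process and block until it exits.

    Returns (exit_status, seconds_the_child_ran).
    """
    started = time.monotonic()
    child = subprocess.Popen(argv)
    # wait() sleeps in the kernel and returns as soon as the child
    # exits, so there is no polling interval to tune.
    status = child.wait()
    return status, time.monotonic() - started
```

Calling wait_for_exit(["sleep", "1"]) returns after about a second with
status 0; with a foreground client in place of "sleep", the return would be
the signal to reconnect.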

-- 
Fun will now commence
  -- Seven Of Nine, "Ashes to Ashes", stardate 53679.4


