
Re: [Nbd] nbd-server 2.8.6 hangs on nbd-client reconnect



On Fri, Sep 01, 2006 at 11:23:01PM -0400, Mike Snitzer wrote:
> On 9/1/06, Mike Snitzer <snitzer@...17...> wrote:
> 
> > I'm not any closer to understanding where/why/what is causing
> > __lll_mutex_lock_wait to even be called within the nbd-server.  The
> > nbd-server is hung waiting for this mutex but the gdb backtrace is
> > truncated/useless like I showed earlier in this thread.  So is there
> > just some weird corruption occurring?
> >
> > I can reliably reproduce this issue and welcome any suggestions; I'll
> > try to get traces of the nbd-client when it fails and also compile
> > nbd-server with -DDODBG.
> >
> > I'll report back if I find anything but any assistance would be appreciated.
> 
> I'm blaming glib... nbd 2.7.7's nbd-server (non-glib, uses select)
> works perfectly fine!

Well, almost.

I found that the select() loop in both 2.7 and 2.8 is broken. In 2.8
it sets the fd_set once outside the loop and then never resets it, even
though select() overwrites the fd_set on return so that it holds only
the descriptors that were ready. In 2.7 it doesn't even set it in the
first place; I guess it only works because of sheer luck and libc
implementation details that I don't want to look into.
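To illustrate the 2.8 problem, here is a small C sketch. It is not nbd-server's code: the function name stale_fdset_demo() is hypothetical, and two pipes stand in for the server's sockets. It builds an fd_set once, lets select() rewrite it, and then shows that a descriptor which becomes readable later is missed when the same set is reused, but seen once the set is rebuilt.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Illustration only, not nbd-server code: pipes a and b stand in for
 * sockets. Reports whether select() notices that pipe b became readable,
 * first with an fd_set built once up front (the pattern described for
 * 2.8) and then with a freshly rebuilt set. Returns 0 on success, -1 on
 * error. */
int stale_fdset_demo(int *b_seen_stale, int *b_seen_fresh)
{
    int a[2], b[2], maxfd, rc = -1;
    struct timeval tv;
    fd_set set;

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return -1;
    }
    maxfd = a[0] > b[0] ? a[0] : b[0];

    /* Set built once, before any "loop". */
    FD_ZERO(&set);
    FD_SET(a[0], &set);
    FD_SET(b[0], &set);

    /* Only a is readable, so select() removes b[0] from the set. */
    if (write(a[1], "x", 1) != 1)
        goto out;
    tv.tv_sec = 0; tv.tv_usec = 0;          /* poll, don't block */
    if (select(maxfd + 1, &set, NULL, NULL, &tv) < 0)
        goto out;

    /* b now has data too, but the reused set no longer contains b[0],
     * so this select() does not even look at it. */
    if (write(b[1], "y", 1) != 1)
        goto out;
    tv.tv_sec = 0; tv.tv_usec = 0;
    if (select(maxfd + 1, &set, NULL, NULL, &tv) < 0)
        goto out;
    *b_seen_stale = FD_ISSET(b[0], &set) != 0;

    /* Rebuilding the set before the call makes b visible again. */
    FD_ZERO(&set);
    FD_SET(a[0], &set);
    FD_SET(b[0], &set);
    tv.tv_sec = 0; tv.tv_usec = 0;
    if (select(maxfd + 1, &set, NULL, NULL, &tv) < 0)
        goto out;
    *b_seen_fresh = FD_ISSET(b[0], &set) != 0;
    rc = 0;

out:
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return rc;
}
```

With the stale set, b_seen_stale comes back 0 even though pipe b has data; with the rebuilt set, b_seen_fresh is 1.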

The 2.9 implementation is correct; it does not have this bug.

Could you verify whether it works with 2.8.5 and with 2.9.0? If both
work, then this is your issue. I'll release 2.8.8 soon, which sets the
fd_set properly (inside the loop) and which will also fix something
else if those versions work for you.
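Rebuilding the fd_set inside the loop, as planned for 2.8.8, can be sketched roughly as follows. wait_readable(), read_next_byte() and MAX_WATCH are hypothetical names chosen for this example, not nbd-server's API, and the sketch assumes every descriptor is below FD_SETSIZE.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

#define MAX_WATCH 16   /* arbitrary limit for this sketch */

/* Hypothetical helper, not nbd-server's API. The fd_set is rebuilt from
 * watch[] on every call (and on every EINTR retry), so no select() call
 * ever sees a set that a previous select() already overwrote. Assumes
 * every fd is below FD_SETSIZE. Stores the readable descriptors in
 * ready[] (room for nwatch entries) and returns how many there are, or
 * -1 on error. */
int wait_readable(const int *watch, int nwatch, int *ready)
{
    fd_set rset;
    int maxfd = -1, count = 0;

    for (;;) {
        FD_ZERO(&rset);                      /* start from an empty set */
        for (int i = 0; i < nwatch; i++) {   /* and re-add every fd     */
            FD_SET(watch[i], &rset);
            if (watch[i] > maxfd)
                maxfd = watch[i];
        }
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) >= 0)
            break;
        if (errno != EINTR)                  /* EINTR: retry after a signal */
            return -1;
    }

    for (int i = 0; i < nwatch; i++)
        if (FD_ISSET(watch[i], &rset))
            ready[count++] = watch[i];
    return count;
}

/* Example caller, standing in for one pass of a server loop: waits, then
 * reads one byte from the first ready descriptor. Returns that descriptor,
 * or -1 on error or end-of-file. */
int read_next_byte(const int *watch, int nwatch, char *out)
{
    int ready[MAX_WATCH];

    if (nwatch < 1 || nwatch > MAX_WATCH)
        return -1;
    if (wait_readable(watch, nwatch, ready) < 1)
        return -1;
    if (read(ready[0], out, 1) != 1)
        return -1;
    return ready[0];
}
```

Keeping the fd_set local to wait_readable() makes it impossible for a caller to reuse a set that select() has already rewritten, which is the mistake in the 2.8 loop.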

Thanks,

-- 
<Lo-lan-do> Home is where you have to wash the dishes.
  -- #debian-devel, Freenode, 2004-09-22


