
Hurd CVS and X Debugging [Was: Re: Plans for X]

On Mon, Oct 23, 2000 at 05:06:19AM +0200, Marcus Brinkmann wrote:
> On Sun, Oct 22, 2000 at 07:48:40PM -0700, Steve Bowman wrote:
> > 
> > Verified.  X 3.3.6 works with 0921 hurd but not with cvs hurd.
> > I just retested X (failed), downgraded hurd, and retested X (success).
> > An extract of the X output log is in [1].  This time, the failed X test
> > went a little differently.  I didn't try to see what would happen on
> > a restart so I didn't get the "X is already running" error and such.
> > I was just trying to confirm the same X worked on 0921 and not on cvs
> > (20001020) hurd.
> That looks quite bad. Seems your Hurd build went havoc. I never saw
> something like this, but I didn't test the recent CVS in full, only pfinet.
> I will prepare a new Debian Hurd package soon, and then I'll see if I can
> reproduce that.
> Try replacing only the /hurd/lbd translator, if that's the culprit.

(I assume you meant kbd.)

I haven't gotten back to trying this since I've been fooling with 4.x,
but I think I found the problem anyway.  I've been having a different
problem with 4.x than those on your "list of 5".  After quitting X,
the mouse translator sucks up 100% of the cpu and becomes unkillable.
It doesn't happen every time, but often enough I caught it with gdb.

I was looking through the sources (of my cvs tree of hurd) and it
didn't match the gdb output.  I realized I was looking through the
wrong sources since I had to downgrade hurd to the 0921 debs.  Well,
the curious thing was that the hurd debs from 1020 cvs don't include the
following translators: mouse, kbd, streamdev.  So, I guess the problem
was that when I installed the debs built from cvs, I was using the old
translators which I guess are incompatible.

Anyway, the gdb output I was able to get before it locked was (this is
from 0921 hurd debs):

#  From before /hurd/mouse blew up:
Symbols already loaded for /lib/libhurduser.so.0.0
(gdb) where
#0  0x106914c in evc_wait () from /lib/libc.so.0.2
#1  0x10697f9 in mach_msg () from /lib/libc.so.0.2
#2  0x103414b in cproc_block () from /lib/libthreads.so.0.2
#3  0x103c47c in ports_interrupt_notified_rpcs () from /lib/libports.so.0.2
#4  0x1069625 in mach_port_deallocate () from /lib/libc.so.0.2
#5  0x103be07 in ports_do_mach_notify_dead_name () from /lib/libports.so.0.2
#6  0x103ce93 in _ports_record_interruption () from /lib/libports.so.0.2
#7  0x103cf17 in ports_notify_server () from /lib/libports.so.0.2
#8  0x103aa8d in ports_end_rpc () from /lib/libports.so.0.2
#9  0x103ae15 in ports_manage_port_operations_one_thread ()
   from /lib/libports.so.0.2
Cannot access memory at address 0x1.

#  After it blew up:
(gdb) info threads
  559 thread 57.559  0x106914c in evc_wait () from /lib/libc.so.0.2
  558 thread 57.558  0x106914c in evc_wait () from /lib/libc.so.0.2
  557 thread 57.557  0x106914c in evc_wait () from /lib/libc.so.0.2
  556 thread 57.556  0x106914c in evc_wait () from /lib/libc.so.0.2
  555 thread 57.555  0x106914c in evc_wait () from /lib/libc.so.0.2
  554 thread 57.554  0x106914c in evc_wait () from /lib/libc.so.0.2
  553 thread 57.553  0x106914c in evc_wait () from /lib/libc.so.0.2
  552 thread 57.552  0x106914c in evc_wait () from /lib/libc.so.0.2
  551 thread 57.551  0x106914c in evc_wait () from /lib/libc.so.0.2
  550 thread 57.550  0x106914c in evc_wait () from /lib/libc.so.0.2
  549 thread 57.549  0x106914c in evc_wait () from /lib/libc.so.0.2
  548 thread 57.548  0x106914c in evc_wait () from /lib/libc.so.0.2
  547 thread 57.547  0x106914c in evc_wait () from /lib/libc.so.0.2
  546 thread 57.546  0x106914c in evc_wait () from /lib/libc.so.0.2
  545 thread 57.545  0x106914c in evc_wait () from /lib/libc.so.0.2
  544 thread 57.544  0x106914c in evc_wait () from /lib/libc.so.0.2
  543 thread 57.543  0x106914c in evc_wait () from /lib/libc.so.0.2
  542 thread 57.542  0x106914c in evc_wait () from /lib/libc.so.0.2
  541 thread 57.541  0x106914c in evc_wait () from /lib/libc.so.0.2
  540 thread 57.540  0x106914c in evc_wait () from /lib/libc.so.0.2
  539 thread 57.539  0x106914c in evc_wait () from /lib/libc.so.0.2
  538 thread 57.538  0x106914c in evc_wait () from /lib/libc.so.0.2
  537 thread 57.537  0x106914c in evc_wait () from /lib/libc.so.0.2
---Type <return> to continue, or q <return> to quit---q
(gdb) where
#0  0x106914c in evc_wait () from /lib/libc.so.0.2
#1  0x10697f9 in mach_msg () from /lib/libc.so.0.2
#2  0x103414b in cproc_block () from /lib/libthreads.so.0.2
#3  0x10348aa in __mutex_lock_solid () from /lib/libthreads.so.0.2
#4  0x80494c5 in trivfs_goaway ()
#5  0x102adef in trivfs_open () from /lib/libtrivfs.so.0.2
#6  0x10279ab in trivfs_S_fsys_getroot () from /lib/libtrivfs.so.0.2
#7  0x1027cab in trivfs_S_fsys_syncfs () from /lib/libtrivfs.so.0.2
#8  0x10285d9 in trivfs_fsys_server () from /lib/libtrivfs.so.0.2
#9  0x1024d63 in trivfs_demuxer () from /lib/libtrivfs.so.0.2
#10 0x103ae03 in ports_manage_port_operations_one_thread ()
   from /lib/libports.so.0.2
#11 0x1069d3a in mach_msg_server_timeout () from /lib/libc.so.0.2
#12 0x103aee6 in ports_manage_port_operations_one_thread ()
   from /lib/libports.so.0.2
#13 0x10356ba in cthread_body () from /lib/libthreads.so.0.2
(gdb) kill
Kill the program being debugged? (y or n) y

#  But it didn't die and my other terminals all locked up about then.

Items: need to see if sources for mouse, kbd, and streamdev are in cvs;
add "mouse translator blows up after exiting X" to "list of 5".  Also,
I'm having a problem with the nv driver: it doesn't restore the display
properly when exiting X.  I can still restart X, log in remotely, etc.;
it's just that the console is blank (faint after 1 exit, blank
after 2 or more exits).

Regarding the rest of your list (descriptions reiterated for convenience):

1. pflocal gets a load of threads (about 1700 after a couple of minutes
   using X). This doesn't seem to be harmful though. Killing pflocal
   between X sessions helps. Does killing pflocal during an X session kill X?

I didn't see numbers anywhere near this high.  I didn't check pflocal
specifically, but the top three were all term translators in the 300-800
range.

2. mouse translator gets slow after a while, so mouse movement lags behind.
   Killing the mouse translator helps. Does killing mouse during an X
   session disturb X?

I didn't see this, but then after I quit, ....  I'll check if killing
pflocal or mouse disturbs X - next time I remember.

3. xdm deletes LD_LIBRARY_PATH from the environment, which means that
   it can't start other X processes. Setting the following in
   /etc/X11/xdm-config works around that, but might be a security risk(?):
   DisplayManager.exportList: LD_LIBRARY_PATH
   Could also be fixed with rpath, which conflicts with Debian policy.

I don't normally run xdm (I usually purge it) so I don't have a good
config to look at.  I may not fool with this for a while.

4. kbd translator returns Interrupted System Call at open(). I can't
   reproduce this separately from the Xserver, so this is a weird problem.
   Trying several times usually leads to success.

I already reported that prestarting the kbd translator keeps this error
from occurring.  That doesn't fix the problem, though.  I looked at the X
mouse code a bit and it's opening in non-blocking mode.  Opening it in
blocking mode may hang and we'd have to do ioctl to change mode.  Really,
this needs to have some error checking and retry logic.  Kbd is probably
the same.
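
As a rough sketch of the kind of error checking and retry logic meant
here (this is not the actual X server code): open_retry and set_blocking
are hypothetical helper names, the retry limit is arbitrary, and fcntl
is used to clear O_NONBLOCK as one possible stand-in for the ioctl
mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: open PATH, retrying only when open() fails with
 * EINTR ("Interrupted system call").  Gives up after MAX_TRIES attempts
 * or on any other error.  Returns the descriptor, or -1 with errno set. */
int open_retry(const char *path, int flags, int max_tries)
{
    int fd = -1;

    assert(path != NULL);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        fd = open(path, flags);
        if (fd >= 0 || errno != EINTR)
            break;              /* success, or an error not worth retrying */
    }
    return fd;
}

/* Hypothetical helper: switch an already-open descriptor to blocking
 * mode by clearing O_NONBLOCK.  Returns 0 on success, -1 on error. */
int set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ? -1 : 0;
}
```

A caller would do something like open_retry(path, O_RDWR | O_NONBLOCK, 5),
where path is the device node in question, check for -1, and only then
call set_blocking on the result if blocking reads are wanted.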

5. pflocal doesn't release the socket, and X thinks another server is 
   already running. Sometimes killing pflocal helps, sometimes not (and in  
   the cases it doesn't, I really wonder what's the culprit).

I can't reproduce this.  Before the last round of patches, yes.  But not
anymore.  I stopped and started X at least half a dozen times without
rebooting and without getting this error.


Steve Bowman  <sbowman@frostwork.net> (preferred)
Buckeye, AZ   <sbowman@goodnet.com> <bowmanc@acm.org>

Powered by Debian GNU/Linux and GNU/Hurd <http://www.debian.org>
