
`linux debug' causing hang when Lilo is to be run.



 I installed on my Laptop last night, and it worked almost flawlessly
 the first time, aside from some errors from the pcmcia scripts where
 it tries to run "mount -t nfs" and uses "mount -v" a lot.  The
 network worked, and I installed base via http over a pcmcia network
 card, all from `dbootstrap'.

 I tried a second time, this time with "linux debug" at the boot
 prompt.  As many of you already know, `busybox' "init" has been
 modified so that it will, when signalled with USR2, setrlimit() to
 allow core dumps from programs it spawns.  I ran "kill -USR2 1", and
 then "kill 26", to make `dbootstrap' get restarted with core dumping
 enabled.  I have `dbootstrap' compiled with "-g" and not stripped.

 I got all the way through the install, stopping right after install
 modules to edit "/target/etc/pcmcia/network" to change "mount -t nfs"
 into "mount | grep nfs", and to "sed -e 's/mount -v/mount/g' < network
 > network.new && mv network.new network && chmod +x network".

 PCMCIA configures fine, I can access the net and install base from
 the net.  But this time, when I selected "make bootable from hard
 drive", it hangs.  `ps' shows that the Lilo command is NOT what's
 hanging; it's not in the `ps' listing at all.  So, I finally ran
 "kill -QUIT .." on the pid of `dbootstrap' to make it core dump.  I
 copied the core onto a floppy, and dropped it into the source
 directory, and ran `M-x gdb dbootstrap', then typed, in the "*gdb
 dbootstrap*" buffer, "core-file core" and "where".  Here's the stack
 trace, which shows where `dbootstrap' was sitting when it was killed.

(gdb) where
#0  0x4008b0c2 in puts () from /lib/libc.so.6
#1  0x400897df in tempnam () from /lib/libc.so.6
#2  0x400894d5 in tmpfile64 () from /lib/libc.so.6
#3  0x805d988 in execlog (
    incmd=0x8062160 "export LD_LIBRARY_PATH=\"/target/lib:/target/usr/lib\"; /target/sbin/lilo -r /target >/dev/null", priority=6) at util.c:26
#4  0x804bda5 in run_lilo (boot=0x8076740 "/dev/hda2") at bootconfig.c:834
#5  0x804c29c in make_bootable () at bootconfig.c:955
#6  0x8055cb5 in main_menu () at main_menu.c:432
#7  0x8054e71 in main () at main.c:629

 You can see from this that it's hung in `puts()'...  I typed "up" a
 few times, and was rewarded with a source buffer displaying this
 section of util.c:

  if (bootargs.isdebug) {
    openlog(SYSLOG_IDENT, LOG_PID, LOG_USER);
    syslog(LOG_DEBUG, "running cmd '%s'", cmd);
  }
  else {
    openlog(SYSLOG_IDENT, LOG_PID, LOG_USER);
  }

 The `syslog()' call is where the highlight bar is shown; that's where
 it calls into libc, in the above trace.  I don't know why it hangs
 there; `dbootstrap' calls `execlog()' quite a number of times during
 the install, and it worked every time up until now...

 I rebooted again, without the debug switch, and verified that
 everything seems to work fine without it.  It runs `lilo' fine, and
 reboots normally, &c.

 Another reboot later, going through to the point where it hangs, what
 I found was that `logger' wasn't printing to the log on vt3 right
 away either...  but after I killed `cardmgr', I got a huge dump to
 the log, the `logger' test message I'd typed several minutes earlier,
 AND `dbootstrap' continued, ran Lilo, and it's sitting there now
 wondering if I'm mad at it.

 The log on vt3 looks like there was a bunch of stuff buffered from
 both `dbootstrap' and the `logger' commands I tried.  After I killed
 `cardmgr', it all dumped, even though `cardmgr' was able to log
 all along, and its messages showed up right away.

 I wonder if `dbootstrap' is just blocking???  Erik?  Could there be
 an error in the `syslogd' of busybox that would cause this to occur?
 Or is it `syslog()'?  What might it be?  You tell me and we'll both
 know.

 I don't know yet whether the logging is deferred like this when
 "debug" is not given on the boot command line...  I need to look
 deeper into what `syslog()' does, etc.

 Any ideas?

 I also ran into the "it won't reboot" problem, and the "`dbootstrap'
 isn't respawned" bug.  I have a feeling that in both cases, if
 `cardmgr' is killed, `init' might continue and execute those
 actions... ?  It might be blocking trying to write to the logs.

