
Re: The crippled resurrection of said etch.



Hi, I haven't been following this too closely, so I may be missing something, but this message caught my eye. I'm not sure how experienced you are, so I will try to be very explicit -- if I tell you things you think are obvious, please forgive me.

hendrik@topoi.pooq.com wrote:
On Wed, Oct 25, 2006 at 12:34:02PM -0700, Andrew Sackville-West wrote:
hendrik@topoi.pooq.com wrote:

Four more reboots, one successful.
It seems to be a problem starting gdm.
Hmm... It sounds like a race condition, obviously. From what I have read in this thread, I would guess that there is a very good probability that you have an old startup script lying around from a package that has otherwise been removed or upgraded.
it could be an X problem or a gdm problem, but I'd guess X.
It tells me it's starting gdm,
then that it's not starting kdm because it's not the default,
then that it's not starting (presumably another *dm) because it's not the default
that's normal. X does sanity checks to make sure you're not starting more
than one display manager or whatever.
I know that. I just thought that the last message before the crash might be a clue to what went wrong -- such as an unfortunate race condition between gdm and whatever thing decides not to start the other.
But I admit this is unlikely.

then the black screen of death, preventing me from reading which other *dm it was considering.
are you locked up hard at that point or can you switch to a vt? ctrl-alt-fx?

Locked up hard. Though I suppose I should try ssh-ing in.
When you say "black screen of death" I assume you mean a kernel panic? If so, ssh-ing won't work. Also, notably, a kernel panic should *never* happen (theoretically!) -- it is always the result of either a kernel bug or a hardware failure. No user-space program should be able to cause a kernel panic.

What I would try to isolate the problem is:

1. Reboot into single-user mode.
2. Log in as root.
3. Try starting X alone:
   $ X 2>&1 | less
   3a. If X starts, you may kill it with ctrl-alt-backspace;
   3b. If X does not start, you have the output to debug;
   3c. If you get a kernel panic, you know you have serious X problems.
4. Next try starting gdm directly:
   $ /etc/init.d/gdm start
   4a. If gdm starts, there is probably a problem in your startup scripts;
   4b. If gdm does not start, you can check the logs under /var/log/gdm/
   4c. If you get a kernel panic, you know you have serious gdm problems.
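
If the machine locks up while the output is only sitting in a pager, the output is gone. A minimal sketch of one way around that, assuming a writable log path of your choosing: send both output streams to a file instead. The helper name run_logged is made up for illustration and is not a Debian tool, and a hard lock-up can still lose whatever has not reached the disk yet.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with stdout and stderr appended to
# LOGFILE, then ask the kernel to flush buffers to disk.
run_logged() {
    log=$1
    shift
    # Append, so repeated attempts all end up in the same file.
    "$@" >>"$log" 2>&1
    status=$?
    # Flush once the command returns; this cannot help if the
    # machine has already frozen while the command was running.
    sync
    return "$status"
}
```

For example, "run_logged /root/x-test.log X" in step 3, then read /root/x-test.log after the next reboot. The log path is only an example.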

In the case of 4a., where you have a problem in your startup scripts:

5. Kill gdm -- use ctrl-alt-F1 to return to your terminal, and issue:
   $ /etc/init.d/gdm stop
6. Switch to the default runlevel's rc directory and ls it:
   $ cd /etc/rc2.d
   $ ls
   See all the links named S##*
       .. where ## is a number
       .. and * is the rest of the name?
   At startup, these are all started in the order of the ## numbers.
   Scripts with the same number as gdm start back to back (in name order by default), or at the same time if concurrent boot is enabled.
   These are good candidates for a race condition.
   For instance, I have:
       S99gdm
       S99rc.local
       S99rmnologin
       S99stop-bootlogd
   You probably have all of these, plus:
      S99xdm
      S99kdm
   .. and others?
7. Try starting the scripts with the same number as gdm in various orders. Consider which ones sound likely to be the problem. For instance, you have guessed that another *dm is your problem, so try starting first xdm and then gdm, then the other way around. If you manage to trigger the crash, congratulations: you have found the culprit!
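
Rather than picking the candidates out of the ls output by eye, something like the following could list them. This is only a sketch: same_seq_links is a made-up function name, and it assumes the usual S##name link naming described in step 6.

```shell
#!/bin/sh
# same_seq_links DIR NAME
# Hypothetical helper: print every start link in DIR that shares its
# two-digit sequence number with the start link for NAME (e.g. gdm).
same_seq_links() {
    dir=$1
    name=$2
    for link in "$dir"/S[0-9][0-9]"$name"; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}      # e.g. S99gdm
        seq=${base#S}         # e.g. 99gdm
        seq=${seq%"$name"}    # e.g. 99
        for peer in "$dir"/S"$seq"*; do
            [ -e "$peer" ] || [ -L "$peer" ] || continue
            echo "${peer##*/}"
        done
    done
}
```

Running "same_seq_links /etc/rc2.d gdm" would print gdm's own link plus everything else at the same number; those are the scripts to try in different orders.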

Oh, to start a script, e.g. S99gdm, use:
   $ ./S99gdm start

S99rc.local actually runs /etc/rc.local, which might have anything in it, so that is worth looking into. You should probably look at /etc/rc.local and see what it is doing.
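
A stock /etc/rc.local is mostly comments, so it can help to hide those and see only what actually runs. A small sketch, with active_lines being a made-up helper name:

```shell
#!/bin/sh
# active_lines FILE
# Hypothetical helper: print FILE without blank lines and without
# lines whose first non-blank character is '#'.
active_lines() {
    # grep exits 1 when nothing is left to print; that is not an error here.
    grep -Ev '^[[:space:]]*(#|$)' "$1" || true
}
```

"active_lines /etc/rc.local" then shows just the commands. Note that the "#!/bin/sh" line at the top is filtered out as well, since it also starts with '#'.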

Scripts with other numbers are possible too -- just less likely -- so you may want to try them if you don't find the problem in the "good candidates" first.

Hopefully helpful,

Matthew

Could it be that the *dm is interfering with gdm starting up?
Maybe it's whatever it does *after* trying its hand with the *dm's that is the culprit? Anyone know what that is?
Should I try making another *dm the default?
Should I try purging the other *dm's?
Should I try purging gdm?
Should I try running a general update of everything just in case?

as Andre said, /etc/init.d/gdm stop.

then I'd get rid of the links for the moment so you can actually work on
the thing: update-rc.d -f gdm remove && update-rc.d -f kdm remove and so
forth. Then you can use startx as a user and see what happens.

Might be easier just to do this in maintenance mode, which doesn't start the things in the first place.

There's a point -- in the two-Debian philosophy of system maintenance, is there any way of using, say, aptitude running on one system to install, uninstall, configure and so forth the other? It suddenly struck me as potentially useful. Doesn't the installer do something like this, starting from a RAMdisk?

-- hendrik




