Re: The crippled resurrection of said etch.
On Thu, Oct 26, 2006 at 10:25:37AM -0400, Matthew Krauss wrote:
> Hi, I haven't been following this too closely so I may be missing
> something, but this message caught my eye. I'm not sure how experienced
> you are, so I will try to be very explicit -- if I tell you things you
> think are obvious, please forgive me.
>
> hendrik@topoi.pooq.com wrote:
> >On Wed, Oct 25, 2006 at 12:34:02PM -0700, Andrew Sackville-West wrote:
> >
> >>hendrik@topoi.pooq.com wrote:
> >>
> >>
> >>>Four more reboots, one successful.
> >>>It seems to be a problem starting gdm.
> >>>
> Hmm... It sounds like a race condition, obviously. From what I have
> read in this thread, I would guess that there is a very good probability
> that you have an old startup script lying around from a package that
> has been otherwise removed or upgraded.
> >>it could be an X problem or a gdm problem, but probably I'd guess X.
> >>
> >>>It tells me it's starting gdm,
> >>>then that it's not starting kdm because it's not the default,
> >>>then that it's not starting (presumably another *dm) because it's
> >>>not the default
> >>>
> >>that's normal. X does sanity checks to make sure you're not starting more
> >>than one session manager or whatever.
> >>
> >I know that. I just thought that the last message before the crash
> >might be a clue to what went wrong -- such as an unfortunate race
> >condition between gdm and whatever thing decides not to start the other.
> >But I admit this is unlikely.
> >
> >
> >>>then the black screen of death, preventing me from reading which other
> >>>*dm it was considering.
> >>>
> >>are you locked up hard at that point or can you switch to a vt?
> >>ctrl-alt-fx?
> >>
> >
> >Locked up hard. Though I suppose I should try ssh-ing in.
> >
> When you say "black screen of death" I assume you mean a kernel panic?
> If so, ssh-ing won't work. Also, notably, a kernel panic should *never*
> happen (theoretically!) -- it is always the result of either a kernel
> bug or a hardware failure. No user-space program should be able to
> cause a kernel panic.
>
> What I would try to isolate the problem is:
>
> 1. Reboot into single user mode.
> 2. Log in as root.
> 3. Try starting X alone:
> $ X 2>&1 | less
> 3a. If X starts, you may kill it with ctrl-alt-backspace;
X starts.
Killed it with ctrl-alt-backspace
> 3b. If X does not start, you have the output to debug;
> 3c. If you get a kernel panic, you know you have serious X problems.
> 4. Next try starting gdm directly:
> $ /etc/init.d/gdm start
> 4a. If gdm starts, there is probably a problem in your startup scripts;
> 4b. If gdm does not start, you can check the logs under /var/log/gdm/
> 4c. If you get a kernel panic, you know you have serious gdm problems.
gdm starts. A cursor blinks in the upper left of a black screen, then
the cursor disappears, leaving the black screen of death.
Did a hard reset to reboot. No trace of a log file.
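
Since no log file turned up, here is a minimal sketch for pulling error and warning lines out of an X server log once the machine is back up. The default path /var/log/Xorg.0.log and the function name x_log_problems are assumptions, not something stated in this thread. X.org marks error lines with (EE) and warning lines with (WW). After a hard reset the log may be incomplete, since buffered writes can be lost.

```shell
# Print error (EE) and warning (WW) lines from an X server log.
# The default log path is an assumption; pass another file as $1 if needed.
x_log_problems() {
    logfile=${1:-/var/log/Xorg.0.log}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # Only lines that begin with the marker, so the legend at the top
    # of the log (which lists all markers on one line) is skipped.
    grep -E '^\((EE|WW)\)' "$logfile"
}
```

Run as `x_log_problems` for the default path, or `x_log_problems /path/to/log` for any other file.
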
try /etc/init.d/kdm start
it refuses; kdm is not default.
dpkg-reconfigure kdm
and make kdm the default.
repeat
/etc/init.d/kdm start
acts just like /etc/init.d/gdm did before -- black screen of death
reset to reboot.
This time, after the usual environment checking, it starts a maintenance
shell with a shorter path:
/lib/init:/sbin:/bin
As a result, lots of commands don't work. Supplying the path
explicitly,
/usr/bin/dpkg-reconfigure
fails. Its first error message reports that it cannot execute the
'locale' command. True enough. 'locale' is not on the path.
This looks like a bug in dpkg-reconfigure -- shouldn't scripts that are
executed as root specify their command names a little more explicitly?
Something is wrong with the maintenance shell this time -- why has its
$PATH suddenly changed?
-- hendrik
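
A possible workaround for the shortened path, offered as a sketch only: set PATH by hand in the maintenance shell before running dpkg-reconfigure. The directory list below is an assumption (the conventional root path on Debian), not something read from the affected system, and it does not explain why the path changed.

```shell
# Assumed conventional root PATH on Debian; adjust to match your system.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Check whether the command dpkg-reconfigure complained about is now found.
command -v locale || echo "locale still not found on PATH"
```

If `locale` is found after this, `dpkg-reconfigure kdm` can be retried from the same shell.
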
>
> In the case of 4a., where you have a problem in your startup scripts:
>
> 5. Kill gdm -- use ctrl-alt-F1 to return to your terminal, and issue:
> $ /etc/init.d/gdm stop
> 6. Switch to the default runlevel's rc directory and ls it:
> $ cd /etc/rc2.d
> $ ls
> See all the links named S##*
> .. where ## is a number
> .. and * is the rest of the name?
> At startup, these are all started in the order of the ## numbers.
> Scripts with the same number as gdm start at the same time.
> These are good candidates for a race condition.
> For instance, I have:
> S99gdm
> S99rc.local
> S99rmnologin
> S99stop-bootlogd
> You probably have all of these, plus:
> S99xdm
> S99kdm
> .. and others?
> 7. Try starting up the scripts with the same number as gdm in various
> orders. Consider which ones sound likely to be the problem. For
> instance, you have guessed that another *dm is your problem, so try
> starting first xdm and then gdm, then the other way around. If you
> trigger a crash, congratulations! You have found the culprit.
>
> Oh, to start a script, e.g. S99gdm, use:
> $ ./S99gdm start
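
Step 6 above can be sketched as a small shell function that lists the start links sharing a sequence number with a given service. The function name same_seq_links and its two arguments are hypothetical, chosen for illustration. It assumes the S## naming described above, with a two-digit number.

```shell
# List start links in an rc directory that share a sequence number
# with the named service, e.g. everything matching S99* if gdm is S99gdm.
same_seq_links() {
    rcdir=$1
    service=$2
    for link in "$rcdir"/S[0-9][0-9]"$service"; do
        # Skip the unexpanded pattern when the service has no start link.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}              # e.g. S99gdm
        prefix=${base%"$service"}     # e.g. S99
        for peer in "$rcdir"/"$prefix"*; do
            echo "${peer##*/}"
        done
    done
}
```

Calling `same_seq_links /etc/rc2.d gdm` prints the links that carry the same number as gdm, which are the ones the message above singles out as good candidates.
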
>
> S99rc.local actually runs /etc/rc.local which might have anything in it,
> so that is worth looking into. You should probably look at
> /etc/rc.local and see what it is doing.
>
> Scripts with other numbers are possible too -- just less likely -- so
> you may want to try them if you don't find the problem in the "good
> candidates" first.
>
> Hopefully helpful,
>
> Matthew
>
> >
> >>>Could it be that the *dm is interfering with gdm starting up?
> >>>Maybe it's whatever it does *after* trying its hand with the *dm's
> >>> that is the culprit? Anyone know what that is?
> >>>Should I try making another *dm the default?
> >>>Should I try purging the other *dm's?
> >>>Should I try purging gdm?
> >>>Should I try running a general update of everything just in case?
> >>>
> >>>
> >>as Andre said, /etc/init.d/gdm stop.
> >>
> >>then I'd get rid of the links for the moment so you can actually work on
> >>the thing: update-rc.d gdm -f remove && update-rc.d kdm -f remove and so
> >>forth. Then you can use startx as a user and see what happens.
> >>
> >
> >Might be easier just to do this in maintenance mode, which doesn't start
> >the things in the first place.
> >
> >There's a point -- in the two-Debian philosophy of system maintenance,
> >is there any way of using, say, aptitude running on one system to
> >install, uninstall, configure and so forth the other?
> >It suddenly struck me as potentially useful. Doesn't the installer do
> >something like this, starting from a RAMdisk?
> >
> >-- hendrik
> >
> >
> >
>
>