
Re: The crippled resurrection of said etch.



On Thu, Oct 26, 2006 at 10:25:37AM -0400, Matthew Krauss wrote:
> Hi,  I haven't been following this too closely so I may be missing 
> something, but this message caught my eye.  I'm not sure how experienced 
> you are, so I will try to be very explicit -- if I tell you things you 
> think are obvious, please forgive me.
> 
> hendrik@topoi.pooq.com wrote:
> >On Wed, Oct 25, 2006 at 12:34:02PM -0700, Andrew Sackville-West wrote:
> >  
> >>hendrik@topoi.pooq.com wrote:
> >>
> >>    
> >>>Four more reboots, one successful.
> >>>It seems to be a problem starting gdm.
> >>>      
> Hmm... It sounds like a race condition, obviously.  From what I have 
> read in this thread, I would guess that there is a very good probability 
> that you have an old startup script lying around from a package that 
> has been otherwise removed or upgraded.
> >>it could be an X problem or a gdm problem, but probably I'd guess X.
> >>    
> >>>It tells me it's starting gdm,
> >>>then that it's not starting kdm because it's not the default,
> >>>then that it's not starting (presumably another *dm) because it's 
> >>>not the default
> >>>      
> >>That's normal. X does sanity checks to make sure you're not starting more
> >>than one session manager or whatever.
> >>    
> >I know that.  I just thought that the last message before the crash 
> >might be a clue to what went wrong -- such as an unfortunate race 
> >condition between gdm and whatever thing decides not to start the other.
> >But I admit this is unlikely.
> >
> >  
> >>>then the black screen of death, preventing me from reading which other 
> >>>*dm it was considering.
> >>>      
> >>are you locked up hard at that point or can you switch to a vt? 
> >>ctrl-alt-fx?
> >>    
> >
> >Locked up hard.  Though I suppose I should try ssh-ing in.
> >  
> When you say "black screen of death" I assume you mean a kernel panic?  
> If so, ssh-ing won't work.  Also, notably, a kernel panic should *never* 
> happen (theoretically!) -- it is always the result of either a kernel 
> bug or a hardware failure.  No user-space program should be able to 
> cause a kernel panic.
> 
> What I would try to isolate the problem is:
> 
> 1. Reboot into single-user mode.
> 2. Log in as root.
> 3. Try starting X alone:
>    $ X 2>&1 | less
>    3a. If X starts, you may kill it with ctrl-alt-backspace;

X starts.
Killed it with ctrl-alt-backspace

>    3b. If X does not start, you have the output to debug;
>    3c. If you get a kernel panic, you know you have serious X problems.
> 4. Next try starting gdm directly:
>    $ /etc/init.d/gdm start
>    4a. If gdm starts, there is probably a problem in your startup scripts;
>    4b. If gdm does not start, you can check the logs under /var/log/gdm/
>    4c. If you get a kernel panic, you know you have serious gdm problems.

gdm starts.  A cursor blinks in the upper left of a black screen, then 
the cursor disappears, leaving the black screen of death.
Did a hard reset to reboot.  No trace of a log file.
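
One way to check whether anything at all was written before the hang is 
to list the files that changed recently.  A minimal sketch, assuming GNU 
find; recent_files is a hypothetical helper, and looking in /var/log as 
a whole is an assumption about where X and syslog output ends up:

```shell
# recent_files DIR MINUTES: list regular files under DIR that were
# modified within the last MINUTES minutes (uses find's -mmin test).
recent_files() {
    [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
    find "$1" -type f -mmin "-$2" 2>/dev/null
}

# Directory named in the advice quoted above.
recent_files /var/log/gdm 30 || true

# Assumption: other logs (X server, syslog) live somewhere under /var/log.
recent_files /var/log 30 || true
```

Run it right after rebooting into single-user mode; an empty listing 
would suggest nothing reached the disk before the reset.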

try /etc/init.d/kdm start

it refuses; kdm is not default.

dpkg-reconfigure kdm
and make kdm the default.

repeat

/etc/init.d/kdm start

acts just like /etc/init.d/gdm did before -- black screen of death

reset to reboot.

This time, after the usual environment checking, it starts a maintenance 
shell with a shorter path:

/lib/init:/sbin:/bin

As a result, lots of commands don't work.  Supplying the path 
explicitly,

/usr/bin/dpkg-reconfigure  

fails.  Its first error message reports that it cannot execute the 
'locale' command.  True enough. 'locale' is not on the path.
This looks like a bug in dpkg-reconfigure -- shouldn't scripts that are 
executed as root specify their command names a little more explicitly?

Something is wrong with the maintenance shell this time -- why has its 
$PATH suddenly changed?
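
A possible workaround while the cause is unknown is to run individual 
commands with a fuller PATH.  This is only a sketch: the directory list 
below is an assumed typical root PATH, not something taken from this 
system, and with_full_path is a hypothetical helper name.

```shell
# Assumed typical root PATH; adjust to match the installed system.
full_path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# with_full_path CMD [ARGS...]: run one external command with the fuller
# PATH, leaving the maintenance shell's own PATH untouched.
with_full_path() {
    PATH=$full_path "$@"
}

# Check that 'locale' resolves under the fuller PATH.
with_full_path sh -c 'command -v locale' || echo "locale not found" >&2

# Then retry the step that failed, for example:
#   with_full_path dpkg-reconfigure kdm
```

Because the variable is set only for the command being run, scripts such 
as dpkg-reconfigure inherit it and can find 'locale' themselves.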

-- hendrik

> 
> In the case of 4a., where you have a problem in your startup scripts:
> 
> 5. Kill gdm -- use ctrl-alt-F1 to return to your terminal, and issue:
>    $ /etc/init.d/gdm stop
> 6. Switch to the default runlevel's rc directory and ls it:
>    $ cd /etc/rc2.d
>    $ ls
>    See all the links named S##*
>        .. where ## is a number
>        .. and * is the rest of the name?
>    At startup, these are all started in the order of the ## numbers.
>    Scripts with the same number as gdm start at the same time.
>    These are good candidates for a race condition.
>    For instance, I have:
>        S99gdm
>        S99rc.local
>        S99rmnologin
>        S99stop-bootlogd
>    You probably have all of these, plus:
>       S99xdm
>       S99kdm
>    .. and others?
> 7. Try starting up the scripts with the same number as gdm in various 
> orders. Consider which ones sound likely to be the problem.  For 
> instance, you have guessed that another *dm is your problem, so try 
> starting first xdm and then gdm, then the other way around.  If you trigger 
> a crash, congratulations!
> 
> Oh, to start a script, e.g. S99gdm, use:
>    $ ./S99gdm start
> 
> S99rc.local actually runs /etc/rc.local which might have anything in it, 
> so that is worth looking into. You should probably look at 
> /etc/rc.local and see what it is doing.
> 
> Scripts with other numbers are possible too -- just less likely -- so 
> you may want to try them if you don't find the problem in the "good 
> candidates" first.
> 
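
Picking out the start links that share gdm's number can be done without 
reading the whole listing.  A rough sketch in plain POSIX sh; same_seq is 
a hypothetical helper, and /etc/rc2.d is used only because it is the 
directory named above:

```shell
# same_seq DIR NAME: print every start link in DIR whose S## prefix
# matches the one used by NAME (e.g. S99gdm -> all S99* links).
same_seq() {
    dir=$1
    name=$2
    for link in "$dir"/S[0-9][0-9]"$name"; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}          # e.g. S99gdm
        prefix=${base%"$name"}    # e.g. S99
        for peer in "$dir"/"$prefix"*; do
            echo "${peer##*/}"
        done
    done
}

same_seq /etc/rc2.d gdm
```

The output includes gdm's own link, so a single line of output means no 
other script shares its number in that runlevel.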
> Hopefully helpful,
> 
> Matthew
> 
> >  
> >>>Could it be that the *dm is interfering with gdm starting up?
> >>>Maybe it's whatever it does *after* trying its hand with the *dm's 
> >>>  that is the culprit?  Anyone know what that is?
> >>>Should I try making another *dm the default?
> >>>Should I try purging the other *dm's?
> >>>Should I try purging gdm?
> >>>Should I try running a general update of everything just in case?
> >>>
> >>>      
> >>as Andre said, /etc/init.d/gdm stop.
> >>
> >>then I'd get rid of the links for the moment so you can actually work on
> >>the thing: update-rc.d gdm -f remove && update-rc.d kdm -f remove and so
> >>forth. Then you can use startx as a user and see what happens.
> >>    
> >
> >Might be easier just to do this in maintenance mode, which doesn't start 
> >the things in the first place.
> >
> >There's a point -- in the two-Debian philosophy of system maintenance, 
> >is there any way of using, say, aptitude running on one system to 
> >install, uninstall, configure and so forth the other?
> >It suddenly struck me as potentially useful.  Doesn't the installer do 
> >something like this, starting from a RAMdisk?
> >
> >-- hendrik
> >
> >
> >  
> 
> 
> -- 
> To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org 
> with a subject of "unsubscribe". Trouble? Contact 
> listmaster@lists.debian.org
> 


