
Re: The crippled resurrection of said etch.



On Thu, Oct 26, 2006 at 10:25:37AM -0400, Matthew Krauss wrote:
> Hi,  I haven't been following this too closely so I may be missing 
> something, but this message caught my eye.  I'm not sure how experienced 
> you are, so I will try to be very explicit -- if I tell you things you 
> think are obvious, please forgive me.

Thanks.  No forgiveness needed.  If anything, I appreciate this level of 
detail.  I've found that, when explaining things to others, it's hard to 
guess the proper level of detail.

> 
> hendrik@topoi.pooq.com wrote:
> >On Wed, Oct 25, 2006 at 12:34:02PM -0700, Andrew Sackville-West wrote:
> >  
> >>hendrik@topoi.pooq.com wrote:
> >>
> >>    
> >>>Four more reboots, one successful.
> >>>It seems to be a problem starting gdm.
> >>>      
> Hmm... It sounds like a race condition, obviously.

I *thought* that was a possibility.

> From what I have 
> read in this thread, I would guess that there is a very good probability 
> that you have an old startup script lying around from a package that 
> has been otherwise removed or upgraded.

Is there any way to search for such stray files?  There were some bugs 
in the upgrade scripts a few months ago, when X underwent two revolutions 
in a row.
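
A minimal sketch of one way to look, assuming the stray files would be init scripts under /etc/init.d: ask dpkg which package owns each file and print the ones nobody claims. The function name list_unowned and the OWNER_CMD override are made up for illustration; "dpkg -S" is the real lookup. As far as I know, scripts left behind by removed-but-not-purged packages are still claimed by dpkg, so those would need a separate look at the lines of "dpkg -l" that begin with "rc".

```shell
#!/bin/sh
# List files in a directory that no package claims.
# OWNER_CMD is a hypothetical override hook so the check can be tried
# without dpkg; by default it runs the real "dpkg -S".
OWNER_CMD=${OWNER_CMD:-"dpkg -S"}

list_unowned() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (or a link to one).
        [ -f "$f" ] || continue
        # dpkg -S exits non-zero when no package owns the path.
        if ! $OWNER_CMD "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}

# Usage: list_unowned /etc/init.d
```

Anything it prints is only a candidate; a locally written script would show up here too.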

> >>it could be an X problem or a gdm problem, but probably I'd guess X.
> >>    
> >>>It tells me it's starting gdm,
> >>>then that it's not starting kdm because it's not the default,
> >>>then that it's not starting (presumably another *dm) because it's 
> >>>not the default
> >>>      
> >>that's normal. X does sanity checks to make sure you're not starting more
> >>than one session manager or whatever.
> >>    
> >I know that.  I just thought that the last message before the crash 
> >might be a clue to what went wrong -- such as an unfortunate race 
> >condition between gdm and whatever thing decides not to start the other.
> >But I admit this is unlikely.

I thought it unlikely because, as far as I know, these *dm startup 
scripts check whether they are default *before* they start anything up.

> >
> >  
> >>>then the black screen of death, preventing me from reading which other 
> >>>*dm it was considering.
> >>>      
> >>are you locked up hard at that point or can you switch to a vt? 
> >>ctrl-alt-fx?
> >>    
> >
> >Locked up hard.  Though I suppose I should try ssh-ing in.
> >  
> When you say "black screen of death" I assume you mean a kernel panic?  

The screen goes completely black.  No text visible.
If I recall correctly, a kernel panic usually puts a kernel panic 
message on the bottom of the screen.  But of course, perhaps it's not 
displaying the kernel logging screen when it dies.

> If so, ssh-ing won't work.

Therefore worth a try.  It would give us a further clue as to whether it 
might be a kernel panic.

> Also, notably, a kernel panic should *never* 
> happen (theoretically!) -- it is always the result of either a kernel 
> bug or a hardware failure.  No user-space program should be able to 
> cause a kernel panic.
> 
> What I would try to isolate the problem is:
> 
> 1. Reboot into single-user mode.

Which I do by specifying "etch 1" at the lilo boot prompt.
It works.  The on-screen messages call it maintenance mode, though.
I presume that's the same mode.

> 2. Log in as root.

That works.

Will do the rest later in the day when my users are gone.

> 3. Try starting X alone:
>    $ X 2>&1 | less
>    3a. If X starts, you may kill it with ctrl-alt-backspace;
>    3b. If X does not start, you have the output to debug;
>    3c. If you get a kernel panic, you know you have serious X problems.
> 4. Next try starting gdm directly:
>    $ /etc/init.d/gdm start
>    4a. If gdm starts, there is probably a problem in your startup scripts;
>    4b. If gdm does not start, you can check the logs under /var/log/gdm/
>    4c. If you get a kernel panic, you know you have serious gdm problems.
> 
> In the case of 4a., where you have a problem in your startup scripts:
> 
> 5. Kill gdm -- use ctrl-alt-F1 to return to your terminal, and issue:
>    $ /etc/init.d/gdm stop
> 6. Switch to the default runlevel's rc directory and ls it:
>    $ cd /etc/rc2.d
>    $ ls
>    See all the links named S##*
>        .. where ## is a number
>        .. and * is the rest of the name?
>    At startup, these are all started in the order of the ## numbers.
>    Scripts with the same number as gdm start at the same time.
>    These are good candidates for a race condition.
>    For instance, I have:
>        S99gdm
>        S99rc.local
>        S99rmnologin
>        S99stop-bootlogd
>    You probably have all of these, plus:
>       S99xdm
>       S99kdm
>    .. and others?
> 7. Try starting up the scripts with the same number as gdm in various 
> orders. Consider which ones sound likely to be the problem.  For 
> instance, you have guessed that another *dm is your problem, so try 
> starting first xdm and then gdm, then the other way around.  If you make 
> a crash, congratulations!
> 
> Oh, to start a script, e.g. S99gdm, use:
>    $ ./S99gdm start
> 
> S99rc.local actually runs /etc/rc.local which might have anything in it, 
> so that is worth looking into. You should probably look at 
> /etc/rc.local and see what it is doing.
> 
> Scripts with other numbers are possible too -- just less likely -- so 
> you may want to try them if you don't find the problem in the "good 
> candidates" first.
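
A rough sketch of how the "good candidates" from step 6 could be picked out automatically, assuming the S##name link layout described there. The helper name same_seq is made up for illustration. One caveat, as far as I know: the stock rc script runs same-numbered links one after another in name order unless concurrency has been enabled, so "at the same time" may really mean "back to back".

```shell
#!/bin/sh
# Print the start links that share a sequence number with a given service.
# same_seq is a hypothetical helper name; the S##name layout is the one
# described in step 6.
same_seq() {
    dir=$1
    svc=$2
    for link in "$dir"/S[0-9][0-9]"$svc"; do
        # An unmatched glob stays literal; bail out if nothing is there.
        [ -e "$link" ] || [ -L "$link" ] || return 1
        base=${link##*/}      # e.g. S99gdm
        num=${base#S}         # e.g. 99gdm
        num=${num%"$svc"}     # e.g. 99
        for peer in "$dir"/S"$num"*; do
            echo "${peer##*/}"
        done
    done
}

# Usage: same_seq /etc/rc2.d gdm
```

The output includes the service's own link, and the remaining names are the ones to try in different orders in step 7.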
> 
> Hopefully helpful,

I think it will be, when I get the machine to myself again.

> 
> Matthew
> 
> >  
> >>>Could it be that the *dm is interfering with gdm starting up?
> >>>Maybe it's whatever it does *after* trying its hand with the *dm's 
> >>>  that is the culprit?  Anyone know what that is?
> >>>Should I try making another *dm the default?
> >>>Should I try purging the other *dm's?
> >>>Should I try purging gdm?
> >>>Should I try running a general update of everything just in case?
> >>>
> >>>      
> >>as Andre said, /etc/init.d/gdm stop.
> >>
> >>then I'd get rid of the links for the moment so you can actually work on
> >>the thing: update-rc.d gdm -f remove && update-rc.d kdm -f remove and so
> >>forth. Then you can use startx as a user and see what happens.
> >>    
> >
> >Might be easier just to do this in maintenance mode, which doesn't start 
> >the things in the first place.
> >
> >There's a point -- in the two-Debian philosophy of system maintenance, 
> >is there any way of using, say, aptitude running on one system to 
> >install, uninstall, configure and so forth the other?
> >It suddenly struck me as potentially useful.  Doesn't the installer do 
> >something like this, starting from a RAMdisk?
> >
> >-- hendrik
> >
> >
> >  
> 
> 


