
Re: unpredictable crashes, lock up, freezes, whatever



Olaf Meeuwissen <olaf@epkowa.co.jp> writes:

Yup, following up on my own post!  Please bear with the long quotes.
I left them in because I'm now also cross-posting this to
debian-laptop.

> "Karsten M. Self" <kmself@ix.netcom.com> writes:
> 
> > on Fri, Jun 15, 2001 at 04:51:35PM +0900, Olaf Meeuwissen (olaf@epkowa.co.jp) wrote:
> > > Dear all,
> > > 
> > > I'm running mostly testing with some unstable under linux 2.2.19 (hand
> > > rolled, of course) on an IBM ThinkPad i1476 (Type 2611).  Since a few
> > > weeks, my machine completely locks up at unpredictable moments.  The
> > > screen is no longer updated, I can't switch to a virtual terminal,
> > > even the three finger salute doesn't do a thing.  Pinging from another
> > > machine results in 100% lost packets but the PCMCIA network card keeps
> > > signalling traffic.  Just about the only thing that keeps on going is
> > > CD audio.
> > 
> > CD audio is not mediated by the OS, [...]
> 
> > > I regularly 'apt-get -t testing upgrade' and the problem hasn't gone
> > > away.  I've tried other kernels, including the Debian vanilla ones,
> > > but to no avail.  I've run memtest86 and found errors in one of my
> > > DIMMs but the problem remains even after lobotomy.  That is, even when
> > > I only use the DIMM that is okay (memtest86, 20+ passes, tests 1-7) my
> > > machine randomly locks up.
> > > 
> > > I've checked the logs but apart from occasional blocks of nulls just
> > > before a lock up, I haven't seen anything out of the ordinary.  Note,
> > > those null blocks only appear before _some_ lock ups, not all.
> > 
> > Look for power-change events under apmd.
> 
> I doubt that has anything to do with it because the machine is on AC
> 99% of the time.  [Goes checking the logs now ...]   No correlation
> between power change events and crash times.

Okay, so I compiled a kernel without any APM support, installed and
tried it.  My system froze within half an hour :-(
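
For anyone who wants to repeat the log check quoted above (power change
events versus crash times), here is a minimal sketch of how it could be
scripted.  It assumes classic syslog-style timestamps ("Jun 15 16:51:35
host ...") and that apmd lines contain the string "apmd"; the function
names, the ten minute window, and the sample log lines are made up for
illustration and are not taken from my actual logs.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year=2001):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line.

    Syslog stamps carry no year, so one has to be supplied (assumption).
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def power_events_near_crashes(log_lines, crash_times, window_minutes=10):
    """Map each crash time to the apmd events logged shortly before it.

    An event counts as 'near' when it was logged at most window_minutes
    before the crash.  Empty lists everywhere suggest no correlation.
    """
    window = timedelta(minutes=window_minutes)
    event_times = [parse_syslog_time(l) for l in log_lines if "apmd" in l]
    return {
        crash: [t for t in event_times if timedelta(0) <= crash - t <= window]
        for crash in crash_times
    }


# Hypothetical sample data; the message texts are placeholders.
sample_log = [
    "Jun 15 09:12:01 thinkpad apmd[201]: <power change message>",
    "Jun 15 09:30:00 thinkpad kernel: <unrelated message>",
]
crashes = [datetime(2001, 6, 15, 9, 15, 0), datetime(2001, 6, 15, 12, 0, 0)]
for crash, events in power_events_near_crashes(sample_log, crashes).items():
    print(crash, "->", len(events), "apmd event(s) in the preceding window")
```

Crash times have to be noted by hand, since a hard lock up leaves nothing
in the logs; the time of the next boot is only a rough stand-in.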

> > > Because I haven't experienced any lock up when using the console, I'm
> > > wondering if my graphics card (probed as Neomagic NM2200 according to
> > > XFree86 log, NeoMagic MagicMedia 256AV according to hardware spec) has
> > > gone bad.  Are there any tools a la memtest to test my graphics card?
> > 
> > Possible, but the card's pretty well supported in recent XF86 v.3 and
> > v.4 drivers.
> > 
> > It's not clear how long you're leaving your system in console mode to
> > establish whether or not this is a problem.  Might make a practice of
> > doing this on long breaks (lunch, overnight), and seeing what the
> > results are.
> 
> Sorry, should have mentioned that; somewhere around 5, 6 hours.  Have
> only done that once though.  Could try leaving it in console mode
> overnight.

Left it sitting at the console and gdm login prompts overnight as
well.  No crash.  Bad news is that as soon as I logged in through
gdm, my machine froze.  Actually, it locked up three times in ten
minutes or so :-(

> > > Before you suggest, I have already tried both Gnome (with several
> > > window managers) and KDE.  It doesn't matter.  The machine even locks
> > > up when running (x|k)screensaver during lunch :-(
> > > 
> > > If you have other ideas as to what could be the matter, I'm open to
> > > suggestions.
> > 
> > I had similar problems associated with apmd and Speedstep (aka
> > Geyserville) on my TuxTops Amethyst 20U, exacerbated by a flaky onboard
> > power port (it breaks circuit when jiggled, resulting in APM mode
> > changes).  In system BIOS, I disabled speedstep functionality -- my CPU
> > is always running in full-speed mode (600 MHz), resulting in shorter
> > battery life, but longer uptime ;-).  I've had no problems since
> > changing this setting about two months ago.
> 
> I believe I've disabled BIOS power savings settings but will double
> check at the next crash, er, reboot.

Disabled all power management settings (there's not much to be set
with this BIOS) to no avail.

> > I'd made a more complete report to debian-laptop, should be in
> > archives.
> 
> That box gave you a bit of trouble, eh?  My symptoms seem very much
> like yours.  I'll be going over my kernel APM configuration as well.

See above, that wasn't much use.

> > You might isolate video card issues by running in console mode, by
> > switching to a version 3 XF86 driver, or by switching from an
> > accelerated driver to SVGA or VGA16.
> 
> I've been thinking about running X on the frame buffer device myself.

This morning, after three lock ups in ten minutes, I compiled frame
buffer support into the kernel, adjusted my XF86Config-4 to use it, and
I've now been up for 5(!) hours.  Before I go home I'll lock my session
with xscreensaver to guarantee some X server activity (well, at least
until APM kicks in and blanks the screen).  If my machine hasn't
crashed by tomorrow morning, I'm ready to believe my problem is fixed.
I might even get bold and start using that broken DIMM again ;-)

Problem then is where to put the blame: graphics card or X driver?
I'm using xserver-xfree86 4.0.3-4.

-- 
Olaf Meeuwissen       Epson Kowa Corporation, Research and Development

     Free Software: `No walls, no windows!  No fences, no gates!'


