
Re: Intermittent System Lockup (long)



On Wednesday 17 April 2002 11:58 pm, Karsten M. Self wrote:
> on Wed, Apr 17, 2002, Jamin W. Collins (jcollins@asgardsrealm.net) wrote:
> > On Wed, 17 Apr 2002 11:53:10 -0700, "Karsten M. Self" <kmself@ix.netcom.com> wrote:
> > > on Wed, Apr 17, 2002, Jamin W. Collins (jcollins@asgardsrealm.net) wrote:
> > > > >  - CPU -- a continuous kernel-build loop is a pretty good test.
> > > > >    You're looking for SIG-11 errors.
> > > >
> > > > I'll give that a try, how long would you suggest it run before
> > > > considering this test to have been passed?
> > >
> > > Run through it one or more times.  In some cases there are thermal
> > > effects.  Depending on processor speed, you may want to set a
> > > continuous loop and run the process for an hour or so.
> >
> > I created two copies of the 2.4.16 kernel source and have been running
> > two endless compile loops in SSH sessions to the system.  The loops have
> > been running for 4+ hours now; I'm planning to let them run overnight.
>
> Sounds like a negative.
>
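for the archives: a compile loop like that is easy to script. below is a rough python 3 sketch (an assumption on my part, nobody in this thread posted their loop). it reruns a build command and flags any run that exits non-zero or prints the usual signs of gcc being killed by signal 11. the marker strings are guesses at typical gcc messages, so adjust them for your compiler:

```python
import subprocess
import sys

# Substrings that typically appear when cc1 is killed by SIGSEGV
# (e.g. "got fatal signal 11"). This list is an assumption; tune it.
SIG11_MARKERS = ("Segmentation fault", "signal 11")


def looks_like_sig11(output: str) -> bool:
    """Return True if build output contains a typical SIG-11 message."""
    return any(marker in output for marker in SIG11_MARKERS)


def stress_loop(cmd, cwd=None, iterations=10):
    """Run `cmd` repeatedly and return a list of (iteration, returncode) failures.

    A run counts as failed if it exits non-zero or prints a SIG-11 marker.
    """
    failures = []
    for i in range(iterations):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge streams so stderr markers are seen
            text=True,
        )
        if result.returncode != 0 or looks_like_sig11(result.stdout):
            failures.append((i, result.returncode))
            print(f"run {i}: FAILED (exit {result.returncode})", file=sys.stderr)
    return failures
```

usage would be something like `stress_loop(["make", "dep", "clean", "bzImage"], cwd="/usr/src/linux-2.4.16", iterations=1000)`, where the tree path is a placeholder for wherever your source copy lives. an empty list after a night of runs is the "negative" result.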
> > > You might try mounting your drives 'sync' (synchronous mode), and
> > > launching Mozilla under strace, logging stderr.  This may be able to
> > > capture the final system calls of the program.
> >
> > I'll give this a run tomorrow.  Hopefully I can readily get Mozilla to
> > drop the system.
>
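if you try the strace route, here's a python 3 sketch of it (the program name and log path are placeholders). `strace -f -o FILE` follows forked children and writes the trace to a file instead of stderr. with the filesystem holding that file remounted sync (`mount -o remount,sync <mountpoint>`), the last lines should survive the hard reset and are the closest you'll get to the final system calls:

```python
import subprocess
from collections import deque


def strace_command(program, logfile, args=()):
    """Build an strace invocation: -f follows forks, -o writes the trace to a file."""
    return ["strace", "-f", "-o", logfile, program, *args]


def launch_traced(program, logfile, args=()):
    """Start `program` under strace without waiting for it to exit."""
    return subprocess.Popen(strace_command(program, logfile, args))


def last_syscalls(logfile, n=20):
    """Return the last `n` lines of an strace log, e.g. after rebooting from a lockup."""
    with open(logfile, errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

e.g. `launch_traced("mozilla", "/var/tmp/mozilla.trace")` before reproducing the freeze, then `last_syscalls("/var/tmp/mozilla.trace")` once the box is back up.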
> You've largely eliminated memory and CPU.
>
> Another possible HW problem might be disk corruption in your swap
> partition.  I suggest this just because I know that Mozilla tends to
> grow and stress swap, though I would lean toward a driver issue.  Not
> sure of a good swap tester; anyone have any suggestions?
>
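on the swap tester question: i don't know of a dedicated one either. `mkswap -c` will check the partition for bad blocks while recreating the swap area (after a `swapoff`). for a check under load, one crude approach is to allocate more memory than you have physical ram, fill it with pseudo-random data, and read it back, so any mismatch means pages are getting corrupted somewhere between ram, swap and disk. the python 3.9+ sketch below does that; the chunk size is an assumption, `count` has to be scaled past your ram size to actually reach swap, and it is no substitute for a real test tool:

```python
import random

CHUNK = 1 << 20  # 1 MiB per chunk (assumed; any size works)


def fill_chunks(count, chunk_size=CHUNK):
    """Allocate `count` chunks of pseudo-random bytes, chunk i seeded with i.

    Random data is not trivially compressible, so with a large enough
    `count` the kernel has to push real pages out to swap.
    """
    return [random.Random(seed).randbytes(chunk_size) for seed in range(count)]


def verify_chunks(chunks):
    """Regenerate each chunk from its seed; return the indices that differ."""
    bad = []
    for seed, chunk in enumerate(chunks):
        if chunk != random.Random(seed).randbytes(len(chunk)):
            bad.append(seed)
    return bad


def swap_stress(count, chunk_size=CHUNK, passes=3):
    """Fill memory once, re-verify it `passes` times, return sorted bad indices."""
    chunks = fill_chunks(count, chunk_size)
    bad = set()
    for _ in range(passes):
        bad.update(verify_chunks(chunks))
    return sorted(bad)
```

each verification pass touches every chunk again, which forces swapped-out pages back in; a non-empty result from `swap_stress(count)` on a machine with no other faults would point at the swap path.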

as far as i can trace back, i can't see any precise description of the 
architecture this is happening on. i've got an amd k6 chip that showed the 
same symptoms with damn near every kernel release from 2.2.19 up until 2.4.17, 
the one i'm running now. i traced my problem back to one point in the kernel 
source, slab.c:1248, caused, as far as my investigation could determine, by 
an attempted allocation of a non-allocatable memory address. i tried to 
contact the author of that part of the kernel code, but about a week later my 
message came back undeliverable. contacting amd got a response that they did 
not have the resources to investigate linux-related bugs. in any case, in 
about six months of running 2.4.17, the problem has never recurred. in fact, 
i've been so relieved to have a crash-free configuration that i've put off 
testing any kernel beyond the one that works. 

the symptoms were exactly the same: random system freezes, often during boot, 
regularly while running x, but always infrequent with no discernible pattern, 
and with nothing registered in dmesg or in any of the relevant x logs. i 
eventually found the clue in kern.log: the errors disappear from it starting 
on the very date that i compiled the 2.4.17 kernel from the kernel.org 
source, which was also the first kernel i compiled the non-debian way. maybe 
that matters.
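if you want to check your own logs the same way, the idea is just to find the dates a given kernel message shows up and see where they stop. a python 3 sketch, assuming the classic syslog line format (`Apr 17 23:58:01 host kernel: ...`, no year) and that the message contains a fixed string such as `slab.c`; both are assumptions, so pass your own needle:

```python
import re

# Classic syslog prefix: "Mon DD HH:MM:SS " (single-digit days are space-padded).
SYSLOG_PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")


def matching_dates(lines, needle="slab.c"):
    """Return (month, day) pairs, in log order, for lines containing `needle`."""
    dates = []
    for line in lines:
        if needle not in line:
            continue
        m = SYSLOG_PREFIX.match(line)
        if m:
            date = (m.group(1), int(m.group(2)))
            if not dates or dates[-1] != date:
                dates.append(date)  # collapse repeated hits on the same day
    return dates


def last_seen(path, needle="slab.c"):
    """Return the last (month, day) on which `needle` appears in a log file, or None."""
    with open(path, errors="replace") as f:
        dates = matching_dates(f, needle)
    return dates[-1] if dates else None
```

`last_seen("/var/log/kern.log")` then gives the last day the message was logged, which you can compare against the date you switched kernels. rotated logs (kern.log.0 and so on) need to be fed in too, oldest first.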

in the event that this same issue is the cause of your grief, sorry i didn't 
file a bug report back then. i guess i could be more sociable, but i tend to 
lose myself in the task of curing the problem and forget that part of the 
allegiance to linux.

ben


-- 
To UNSUBSCRIBE, email to debian-user-request@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org


