Re: Seg 11 in GCC



From: Glenn Bily <gb2187@wcuvax1.wcu.edu>

> I have a hard time believing that RAM goes bad as much as you guys/gals
> claim. Nor do I really believe this would happen with modern machines.

Glenn,

I've seen these failures in my own system and have diagnosed them as being
RAM related. They went away when I changed the RAM.

I have a pair of 32x2 SIMMS for 16 MB in my main system. Previously, I
used 8 MB of 8x2 SIMMS in two SIMM adapters. Either set of memory works fine,
but they won't work together, even though they are always installed in
separate banks. With both installed, either the system fails the power-on self
test (P.O.S.T.), or the P.O.S.T. _DOES_NOT_ fail, the system boots up fine, and
GCC then dies with signal 11 while building the kernel. If I remove either bank
of memory, it works fine.

There are a lot of problems with modern RAM. Cheaper SIMMS and motherboards
no longer implement parity, and memory failures thus go undiagnosed.
Even with parity, it's easy to have a failure that gets by it. Edge connectors
and SIMM sockets get dirty. Another problem is timing - either because a
chip is not responding within the specified time, or because two SIMMS have
very different timing. Overclocked CPUs and other devices run deliberately
out-of-spec don't help the situation.
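
As a minimal sketch of how a failure can get by parity: a single parity bit
per byte only records whether the number of set bits is odd or even, so any
error that flips an even number of bits leaves the parity unchanged. The
function names and sample values below are made up for illustration, and the
sketch assumes the stored parity bit itself is not corrupted.

```python
def parity_bit(byte):
    """Return 1 if the 8-bit value has an odd number of set bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def parity_detects(stored, corrupted):
    """True if a parity check would notice `stored` turning into `corrupted`.

    Assumes the parity bit saved alongside `stored` is itself intact.
    """
    return parity_bit(stored) != parity_bit(corrupted)


original = 0b10110010               # arbitrary sample byte
one_flip = original ^ 0b00000001    # single-bit error
two_flips = original ^ 0b00000011   # double-bit error

print(parity_detects(original, one_flip))   # single-bit error changes parity
print(parity_detects(original, two_flips))  # double-bit error leaves parity alone
```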

> If you encounter a memory hardware error I think your machine probably
> would not have booted in the first place.

The boot-up memory test is not exhaustive. I wrote memory diagnostics back
when Pixar made hardware - some errors are intermittent and very difficult to
catch.
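
To illustrate why a quick test can pass on faulty memory, here is a small
simulation. Everything in it is hypothetical: the fault model (writing one
cell disturbs another), the class and function names, and the "quick test",
which is a deliberately simplistic stand-in and not a description of what any
real BIOS does.

```python
class CoupledMemory:
    """Simulated RAM with one hypothetical coupling fault: writing a value
    with bit 0 set to `aggressor` flips bit 0 of `victim`."""

    def __init__(self, size, aggressor, victim):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF
        if addr == self.aggressor and value & 1:
            self.cells[self.victim] ^= 1  # disturb the other cell

    def read(self, addr):
        return self.cells[addr]


def quick_test(mem, size, pattern=0x55):
    """Write each cell and read it straight back."""
    for addr in range(size):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            return False
    return True


def fill_then_verify(mem, size, pattern=0x55):
    """Fill all of memory first, then verify every cell in a second pass."""
    for addr in range(size):
        mem.write(addr, pattern)
    return all(mem.read(addr) == pattern for addr in range(size))


SIZE = 16
print(quick_test(CoupledMemory(SIZE, aggressor=9, victim=3), SIZE))
print(fill_then_verify(CoupledMemory(SIZE, aggressor=9, victim=3), SIZE))
```

The quick test never looks at a cell again after moving on, so it misses the
disturbance; the two-pass test sees it. A two-pass test is still far from
exhaustive, and it cannot catch faults that only show up some of the time.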

> Under GCC 2.6.3 there are certain versions of the kernel that will not
> compile but this has nothing to do with bad memory.

Yes, but if you get different errors each time instead of the same error
you'd begin to suspect hardware, wouldn't you? And if you removed a bank
of SIMMS and the problems went away, as I did, you'd be even more sure.
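
The reasoning above can be sketched as a simple check over repeated builds of
the same source tree. The function name and the file names in the example are
invented for illustration; how you record where each build failed is up to
you.

```python
def suspect_hardware(failure_points):
    """Given where each repeated build of the same tree failed (None for a
    clean run), return True when the runs disagree with each other."""
    return len(set(failure_points)) > 1


# Same failure every time: more likely a compiler or source problem.
print(suspect_hardware(["sched.c", "sched.c", "sched.c"]))

# Different failure (or no failure) each time: start suspecting hardware.
print(suspect_hardware(["sched.c", "tcp.c", None]))
```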

> Under GCC 2.7.0 you are more likely have been illegally mixing ELF and
> a.out object files or libraries.

We are experienced professional programmers and are aware of what this sort
of problem would look like.

Note that bad-address errors, which come from accessing memory outside of your
address space, are sometimes the result of single-bit memory errors in pointers.
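
As a rough sketch of the effect: flipping bit n of a pointer moves it by 2**n
bytes, so a flip in a low bit may still land on valid (but wrong) memory,
while a flip in a high bit lands far outside anything mapped. The pointer
value and the mapped region below are hypothetical numbers chosen for
illustration, not taken from any real process.

```python
MAPPED = range(0x08048000, 0x08100000)  # hypothetical mapped region


def flip_bit(address, bit):
    """Return `address` with one bit inverted, as a single-bit RAM error would."""
    return address ^ (1 << bit)


def is_mapped(address):
    """True if the address falls inside the hypothetical mapped region."""
    return address in MAPPED


pointer = 0x0804A010  # hypothetical valid pointer
for bit in (2, 12, 31):
    corrupted = flip_bit(pointer, bit)
    print(f"bit {bit:2d}: {pointer:#010x} -> {corrupted:#010x}, "
          f"moved {abs(corrupted - pointer)} bytes, "
          f"still mapped: {is_mapped(corrupted)}")
```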

	Thanks

	Bruce
--
Pixar Animation Studios: Reality is not our business.
Pixar's "Toy Story" $184,592,498 domestic, $49 million overseas and counting.


