Re: Seg 11 in GCC
The manifestation of memory problems can easily be a function of usage
patterns. If you change your usage, then a trouble-free process may
show error messages, or a problematic process may suddenly work. That
does not remove the possibility of a memory problem.
Note that a bad SIMM may only show up as bad if a particular pattern
happens to land in certain of its bits. Also note that there are some
interactions between a machine's RAM and its virtual memory system that
look non-deterministic, at least from the surface. If a page is in RAM
during one compile and in swap during another, the error could
conceivably show up in the first, but not the second.
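
To make the pattern point concrete, here is a toy model in Python. It is
only a sketch under assumptions of my own: the names StuckBitMemory and
failing_patterns are hypothetical, and the "fault" is a single simulated
bit stuck at 0, which is just one of many ways real RAM can fail. It is
not a memory tester.

```python
# Toy model, not a real memory tester: one simulated cell has a bit stuck at 0.
# All names here (StuckBitMemory, failing_patterns) are hypothetical.


class StuckBitMemory:
    """A byte array in which one bit of one byte always reads back as 0."""

    def __init__(self, size, bad_offset, bad_bit):
        self.cells = bytearray(size)
        self.bad_offset = bad_offset
        self.bad_mask = 1 << bad_bit

    def write(self, offset, value):
        if offset == self.bad_offset:
            value &= ~self.bad_mask & 0xFF  # the stuck bit never stores a 1
        self.cells[offset] = value

    def read(self, offset):
        return self.cells[offset]


def failing_patterns(memory, size, patterns):
    """Return the patterns that do not read back intact from every cell."""
    failed = []
    for pattern in patterns:
        # Fill the whole simulated memory with the pattern...
        for offset in range(size):
            memory.write(offset, pattern)
        # ...then read every cell back and compare.
        if any(memory.read(offset) != pattern for offset in range(size)):
            failed.append(pattern)
    return failed


if __name__ == "__main__":
    mem = StuckBitMemory(size=1024, bad_offset=700, bad_bit=1)
    for p in failing_patterns(mem, 1024, [0x00, 0x55, 0xAA, 0xFF]):
        print(f"pattern {p:#04x} exposed the fault")
```

With bit 1 stuck, the 0x00 and 0x55 fills read back clean because they
never ask that bit to hold a 1; only 0xAA and 0xFF expose the fault. A
workload that happens to store only "safe" values in the bad cell would
look trouble-free.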
That's not to say that the signal 11 problem is definitely hardware -
I've seen no argument compelling enough to make me conclude 100% either
way, hardware or software. Bad RAM does seem to be the stronger
hypothesis, IMO, tho.
Michael Alan Dorman wrote:
>
> In message <m0u2NcT-00063bC@mongo.pixar.com>, Bruce Perens writes:
> >From: Glenn Bily <gb2187@wcuvax1.wcu.edu>
> >> I have a hard time believing that RAM goes bad as much as you guys/gals
> >> claim. Nor do I really believe this would happen with modern machines.
> >I've seen these failures in my own system and have diagnosed them as being
> >RAM related. They went away when I changed the RAM.
>
> With all due respect to the many (many!) people here who know more
> about gcc/linux/Unix/whatever than I, I feel obligated to say that
> while occasional SIG11 problems may have been RAM related (and I do
> have some experience with memory problems---I've upgraded Atari STs by
> hand-soldering DIPs piggy-back on existing memory. Not somewhere you
> want a cold-solder joint), the current trend of automatically
> classifying every single SIG11 as indicative of bad memory is simply
> hogwash. I base my statement on my experiences of 3/28 (yesterday).
>
> I've got a brand new P150 here next to me, with 64MB RAM, a DPT F/W
> SCSI-2 controller all running off a Fujitsu Fast (not Wide) SCSI-2
> disk. I installed the Debian base system, and just enough to compile
> a kernel, and then I tried to recompile. I got SIG11s when trying
> to compile conmakehash.c (during make dep). I reinstalled gcc,
> conmakehash compiled fine, but I got SIG11s and vm errors. I
> increased my swap space from 8MB (top claimed it wasn't swapping, why
> would it need more), and I was down to just SIG11s.
>
> Then I compiled a 1.3.80 kernel on another machine, reinstalled Debian
> using 1.3.80, and can now do 'make -j zImage' (and there are few
> things more amusing than watching top while this is happening, as your
> entire screen will just _fill_ with gcc processes). I have done this
> upwards of 20 times (11 minute kernel compiles are fun, too) in the
> last 24 hours. Never seen a SIG11.
>
> So, I've not changed the hardware, and I'm exercising it more than I
> was previously (keeping the load above 6, mostly), and yet I see no
> SIG11s, even during the parallel compilations. That would tend to
> cast significant doubt on the common assertion that SIG11 = hardware
> problem, no?
>
> Mike.
> --
> "Don't let me make you unhappy by failing to be contrary enough...."