Re: System failures: LEDs 1, 2 & 6
Thanks for the reply.
We're using the kernel that came with the 0.9.2 distribution, and I
doubt that apt-get update has been run. We have a serial console
(minicom from an Intel front end) but of course it is only useful if
connected to the machine that fails. I can't give more specific
information because the computer science wizards have disconnected our
gateway to the cluster in anticipation of a physical move down the
hall (they did this on Thursday last, without bothering to tell us it
would happen, or when it would be back up).
Your questions suggest a tie between the software and the hardware.
Can you elaborate?
On Sat, Apr 20, 2002 at 12:17:46PM -0400, Carlos O'Donell Jr. wrote:
> > We have managed to get our "cluster" of 15 or so 715/75's running, but
> > have had 3 unexplained system failures. They run for a while (they are
> > on continuously) and then die of a hardware failure. As indicated on
> > the Subject line, LEDs 1, 2 and 6 come on when they stop.
> > Checking what little documentation we have, it appears that these
> > 3 lights reflect something called a "PCX-T FRU", which, as near as we
> > can tell is the CPU.
> We run a cluster of 50+ (even older) 715/50 boxes with no problem.
> They get warm, but they don't fail. The largest problem we've had is
> keeping the old SCSI drives alive, and when they fail we move them into
> a diskless setup.
> a. What kernel are you running?
> b. What ISO did you install from?
> c. Have you apt-get update/upgrade'd in a while?
> d. Do you have serial console to watch for a kernel oops?
> - Or a log indicating any failure?
Linux is a stimulating and productive alternative to other PC operating systems.