
Re: Help with system recovery



Thanks for the info, I'll give these a try. I guess one of my biggest
questions was: does this sound like a hardware issue, or could my system have
gotten hosed by the hard reboots? If it's the latter, I would just format
and reinstall. If it sounds like a hardware issue, I will run the tests and see
if I can find the problem (the system is only 1 year old and I hate to think
I have bad components).
Thanks
Brad
----- Original Message -----
From: "nate" <debian-user@aphroland.org>
To: <debian-user@lists.debian.org>
Sent: Tuesday, October 29, 2002 1:35 PM
Subject: Re: Help with system recovery


> Brad Cramer said:
> >
> > be going on or how to fix this problem. What should I be looking for in
> > log files? Could it be bad RAM? Any help would be greatly appreciated.
>
>
> Problems like this are the hardest to track down. There are several
> things you can try to narrow it down.
>
> BEFORE TESTING
> ===============
> Get a null modem cable and configure a console on the serial port of your
> machine. If you're not sure how, run a search for "linux serial console"
> on most any search engine and a bunch of hits should come up. Connect your
> system to another one running a terminal emulation package (e.g. minicom) and
> log the output to a file (you need to keep the emulation software up
> all the time or messages may get lost).
>
>
> Test 1
> ========
> Exit out of X, then download, compile, and run 2-3 copies of CPUBurn,
> available here:
> http://users.ev1.net/~redelm/
>
> For the first few hours keep a close eye on the system. As the website
> warns, it can cause serious damage to the system if it is not properly
> cooled; there's even been a reported case of a power supply burning out.
> If your system is properly cooled you should be able to run a lot of
> CPUburn processes and the system won't crash or reboot. If it does crash
> or reboot, stop here. I recommend running this for at least 24 hours.
> Do not use the computer while it is running or it may skew results.
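As a rough sketch of Test 1, the helper below starts several copies of
a burn program and reports the first one that exits. It assumes the
program keeps running until something goes wrong, so any exit is worth
looking at. The name run_copies and the ./burnP6 path in the comment
are made up for illustration; use whichever cpuburn binary fits your CPU.

```python
import subprocess
import time


def run_copies(command, copies, poll_seconds=1.0, timeout_seconds=None):
    """Start `copies` instances of `command`.

    Returns (index, returncode) for the first instance that exits, or
    None if `timeout_seconds` passes with all of them still running.
    """
    procs = [subprocess.Popen(command) for _ in range(copies)]
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds
    try:
        while True:
            for index, proc in enumerate(procs):
                code = proc.poll()
                if code is not None:
                    return index, code
            if deadline is not None and time.monotonic() >= deadline:
                return None
            time.sleep(poll_seconds)
    finally:
        # Stop anything still running so nothing is left loading the CPU.
        for proc in procs:
            if proc.poll() is None:
                proc.kill()
            proc.wait()


# Hypothetical usage: three copies for 24 hours.
# print(run_copies(["./burnP6"], 3, timeout_seconds=24 * 3600))
```

A result of None after the full 24 hours means no copy exited early. If
the whole machine locks up instead, this script dies with it, so the
serial console log is still where to look.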
>
> Test 2
> ==========
> Included in the cpuburn package is a memory tester. I recommend running
> this at a different time from Test 1, but you can run it at the same
> time. Running it at the same time may make it difficult to determine what
> caused the crash (RAM or CPU). I recommend running burnBX or burnMMX with
> the 'P' option (uses 64MB of RAM) and running multiple copies of it
> (either load up screen, or load them in the background with &). If you
> have 512MB of RAM I would load 7 or 8 copies. I recommend running this
> test for about 24 hours as well. As before, I recommend not using the
> computer while this is going on.
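The "7 or 8 copies" figure is just installed RAM divided by the 64MB
each copy uses, minus whatever you leave for the rest of the system. A
small sketch of that arithmetic (copies_for_ram and reserve_mb are
illustrative names, not part of cpuburn):

```python
def copies_for_ram(total_mb, per_copy_mb=64, reserve_mb=0):
    """Number of test copies whose combined memory use fits in RAM.

    reserve_mb is memory deliberately left free for the kernel and
    other processes; how much to reserve is a judgment call.
    """
    if per_copy_mb <= 0:
        raise ValueError("per_copy_mb must be positive")
    usable_mb = max(total_mb - reserve_mb, 0)
    return usable_mb // per_copy_mb
```

With 512MB, copies_for_ram(512) gives 8, and holding back 64MB with
copies_for_ram(512, reserve_mb=64) gives 7.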
>
>
> Test 3
> ==========
> Get memtest86 from http://www.memtest86.com/, compile it, make the
> boot disk, and boot from the disk. Turn on the advanced tests (see the
> documentation). This test will probably take 72 hours or more.
> Your computer will not be usable while this test is running.
>
> Test 4
> ===========
> Get bonnie++ and run it in a loop. I usually loop it for 72 hours
> to test the disk and controller. Redirect output to a log file so
> you can monitor it. Again, I recommend not using your computer during
> this time.
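One way to do the loop-and-log part of Test 4, as a sketch only: the
helper below reruns a command until a time limit passes and appends
everything it prints to a log file. loop_command and max_runs are
illustrative names, and the /mnt/test directory and "nobody" user in
the comment are placeholders to replace with your own.

```python
import subprocess
import time


def loop_command(command, log_path, duration_seconds, max_runs=None):
    """Run `command` repeatedly, appending its output to `log_path`.

    Stops when `duration_seconds` has elapsed, after `max_runs` runs
    (if given), or as soon as a run exits with a nonzero status.
    Returns the number of runs that were started.
    """
    deadline = time.monotonic() + duration_seconds
    runs = 0
    with open(log_path, "a") as log:
        while time.monotonic() < deadline:
            if max_runs is not None and runs >= max_runs:
                break
            runs += 1
            log.write(f"=== run {runs} started {time.ctime()} ===\n")
            log.flush()  # keep the header ahead of the command's output
            result = subprocess.run(command, stdout=log,
                                    stderr=subprocess.STDOUT)
            if result.returncode != 0:
                log.write(f"*** run {runs} exited with status "
                          f"{result.returncode} ***\n")
                break
    return runs


# Hypothetical usage: loop bonnie++ for 72 hours.
# loop_command(["bonnie++", "-d", "/mnt/test", "-u", "nobody"],
#              "bonnie.log", 72 * 3600)
```

The log can be watched from another terminal with tail -f while the
loop runs.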
>
>
> Test 5
> =============
> Since you're using nvidia, I recommend checking to make sure AGP is
> disabled by checking /proc/driver/nvidia/agp. Also I recommend
> disabling AGP in X, using the option:
>
> Option "NvAGP"   "0"
>
> in the Device section of your X config, same place where you define
> the driver.
>
> Then try using the system (with the serial console on the other computer)
> and see if it still locks up.
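The AGP check at the start of Test 5 could be scripted roughly like
this. It is a sketch that assumes the driver exposes a status file
under /proc/driver/nvidia/agp containing a line such as
"Status: Disabled". The exact file name and layout depend on the
driver version, so look at what is actually there first. agp_enabled
is an illustrative name.

```python
def agp_enabled(status_text):
    """Interpret the text of an AGP status file.

    Returns True if a 'Status:' line says Enabled, False if it says
    anything else, and None if there is no 'Status:' line at all.
    The 'Status: ...' format is an assumption about the driver output.
    """
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "status":
            return value.strip().lower() == "enabled"
    return None


# Hypothetical usage (the path is an assumption, see above):
# with open("/proc/driver/nvidia/agp/status") as handle:
#     print(agp_enabled(handle.read()))
```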
>
>
> Test 6
> ===============
> My next suggestion is to try another kernel, preferably a 2.2.x kernel,
> which may be difficult if you're using ext3, though you can probably
> put the system in ext2 mode while using 2.2.x. I use 2.2.19 on all
> my systems and don't have lockups. Not too long ago my nvidia system
> rebooted under intensive load, but that was tracked down to a failed
> fan on the cheap video card, which brings me to ...
>
>
> Test 7
> ================
> Perhaps the easiest and least intrusive test. Open the side of
> the case, point a fan (a floor fan) at the internals, turn the
> fan on medium or high so a ton of air gets blown into the case, and
> try to use the system. See if it locks up.
>
> As you can probably see, tracking down a system
> crash isn't easy or fast. Back when I had my Abit BP6 I spent
> literally 6 months trying different things to solve the crashes,
> only to find out later that the board revision I had came with
> a defect in the voltage regulators. In the process I spent WAY
> more trying to fix the problem than I would have originally if I
> had just gone out and bought a dual P2 instead of trying to go
> cheap shit with Celerons. I bought another board last year, the
> Asus A7A266, which had even worse problems: something with the
> PCI bus or controller created immediate and complete filesystem
> corruption on any disk connected to the system.
>
> Also be sure you have a good quality power supply that provides enough
> power for the system. My AMD Athlon 1300 runs off a PC Power & Cooling
> TurboCool 425ATX. It also helps a lot if the system is connected to
> a battery backup system. Bad power can easily cause lockups and reboots
> without warning (such power problems may not be visible otherwise). If
> it is a power issue, there may be permanent damage to the system already.
>
> nate
>
>
>
>



