
Re: kernel panic - not syncing



Thanks for the detailed instructions. I assembled the machine at issue
one and a half years ago on a Supermicro H8QCE with AMD CPUs, mdadm
raid1 on WD Raptor 150GB disks, 24GB Kingston ECC, amd64 etch. I used a
non-dedicated chassis, actually a 4U rack decommissioned by my
institution, in addition to decommissioned CPUs and RAM; it was a very
cheap machine. After one year of work and an upgrade to lenny, one 1GB
memory slot died and it was painful to detect. I succeeded thanks to
the generous help on this list. Even the latest memtest gave no
straightforward indication.

Recently one WD Raptor died and I replaced both with Seagate Barracuda
500GB disks (I do not need fast access, but I do need space). I was
able to recover the previous OS and installations, again thanks to
much help here.

The faulty boot described below has been the only issue since then.

Unfortunately, here in Europe, in my experience, Supermicro support
helps very little, both in the above circumstances and in honoring
my request for hardware details about their chip for monitoring with
"sensors". Without details about the resistor values they employ,
sensors suffers from an unpredictable offset. The vendor (Supermicro
can't be contacted directly; everything goes through the vendor, in
this case TWP Computers in Amsterdam) insisted that their Superdoctor
should be used, which requires a full window system, while I have
only installed the X server. After that, the vendor stopped
answering. Another annoyance with the mainboard is the Intel Boot
Agent, which can't be removed. Also, the fan connectors on the
mainboard are three-wire, which means no continuous speed regulation
(one can only ask the BIOS to drop the voltage to 6V unless
overheating occurs). Finally, there is much unused hardware (meant to
provide hardware RAID for Microsoft systems).

I can't say the mainboard is bad. It does the job, and benchmarks are
excellent for that hardware. With openmpi it runs the fastest
molecular dynamics code, which is like having twice the number of
processors compared with normal code. I say that to emphasize that
memory control must be absolutely in order, otherwise libnuma could
not do such an excellent job. However, for my next machine I would be
happy to find an alternative brand, just in the hope of getting full
support for Debian Linux (I mean "sensors", for example). I would like
to assemble a four-socket quad-core motherboard, to get 16 processors,
and to join the present machine and another machine, a Tyan with two
sockets, to get 28 logical processors. Probably, however, without an
expensive Infiniband interconnect, parallelization for molecular
dynamics will not work (it would be easier with single CPUs). Is there
any chance that in the near future quad-core will be superseded by
eight-core, giving 32 processors with four sockets on a single
mainboard?

thanks
francesco

On Fri, May 1, 2009 at 3:59 PM, Douglas A. Tutty <dtutty@vianet.ca> wrote:
> On Tue, Apr 28, 2009 at 08:12:14PM +0200, Francesco Pietra wrote:
>> I wonder whether a failure to boot (amd64 lenny, multiprocessor,
>> raid1) requires attention. On resetting, the boot was ok.
>
> Having the follow-on boot OK is good and bad:  good that you booted OK,
> bad in that it's an intermittent problem.
>
>> The message was
>>
>> kernel panic - not syncing: attempted to kill the idle task!
>
> I have no idea what _would_ cause this; I would suspect either an
> intermittent (or random) hardware issue or a freak of nature (planets
> not aligned correctly, sun spots, whatever).  Hope that it's an
> isolated incident but plan for it not being so.
>
>> I was not at the screen during the attempted boot, so I can't say
>> more in this regard.
>>
>>
>> I have looked at /var/log/syslog not finding a clear trace of the
>> failure. The machine was not used today and all of today in syslog
>> relates to 28 April 19.56-19.57.
>
> Well, during boot, until /var is mounted rw, nothing will appear in
> syslog.
>
>
> If you have a separate machine available (it doesn't have to be
> dedicated to this), and if you plan to reboot this problem machine soon,
> I'd set it up for serial console (boot messages going out the serial
> port instead of to the vga screen), and capture it with the other
> machine.
>
> In /boot/grub/menu.lst, you'd add an altoptions line:
>
> # altoptions=(serial console) console=tty0 console=ttyS1,38400n8
>
> The first console option says to send info to tty0, the second to
> ttyS1 (a serial port).  Check the docs for the order; this is from my
> server, which I run from another box and where I need to talk to the
> boot process (for the LUKS password).  You may need the other order so
> that you can type on the tty0 console but have messages go to ttyS1.
> Adjust ttyS1 to whatever serial port you use, and adjust the speed,
> parity, and data bits (here 38400n8).
>
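
To double-check how that option string breaks down, here is a minimal
Python sketch that splits the serial form of console= into device,
speed, parity and data bits. The function names are made up for
illustration, and the 9600n8 fallback is what I understand the kernel
to assume when the options are omitted; treat it as a sketch, not a
reference parser.

```python
import re

# Serial form of the kernel "console=" parameter:
#   console=ttyS<n>[,<baud>[<parity>[<bits>[<flow>]]]]
_CONSOLE_RE = re.compile(
    r"^console=(?P<device>ttyS\d+)"
    r"(?:,(?P<baud>\d+)(?P<parity>[noe])?(?P<bits>[78])?(?P<flow>r)?)?$"
)


def parse_serial_console(arg):
    """Return the settings for one console=ttyS... argument, or None.

    Hypothetical helper: non-serial consoles such as console=tty0
    simply yield None.
    """
    m = _CONSOLE_RE.match(arg)
    if m is None:
        return None
    return {
        "device": "/dev/" + m.group("device"),
        "baud": int(m.group("baud") or 9600),      # assumed default speed
        "parity": m.group("parity") or "n",        # n = none, o = odd, e = even
        "bits": int(m.group("bits") or 8),
        "rtscts": m.group("flow") is not None,     # trailing "r" = RTS flow control
    }


def serial_consoles(cmdline):
    """Collect every serial console found in a kernel option string."""
    parsed = (parse_serial_console(a) for a in cmdline.split())
    return [p for p in parsed if p is not None]


if __name__ == "__main__":
    print(serial_consoles("console=tty0 console=ttyS1,38400n8"))
```

The capturing machine has to be set to the same speed, parity and data
bits on its own serial port, otherwise the output will be garbage.
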
> Once you have things set up and working, which will involve rebooting
> the suspect machine, you'll see what happens.
>
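
Once the serial output has been saved to a file on the second machine,
something like the following Python sketch could pick out panic lines
from it. The file name in the comment and the first sample line are
placeholders of mine; only the panic message itself is the one I
actually saw.

```python
import re

# Match the kernel's panic banner regardless of letter case.
_PANIC_RE = re.compile(r"kernel panic - not syncing:\s*(?P<reason>.*)",
                       re.IGNORECASE)


def find_panics(lines):
    """Return (line_number, reason) for every panic line in the capture."""
    hits = []
    for number, line in enumerate(lines, start=1):
        m = _PANIC_RE.search(line)
        if m:
            hits.append((number, m.group("reason").strip()))
    return hits


if __name__ == "__main__":
    # In practice the lines would come from the captured file, e.g.
    #   with open("boot-capture.log") as fh:   # hypothetical file name
    #       print(find_panics(fh))
    sample = [
        "placeholder boot message",  # illustrative, not real output
        "kernel panic - not syncing: attempted to kill the idle task!",
    ]
    print(find_panics(sample))
```

The lines just before a hit are usually the interesting ones, so it is
worth keeping the whole capture rather than only the matches.
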
> If it were me, I'd also schedule some downtime overnight on the box and
> run memtest (the memtest86+ package that installs into grub, or boot a
> live CD such as grml that includes memtest as a boot option).
>
> Good luck.
>
> Doug.

