
Re: smp problems on AlphaServer 4100



Bill MacAllister wrote:
> I probably would not be futzing around with this system at all if it was
> stable.  Even though I can boot a 2.4.28 smp kernel and a 2.6.8 generic
> kernel, neither is stable.  The system kernel panicked in the middle of
> the day again today.  I have been fighting stability problems since I
> left the last official Red Hat release, 7.2.

I have read on Usenet (probably comp.os.linux.alpha) that RH 7.2 may be the only
release 'known to be stable' on Alpha.

> This system sits next to PWS 600 running OpenVMS.  Last summer in
> frustration I swapped the disks and booted the 4100 as a VMS system and
> the PWS as a Linux system.  The instability problems followed the
> software. The only thing that seemed to bring the VMS system down was
> when the power failed a couple of times.  Not to say I haven't seen VMS
> systems accvio themselves into oblivion, but at least you can set them
> to restart themselves when they get lost in the weeds.  It would be
> helpful if Linux would report the problem and then restart itself. 
> Maybe this is a option I don't know about?

For kernel panics at levels 1 and 2, there is the kernel boot option
panic=30
which actually means:
"when I panic, and detect that I panicked, I will reboot after 30 seconds".
The default setting is panic=0, with the 'usual' Linux meaning: 0 = never.
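
The same timeout can also be changed on a running system through
/proc/sys/kernel/panic. A minimal sketch, assuming a root shell; the function name
set_panic_timeout and the PANIC_KNOB variable are made up for this example:

```shell
#!/bin/sh
# Location of the panic timeout knob. Overridable so the function can be
# tried against a scratch file first (PANIC_KNOB is a hypothetical name).
PANIC_KNOB="${PANIC_KNOB:-/proc/sys/kernel/panic}"

# set_panic_timeout SECONDS
#   SECONDS > 0 : reboot that many seconds after a panic
#   SECONDS = 0 : never reboot (the default)
set_panic_timeout() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_panic_timeout SECONDS" >&2
            return 1
            ;;
    esac
    echo "$1" > "$PANIC_KNOB"
}
```

Calling set_panic_timeout 30 as root only lasts until the next boot. To make it
permanent, panic=30 goes on the kernel command line (if you boot through aboot,
that would be the kernel line in /etc/aboot.conf).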

(Yahoo, I am writing more and more funny English sentences these days.)

You may also try one or another of the 'kernel hacking' options, since they let you
regain control of a panicked system, but I forget whether they are scriptable. Their
main use is to unmount / and sync the disks before rebooting (especially useful to
avoid losing disk caches with XFS).
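
Assuming the option meant here is Magic SysRq (CONFIG_MAGIC_SYSRQ, under 'Kernel
hacking'), which this message does not actually name, its actions can be triggered
from a script by writing key letters to /proc/sysrq-trigger. A rough sketch; the
function name and the two variables are invented for the example:

```shell
#!/bin/sh
# Both settings are overridable so the sequence can be rehearsed on a
# scratch file before pointing it at the real trigger (hypothetical names).
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
SYSRQ_DELAY="${SYSRQ_DELAY:-2}"

# Sync the disks, remount filesystems read-only, then reboot.
#   s = emergency sync, u = remount read-only, b = immediate reboot
emergency_reboot() {
    for key in s u b; do
        echo "sysrq: $key"
        echo "$key" > "$SYSRQ_TRIGGER"
        # Give the kernel a moment to finish each step before the next.
        sleep "$SYSRQ_DELAY"
    done
}
```

A script like this only helps while the machine can still run processes. After a
real panic, the same keys would have to come from the keyboard or the console.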

Real system freezes can only be caught by hardware watchdog cards. Btw, I have no
experience of freezes on Alpha, since I have only had Alphas for 6 months.

> Then I thought that I had figured the problem several months ago when I
> noticed that moving the old 8 gig disks on to a separate controller from
> the 70 and 140 gig disks seemed to work better.  But, it only got better
> never really stable.  Still kernel panics now and again.

Note that for me the AS1200 handles system crashes better than the 4100. My 4100
(even after the latest SRM upgrade, to 6.0) may not reboot automatically after the
kernel has died, whereas the 1200 ALWAYS did it for me (even with 5.1).

If you use a DAC960 or any chipset from that family, it is VERY HIGHLY recommended
that you upgrade it as well. [snip of the long story]

> I have seen notes about qLogic problems.  This system now has two qLogic
> controllers in it.  I don't have log files or debugging output it does
> seem like this is a qLogic problem.  Currently the system seems to have
> problems when it gets busy.  I thought that with 2.6 the driver was
> supposed to be supported again.

To me, the 2.6 driver is an untouched fork of the 2.4 one: people seem to have
copied the file without actually patching it for 2.6 (cf. the source). From my
experience, I would say that the driver has so many issues and interactions that
Linux may hang without being able to tell that it hung because of the QLA10xx-isp
driver.

If you have a serial cable, the community would benefit greatly from kernel messages
captured over a serial line, so that developers could see what is going on in your
system.

Everybody asked me for that, but I have issues with the bit rate between my Alphas.

> So, what are the options for SRM visible PCI controllers?  With a 2.4
> kernel I had boot problems with a TekRam controller I tried.  Maybe I
> should try the TekRam again in 2.6?

The sym22802 works very well with Linux, but from memory, I don't think SRM can boot
from it. I suggest you remove all ISP10xx controllers and find a way to put the boot
sector on another chain. The AS1200 can be tweaked to have a hard drive on the
internal controller, beside or in place of the CD drives. Also consider putting a
'small kernel' on floppy, with just the SCSI and SYM drivers, mounting the HDD, then
using a kernel loader to switch to a better-suited one, or maybe put your kernel on
CD-RW if you are good with El Torito sectors.

Silicon Image 3114 SATA is known to be useful on the AS4100, but my logs don't say
whether it is bootable from SRM.

> What other options are there?  I thought maybe it would make sense to
> put an Adaptec controller in the system and put the bulk of the data out
> there with a minimal amount of the system on qLogic served disk.  Or
> maybe an AlphaBIOS boot?  It has been forever since I have looked at
> AlphaBIOS on these systems, but I remember that from AlphaBIOS you can
> see Adaptec controllers.  If I tried that I would have to use MILO to
> boot, right?  Is that reasonable to try?

No, use the kernel option
console=ttyS0,other,options ...
and wire the serial ports of the two systems together. The receiving system can
simply do:
cat /dev/ttyS0 >>/var/log/myotheralpha
You can then use
tail -f
or
tee
to view it on screen.
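
The receiving side can be wrapped up roughly as follows. This is only a sketch: it
assumes the sending kernel was booted with something like console=ttyS0,9600n8 (the
9600 speed is an assumption, both ends must simply agree), that stty is the GNU one
(for -F), and capture_console is a made-up name:

```shell
#!/bin/sh
# capture_console DEVICE LOGFILE [SPEED]
#   Append everything arriving on DEVICE to LOGFILE and show it on screen.
capture_console() {
    dev="$1"
    log="$2"
    speed="${3:-9600}"   # assumed default; must match the sender's console= speed

    # Line settings only make sense on a real character device, so a plain
    # file can stand in for the port when trying the function out.
    if [ -c "$dev" ]; then
        stty -F "$dev" "$speed" raw -echo
    fi

    # tee -a appends to the log and copies the same bytes to the terminal.
    cat "$dev" | tee -a "$log"
}
```

For example: capture_console /dev/ttyS0 /var/log/myotheralpha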

-- 
DEMAINE Benoit-Pierre (aka DoubleHP ) http://www.demaine.info/
\_o< If computing were an exact science, IT engineers would not have work >o_/


