
Re: smp problems on AlphaServer 4100



I probably would not be futzing around with this system at all if it was stable. Even though I can boot a 2.4.28 smp kernel and a 2.6.8 generic kernel, neither is stable. The system kernel panicked in the middle of the day again today. I have been fighting stability problems since I left the last official Red Hat release, 7.2.

This system sits next to a PWS 600 running OpenVMS. Last summer, in frustration, I swapped the disks and booted the 4100 as a VMS system and the PWS as a Linux system. The instability problems followed the software. The only thing that seemed to bring the VMS system down was when the power failed a couple of times. Not to say I haven't seen VMS systems accvio themselves into oblivion, but at least you can set them to restart themselves when they get lost in the weeds. It would be helpful if Linux would report the problem and then restart itself. Maybe this is an option I don't know about?
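
On the restart question, a minimal sketch, assuming the standard kernel.panic sysctl is exposed at /proc/sys/kernel/panic on this kernel. It only reads and interprets the current setting; the function names are made up for illustration. A value of 0 means the kernel just sits there after a panic, and a positive value is the number of seconds it waits before rebooting.

```python
from pathlib import Path

# Assumption: the panic timeout is exposed here, as on stock Linux kernels.
PANIC_SYSCTL = Path("/proc/sys/kernel/panic")


def describe_panic_timeout(raw: str) -> str:
    """Turn the raw sysctl value into a human-readable description."""
    seconds = int(raw.strip())
    if seconds == 0:
        return "kernel stays halted after a panic (no automatic reboot)"
    if seconds < 0:
        return "kernel reboots immediately after a panic"
    return f"kernel reboots {seconds} seconds after a panic"


def read_panic_timeout(path: Path = PANIC_SYSCTL) -> str:
    """Read the sysctl file and describe the current behaviour."""
    return describe_panic_timeout(path.read_text())
```

Calling read_panic_timeout() on the running system reports the current behaviour. The same timeout can be set at boot with the panic=N kernel argument or at run time by writing a number of seconds to that file as root.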

Then I thought that I had figured out the problem several months ago, when I noticed that moving the old 8 gig disks onto a separate controller from the 70 and 140 gig disks seemed to work better. But it only got better, never really stable. Still kernel panics now and again.

I have seen notes about qLogic problems. This system now has two qLogic controllers in it. I don't have log files or debugging output, but it does seem like this is a qLogic problem. Currently the system seems to have problems when it gets busy. I thought that with 2.6 the driver was supposed to be supported again.

So, what are the options for SRM-visible PCI controllers? With a 2.4 kernel I had boot problems with a TekRam controller I tried. Maybe I should try the TekRam again in 2.6?

What other options are there? I thought maybe it would make sense to put an Adaptec controller in the system and put the bulk of the data out there with a minimal amount of the system on qLogic served disk. Or maybe an AlphaBIOS boot? It has been forever since I have looked at AlphaBIOS on these systems, but I remember that from AlphaBIOS you can see Adaptec controllers. If I tried that I would have to use MILO to boot, right? Is that reasonable to try?

I certainly wish I had some experience looking at driver-level code. It is really frustrating to have such fine hardware, even if a bit dated, be unusable because of this one software problem when the rest of the system seems to work fine.

Bill

--On Friday, January 27, 2006 12:51:52 AM +0100 DEMAINE Benoit-Pierre <benoit@demaine.info> wrote:

Bill MacAllister wrote:
Yes, it does look similar and this system has two qLogic cards in it.  I
won't get a chance to try and build a kernel myself on this system for a
couple of days at least.  (My wife was relatively ticked off that I
didn't get home until 5am.)  One thing that does seem different is that
the boot failures that I was ending up with were not kernel panics, but
file not found before the boot really got rolling.  Although I am sure I
saw a kernel panic once or twice in the seemingly thousands of boots I
tried last night.

'file not found' is an end message that is due to a lack of 'storage
room': ramdisk.

In our case, I would not get stuck on 'file not found', but ask 'why could
the kernel not find the ramdisk room?' ... I mean, it can't when using SMP,
and can when using Generic!

My /non answer/ is:
something /else/ prevents your kernel from working fine, and on your system,
it ends with: 'can't load ramdisk driver'. Right, this driver may be
linked statically, BUT STILL, the kernel has to find it in its code,
detect, load, and initialise it.

Some people end up with no root or no initrd, you with no
ramdisk, and I have already had 'trying to kill init', 'can't find init
program' and ... 'killing interrupt handler' !!!

HOW THE HELL COULD A KERNEL KILL INTERRUPT HANDLER ???

And for all of us, if we use exactly the same kernel config
but without SMP, or the same binary SMP kernel and manage to boot it without
the ISP10x0 ... it works.

My answer is: the QLA driver for ISP is 'very badly coded'.

Some people on IRC say that the current version of QLA-isp in 2.6 is in
fact an 'untouched' fork of the driver from 2.4.21 (from memory, not sure
about 21).

And obviously, that one does not like the SMP part of 2.6.

My way to boot an SMP kernel without QLA hardware requires SCSI adapters:
in the AS1200, find a [50/80] adapter to SCA (the passive adapter), then use
it to plug the internal SCSI plane into the motherboard; now I cannot
boot CDs any more, but HDDs can be booted as dka0 (instead of dkb0);
don't forget to manually force the root= argument so that mount/remount can
'keep' a '/'. Still, this showed me that it is possible to go /further/ in
the boot process when the ISP card is out; put it back (just the card, without
putting any SCSI chain back on it), and you hang way earlier. Take it
out, you go further. Of course, by the way, the chain bit rate gets a bit
lower ^^
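
To double-check which root= argument a kernel was actually booted with, here is a minimal sketch, assuming the boot arguments are readable from /proc/cmdline. The helper name and the device name in the usage note are hypothetical, not anything from this thread.

```python
from pathlib import Path
from typing import Optional

# Assumption: the running kernel exposes its boot arguments here.
CMDLINE = Path("/proc/cmdline")


def find_root_arg(cmdline: str) -> Optional[str]:
    """Return the value of the last root= argument, or None if absent."""
    root = None
    for token in cmdline.split():
        if token.startswith("root="):
            # Keep scanning: this sketch lets the last occurrence win.
            root = token[len("root="):]
    return root
```

For example, find_root_arg("ro root=/dev/sda2 console=ttyS0") returns "/dev/sda2", and find_root_arg(CMDLINE.read_text()) checks the running system. A result of None means no root= was passed and the kernel fell back to its built-in default.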

It costs an adapter, but it proves where the problem is: the ISP10x0 chipset
being present in the computer.

!!! This bug is also reproducible with Intel SMP Debian kernels !!!

Insert the card in a working Intel station: it won't boot any more.

Though, it still is hard for me to believe it isn't just something that
I am doing wrong.  I have only a fuzzy idea of how things should look in
/ and /boot for this to work.  I saw a note that indicated that when you
have / and /boot on different partitions you need to make sure the links
in /boot are okay and the links from / to ./boot don't matter.  But
apt-get is not creating any links in /boot, only from / to /boot.  Maybe
I should try another disk with a single partition on it and see what
happens.
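
One way to see what the links in / and /boot actually look like is a small script. This is only a sketch: the link names below are an assumption based on the usual Debian kernel package layout, and the function name is made up for illustration.

```python
import os
from pathlib import Path

# Assumption: these are the symlink names the kernel packages maintain.
LINK_NAMES = ("vmlinuz", "vmlinuz.old", "initrd.img", "initrd.img.old")


def check_links(directory, names=LINK_NAMES):
    """Report, for each expected name, whether it is a working symlink."""
    results = {}
    for name in names:
        link = Path(directory) / name
        if link.is_symlink():
            target = os.readlink(link)
            # exists() follows the link, so False here means it dangles.
            state = "ok" if link.exists() else "dangling"
            results[name] = f"{state} -> {target}"
        elif link.exists():
            results[name] = "not a symlink"
        else:
            results[name] = "missing"
    return results
```

Running check_links("/") and check_links("/boot") and comparing the two results shows which directory has the links and whether any of them point at a file that is not there.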

Well, at least it's nice to know that it was not just me having
problems.  I have been trying to get this system to 2.6 for a while now.

Bill

This ISP thing is a known problem for people on irc.freenode.net
#gentoo-alpha and #alpha, but they actually have bigger problems with
other chipsets not working at all with 2.4 and 2.6.

--
DEMAINE Benoit-Pierre (aka DoubleHP ) http://www.demaine.info/
\_o< If computing were an exact science, IT engineers would not have work
>o_/






+----------------------------------------------------------
| Bill MacAllister, System Manager
| Nevada City School District


