Re: smp problems on AlphaServer 4100


Thanks for your replies. I fell off the map for a bit because of some family issues. I am finally getting to look at this again.

--On Friday, January 27, 2006 04:42:45 PM -0800 Steve Langasek <vorlon@debian.org> wrote:

On Thu, Jan 26, 2006 at 10:47:22PM -0800, Bill MacAllister wrote:
I probably would not be futzing around with this system at all if it was
stable.  Even though I can boot a 2.4.28 smp kernel and a 2.6.8 generic
kernel, neither is stable.  The system kernel panicked in the middle of
the day again today.

With which kernel?  FWIW, kernel panics in 2.6.8 are probably not going to
get much attention unless they're widespread, as most efforts are
(naturally) focused on the next release; but if you can forward us a
backtrace from these kernel panics (sent through ksymoops as needed), we
might at least be able to suggest a workaround.  Otherwise, you could try
2.6.15 from unstable, to see if it addresses your bug -- or press for it
to be fixed for etch if not.

The kernel panics happened in both the 2.4.28 kernel I built and the stock Debian 2.6.8 generic kernel, which makes one suspect a hardware issue, i.e. something is broken. But I have a VMS system sitting right next to this Linux system, and since both have StorageWorks shelves I have been able to swap drives between the two systems, and the problem moves with the disks. I am seeing this problem on an AlphaServer 4100 (2.4.28 and 2.6.8 kernels), a PWS 600au (2.4.28 kernel), and a DECServer 5000 (2.4.25 kernel). All of these systems have qLogic controllers. I am not seeing the problem on an AlphaServer 1000a, a PWS, and an AlphaStation 300, all with qLogic controllers and running RedHat 7.2 with 2.4.9 kernels.

I agree that debugging old software does not sound useful, so I will work on building a current kernel. This usually takes me a couple of iterations to get the devices right. I guess I could use the kernel from etch, but I would only want the kernel from unstable. I am not enough of an apt-get expert to know how to get only one package from unstable. Hmmm, maybe I should just dpkg install the .deb for the kernel?

I don't know how to get a backtrace. The system is really locked up when it panics. Pointing me to some documentation would be sufficient.

FWIW, I think your best bet is going to be the -generic kernel, not the
-smp; ISTR hearing that the Debian alpha autobuilders also had trouble
running smp kernels, and my own experience trying to boot an SMP kernel
(by accident on a UP system) was a very odd kernel panic on boot.

I have seen notes about qLogic problems.  This system now has two qLogic
controllers in it.  I don't have log files or debugging output, but it does
seem like this is a qLogic problem.  Currently the system seems to have
problems when it gets busy.  I thought that with 2.6 the driver was
supposed to be supported again.

That could be a driver problem, or it could be a hardware problem (well,
more accurately, hardware problem + unhandled exception in the driver...).
Have you tried swapping out the SCSI cables and/or terminators?

My experience with 2.6 so far is that it does far worse than 2.4 (i.e.,
won't load) with the QLA in my alpha.

After two crashes on a critical system yesterday I spent last night putting an Adaptec 2940uw controller in the AlphaServer 4100. This controller is not seen by SRM so I had to leave one of the qLogic controllers installed. I would have used an NCR based card, but I didn't have the right cable with me. It is a pain that the device order got reversed, i.e. /dev/sdb became /dev/sdf.

So, what are the options for SRM visible PCI controllers?  With a 2.4
kernel I had boot problems with a TekRam controller I tried.  Maybe I
should try the TekRam again in 2.6?

I tried the TekRam controller again. The 4100 hates it. Doing a SHOW DEVICE in SRM hangs.

As mentioned in my other mail, I have an Adaptec AHA-2940U/UW/D running
here that SRM sees just fine, to my very great surprise.

What system is this?

What other options are there?  I thought maybe it would make sense to
put an Adaptec controller in the system and put the bulk of the data
out there with a minimal amount of the system on qLogic served disk.
Or maybe an AlphaBIOS boot?  It has been forever since I have looked at
AlphaBIOS on these systems, but I remember that from AlphaBIOS you can
see Adaptec controllers.  If I tried that I would have to use MILO to
boot, right?  Is that reasonable to try?

If you don't mind the fact that MILO is unsupported by Debian and you'll
therefore have to roll your own install process if you ever need to
reinstall, yes, that's reasonable to try.

I never really liked MILO, so I would not be going there unless someone thinks it is a solution.

| Bill MacAllister
| 14219 Auburn Road
| Grass Valley, CA 95949
| 530-272-8555
