aha1542 kernel panics
I have been trying to solve this problem without success for about a year
now.
My hardware is:
- Dell NetPlex 486SX/25
- 32MB RAM
- AHA1542C SCSI Controller
- 3COM 509B Ethernet Controller
- 2-port 16550A Serial Card
- 3 x SCSI Hard Drives External
- 1 x SCSI Jaz Drive External
- 1 x SCSI Sony DDS-3 DAT Drive External
My software is:
- Debian GNU/Linux 2.1 (slink)
- Glibc 2.1 added from potato
- Linux 2.2.11 kernel
My problem is:
After the system has been up for a random length of time (usually about a
week or so) it will crash in the middle of the night during a full backup
to the DAT drive using cpio. The machine hangs in either an infinite loop
or a kernel panic. I originally was running Debian 2.1 with a 2.0.36
kernel, and I would see the following scrolling endlessly off the screen
after a crash:
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
Sending SCSI DID_RESET...
other scsi messages, etc...
Since installing the 2.2 kernel and associated upgraded packages as
detailed in the errata for slink, the crashes *seem* to occur less often,
but this morning I saw:
aha1542_out failed...
aha1542_out failed... failed to reset target...
...
Kernel panic: unable to find empty mailbox for aha1542...
and the system was locked up. Since upgrading to the 2.2 kernel, I also
notice periodic messages in the syslog (about one per day) like this:
aha1542.c: interrupt received but no mail
The system will run perfectly for a week or so, doing this same backup
routine every night, and then it will just pull this trick on some random
night.
I have tried:
- disconnecting all devices except the tape drive and hard drives
- installing the highest quality cables I can find for the external
devices (this machine currently has about $400 US worth of Granite
Digital cables hanging off of it).
- installing a Granite Digital active terminator on the end of the SCSI
chain
- verifying that there are no interrupt or IO port conflicts both in the
device jumper configurations and from the /proc filesystem
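
For anyone who wants to repeat the /proc check, here is a rough sketch of
one way to look for shared interrupts. It assumes the 2.2-style
/proc/interrupts layout (IRQ number, count, controller name, then a
comma-separated list of device names). The function names and the sample
text are made up for illustration, and the IRQ numbers in the sample are
not the ones from my machine.

```python
import re

# Hypothetical sample in the assumed 2.2-style layout; not real output.
SAMPLE = """\
           CPU0
  0:    1234567          XT-PIC  timer
 10:      54321          XT-PIC  eth0
 11:     987654          XT-PIC  aha1542
NMI:          0
"""


def parse_irq_owners(text):
    """Map each numeric IRQ to the list of device names on its line."""
    owners = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s*(.*)", line)
        if not m:
            continue  # skips the CPU header and non-numeric lines like NMI
        fields = m.group(2).split()
        # Drop the interrupt count column(s), which are all digits.
        while fields and fields[0].isdigit():
            fields.pop(0)
        # Assumption: exactly one controller token (e.g. XT-PIC) follows.
        names = " ".join(fields[1:])
        owners[int(m.group(1))] = [n.strip() for n in names.split(",") if n.strip()]
    return owners


def shared_irqs(owners):
    """Return only the IRQs that more than one device is registered on."""
    return {irq: devs for irq, devs in owners.items() if len(devs) > 1}


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        print(shared_irqs(parse_irq_owners(f.read())))
```

An empty result only means that no two drivers have registered the same
IRQ. It says nothing about a jumper conflict with a card that has no
driver loaded, so the jumpers still need checking by hand.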
I am completely at my wits' end with this. I have searched DejaNews
repeatedly for any discussions of kernel panics and crashes with Adaptec
cards, Linux, SCSI in general, etc., and all I can find is one thread
from about a year ago mentioning the same sorts of problems but no
solution.
Is this a problem that anyone else has ever had with Linux and an
AHA1542C in particular or SCSI in general? Can anyone recommend which
part of the setup I should change or eliminate? Is it a bad card? Are
Adaptec cards bad in general? Is the aha1542 scsi driver problematic? Is
Linux SCSI in general problematic?