Problems with serial ports? (longish life story)
I've been running Debian 1.2 for quite a while now and it's been solid as a
rock. The system is an ISA 486 DX4/100 with 32 MB of RAM, used as one of my
dialup servers for clients, most of whom run PPP. It has 16 serial ports: one
8-port PC Com 16550 card, a 4-port 16550 card and a 4-port Stallion Brumby
card.
Last Monday I came into work and found that none of the serial ports on the
machine were working. The Brumby card was giving 'not responding' errors and
the ports on the 16550 cards were all giving INIT respawning errors because
they couldn't get to the modems. Looking closer, it seemed that each stuffed
line had an LED lit on its modem that isn't normally lit when mgetty isn't
running on it (either RTS or CTS, I can't remember which now).
Apart from the serial ports, the machine was fine. Keep in mind that I had
previously run 2.0.29 for over 30 days of uptime with no problems at all, and
the only reason the uptime stopped at 30 days is that I upgrade kernels
occasionally.
I tried shutting down and rebooting a few times to no avail: the cards
were detected fine, but I couldn't access the modems. Instead of trying to
track down the fault, I took the easy way out and 'pretended' all those
serial cards were dead. I took them out, inserted a Stallion EasyConnect 8/32
card and plugged in two 8-port modules to make 16 serial ports. I then
compiled 2.0.30 with the Stallion driver as a module, rebooted, and presto, I
had instant serial ports, and they worked. I figured I'd spend some time later
putting the 'faulty' cards into another machine to see if I could make them
produce the errors.
However, this was only the start of the problem. The next day I got errors
like these in /var/log/messages:
May 22 20:17:01 orion kernel: STALLION: bad RX interrupt ack value=f9
May 25 07:52:12 orion kernel: STALLION: cd1400 device not responding, port=3 panel=1 brd=0
As soon as these messages popped up I knew the card was dead and I had no
access to the serial ports again, just like with the other cards! The errors
aren't always identical either; sometimes it's a different 'port' or 'ack
value'. I shut down and rebooted the system (all done remotely, with no power
off) and the card came up fine and worked for around 24 hours.
The same problem has happened nearly every day for the past week. Each time
it happens, I reboot and it's fine again for another day or so. I've been
trying different kernels too: I've gone from 2.0.30 to 2.0.29 and now to
2.0.28, on which it happened again just 30 minutes ago. Each time, a reboot
fixes it.
I've checked all interrupts and I/O addresses, and all appear fine with no
conflicts.
The machine _was_ working fine for the past few months, and the only changes
I've made were updating Debian packages (stable only) and Linux kernels. I log
all my system updates and changes so I can review them in case anything goes
wrong at a later date. The last change I made was on the 4th of May:
Preparing to replace quota 1.55-4 (using .../admin/quota_1.55-8.deb) ...
Preparing to replace at 2.9b-1 (using .../admin/at_3.1.4-2.deb) ...
Preparing to replace util-linux 2.5-9 (using
Preparing to replace kbd 0.92-3 (using .../base/kbd_0.92-3.1.deb) ...
Preparing to replace e2fsprogs 1.09-1 (using .../base/e2fsprogs_1.10-2.deb)
Preparing to replace qpopper 2.2-3 (using .../mail/qpopper_2.2-4.deb) ...
But keep in mind that there were a good 2 weeks between the last updates and
the start of the problem, so I doubt they had anything to do with it.
I'm about to try compiling the Stallion driver directly into the kernel
instead of as a module to see what happens, but I doubt it'll make any
difference. I also suspect it could be a hardware problem, but if so, what
could it possibly be? Here's the complete system configuration (for those who
are still reading and haven't lost interest):
Device    DMA  IRQ  Ports                Notes
SMC             15  0300-031f            ISA SMC Elite 16C Ultra network card
Stallion        10                       Stallion EC8/32 serial card
aha1542    5    11  0330-0333            Adaptec 1542CF SCSI card
cascade    4     2
floppy              03f0-03f5 03f7-03f7
keyboard         1  0060-006f
serial              0360-0361 0380-039f  Stallion serial ports
timer            0  0040-005f
Do any of you have any ideas on how to stop me going bald? (Don't take that
literally!)
Tower Networking Pty Ltd Tel: +61-8-9456-0000 firstname.lastname@example.org
t/a STAR Online Services Fax: +61-8-9455-2776 email@example.com