2.4.18-1-generic seems to have subtle flaws
I have uncovered subtle flaws in the Woody 2.4.18-1-generic kernel that
eventually led me to switch to the 2.4.18 kernel that HP ships with the
RedHat/HP Linux 7.2 release they support.
Here is the basic story. I have a PC164 server that is IDE based and
has 3 network interface cards. The server is a standard web, mail, and
print server, and it is also a NAT, firewall, and router for two
internal subnets. I was experiencing very bizarre failures that looked
hardware or kernel related. If network traffic on the interfaces went
quiet for a while, the machine would freeze. Sometimes it froze
silently; other times it came back to life when network traffic
arrived. There were no error messages, but when I looked carefully at
various logs, I could see that the machine had stopped and that the
MARKs in /var/log/messages were missing. The kernel clock seemed to
stop and then catch up when the machine woke and NTP pushed the time
forward. Sometimes, when the machine froze completely, I would see the
following error from named in the log:
syslog.5.gz:Jul 6 21:58:16 xxx named: gettimeofday returned bad
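For anyone trying to confirm the same clock symptom, a small watchdog can make it visible without relying on syslog MARKs. This is a minimal sketch, not something I ran on the affected boxes: the names `tv_usec_ok`, `classify_gap`, and `watch`, the 60 second interval, and the 2x slack factor are all hypothetical choices. It sleeps for a fixed interval and checks how far the wall clock moved; a jump far larger than the interval suggests the machine or its clock stalled and was then stepped forward (for example by NTP). `tv_usec_ok` shows the range a normalized `struct timeval` microsecond field should satisfy, which is presumably what named is complaining about.

```python
import time

USEC_PER_SEC = 1_000_000


def tv_usec_ok(tv_usec: int) -> bool:
    """Hypothetical helper: a normalized timeval has 0 <= tv_usec < 1,000,000."""
    return 0 <= tv_usec < USEC_PER_SEC


def classify_gap(expected: float, observed: float, slack: float = 2.0) -> str:
    """Hypothetical helper: classify how far the wall clock moved during a sleep.

    expected -- seconds we asked to sleep
    observed -- seconds the wall clock actually advanced
    slack    -- assumed tolerance factor before calling it a stall
    """
    if observed < 0:
        return "backward"  # wall clock was stepped backwards
    if observed > expected * slack:
        return "stall"     # machine or clock stopped, then caught up
    return "ok"


def watch(interval: float = 60.0, rounds: int = 10) -> None:
    """Poll the wall clock and report suspicious jumps (assumed parameters)."""
    prev = time.time()
    for _ in range(rounds):
        time.sleep(interval)
        now = time.time()
        status = classify_gap(interval, now - prev)
        if status != "ok":
            print(f"{time.ctime(now)}: clock {status}, "
                  f"moved {now - prev:.1f}s during a {interval:.0f}s sleep")
        prev = now


if __name__ == "__main__":
    watch()
```

Run in the background on the router, a healthy kernel should print nothing; a line reporting a "stall" of several minutes would line up with the gaps in the MARKs.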
I first assumed a hardware fault, so I swapped the machine for a hot
spare with an identical configuration, except that its 3 NICs were
RTL8139s rather than 3Com 3c905s. I replaced the hard drive as well,
so no hardware moved from the first machine to the second; only the
software did. The second machine failed too, but it froze less often;
instead, the Ethernet interfaces would simply stop working. If I
manually ran ifdown and ifup on the interfaces from the console, they
would start working again. But whenever the tv_usec error from named
showed up in the logs, the machine would lock up.
Since two separate machines showed the same problems, I concluded it
was a kernel problem. In the end, I gave up on the 2.4.18 kernel in
Woody. I unpacked the 2.4.18 kernel RPM from the RedHat/HP release and
I'm running that instead. There appear to be one or more issues with
the Woody 2.4.18 kernel. Either it lacks some of the Alpha patches the
HP team puts into their kernel, or compiling the Woody kernel with
gcc 3.2 is a mistake if that compiler hasn't been fixed for the Alpha,
or both. I remember a number of messages on the HP/RedHat Linux mailing
list saying that the gcc 3.x compiler for the Alpha was broken and not
ready for production use. The HP team compiles their kernel with a
patched gcc 2.96.