Weird SMP problem
Hi!
Imagine the following configurations:
machine1:
- 2 x Pentium III 800MHz
- CUV266-D Asus motherboard (VIA VT8633/8233)
- 1 GB DDR RAM
- SCSI storage controller: Adaptec 7892A
- 2 x IBM HDs (Model: DDYS-T09170N)
machine2:
- the same CPUs, motherboard and RAM as in machine1
- FUJITSU MPF3153AH, ATA DISK drive
Both machine1 and machine2 run Linux 2.4.17 SMP
machine1 runs Debian potato + Adrian Bunk's packages needed to run
a 2.4.x kernel + a slightly patched version of qmail
machine2 runs Debian woody + the same version of qmail
machine2 runs well (we did some stress tests like injecting a few
thousand messages into qmail and compiling the kernel with -j 2).
As for machine1, it boots nicely, switches into runlevel 2 and then,
about four or five seconds after qmail starts, freezes completely (not
even the keyboard LEDs respond).
Here is the weirdest part. We started it in single-user mode,
it fsck'd all filesystems, we deleted all links in /etc/rc2.d, removed
/etc/nologin*, proceeded to runlevel 2 and then manually started the
services one by one, waiting a minute or so after each one to check
that the machine was still responsive. And again it froze a few
seconds after starting qmail (while the disks were still churning as it
processed its queue).
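
For anyone who wants to repeat the one-by-one startup, here is a minimal
sketch in Python. It is not what we ran (we did it by hand); the function
names, the log path and the init script paths in the comment are only
assumptions for illustration. The idea is to fsync a log line to disk
before each service starts, so that after a hard freeze the last line in
the log shows which step the machine was on.

```python
import os
import subprocess
import time


def _log(log, message):
    """Write one line and force it to disk so it survives a hard freeze."""
    log.write(message + "\n")
    log.flush()
    os.fsync(log.fileno())


def start_one_by_one(commands, log_path, delay_seconds=60, run=subprocess.call):
    """Run each command in turn, logging durably before and after each one.

    commands      -- list of argument lists, e.g. [["/etc/init.d/foo", "start"]]
    log_path      -- file that receives the trace (hypothetical location)
    delay_seconds -- how long to wait after each command before moving on
    run           -- callable that executes a command and returns its exit status
    """
    results = []
    with open(log_path, "a") as log:
        for command in commands:
            text = " ".join(command)
            _log(log, "starting: " + text)
            status = run(command)
            _log(log, "returned %d: %s" % (status, text))
            results.append((command, status))
            time.sleep(delay_seconds)
            _log(log, "still alive %d s after: %s" % (delay_seconds, text))
    return results


# Hypothetical usage (script paths are assumptions, adjust to your system):
# start_one_by_one(
#     [["/etc/init.d/sysklogd", "start"], ["/etc/init.d/qmail", "start"]],
#     "/var/log/startup-trace.log",
# )
```

If the log ends with a "starting:" or "returned" line and no matching
"still alive" line, the freeze happened during or shortly after that
service's startup. Of course, if the freeze involves the disk path
itself, even the fsync'd line may not make it out.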
The SCSI controller and disks _are_ OK, since they have run flawlessly
on a non-SMP system for a year or so (and actually still run as I type
these words).
What might be causing this??? I don't think the software version
difference is relevant, since only a hardware or kernel malfunction
should be able to freeze a system, right?
Then again, the kernel is the same (from the same package).
Ideas on what might be wrong or how to further isolate the problem
are very welcome.
Marcin
--
Marcin Owsiany
porridge@expro.pl