
Weird SMP problem



Hi!

Imagine the following configurations:

machine1:
	- 2 x Pentium III 800MHz
	- CUV266-D Asus motherboard (VIA VT8633/8233)
	- 1 GB DDR RAM
	- SCSI storage controller: Adaptec 7892A
	- 2 x IBM HDs (Model: DDYS-T09170N)

machine2:
	- the same CPUs, motherboard and RAM as in machine1
	- FUJITSU MPF3153AH, ATA DISK drive


Both machine1 and machine2 run Linux 2.4.17 SMP

machine1 runs Debian potato + Adrian Bunk's packages needed to run
a 2.4.x kernel + a slightly patched version of qmail

machine2 runs Debian woody + the same version of qmail



machine2 runs well (we did some stress tests, like injecting a few
thousand messages into qmail and compiling the kernel with -j 2).

As for machine1, it boots nicely, switches into runlevel 2 and then,
about four or five seconds after qmail starts, freezes completely (not
even the keyboard LEDs blink).

This is the weirdest part. We started it in single-user mode, it
fsck'd all filesystems, we deleted all links in /etc/rc2.d, removed
/etc/nologin*, proceeded to runlevel 2 and then manually started the
services one by one, waiting a minute or so after each one started to
check whether the machine was still responsive. And again it froze
a few seconds after starting qmail (while the disks were still churning
as it processed its queue).
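
For anyone who wants to repeat the one-by-one startup, here is a minimal
sketch in Python of how it could be scripted. It is not what we actually
ran (we did it by hand). The idea is to write a log line and force it to
disk before each service starts, so that after a hard freeze the last
line in the log shows which step was in progress. The function names,
the log path and the init script names in the comment are hypothetical.

```python
import os
import subprocess
import time


def log_synced(log_path, message):
    """Append one line and fsync it so it survives a hard freeze."""
    with open(log_path, "a") as log:
        log.write(message + "\n")
        log.flush()
        os.fsync(log.fileno())


def start_one_by_one(commands, log_path, wait_seconds=60,
                     run=subprocess.call):
    """Run each command in order, logging before and after each one.

    Returns a list of (command, exit_status) pairs for the commands
    that completed.
    """
    results = []
    for command in commands:
        text = " ".join(command)
        log_synced(log_path, "starting: " + text)
        status = run(command)
        log_synced(log_path, "started: %s (exit status %d)" % (text, status))
        results.append((command, status))
        # Wait so that a delayed freeze can be attributed to this step.
        time.sleep(wait_seconds)
    return results


# Example call (hypothetical script names and log path):
# start_one_by_one(
#     [["/etc/init.d/sysklogd", "start"], ["/etc/init.d/qmail", "start"]],
#     "/var/tmp/startup-trace.log",
# )
```

If the log ends with a "starting:" line that has no matching "started:"
line, the freeze happened while that command was running. If it ends
with a "started:" line, the freeze came during the wait that followed.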

The SCSI controller and disks _are_ ok, since they have run flawlessly
on a non-SMP system for a year or so (and actually still do as I type
these words).


What might be causing this? I don't think the software version
difference is relevant, since only a hardware or kernel malfunction
should be able to freeze a system, right?

Then again, the kernel is the same (from the same package).

Ideas on what might be wrong or how to further isolate the problem
are very welcome.

Marcin
-- 
Marcin Owsiany
porridge@expro.pl


