Re: Weird SMP problem
On Thu, 31/Jan/02 19:23:08, Marcin Owsiany wrote:
> Hi!
>
> Imagine the following configurations:
>
> machine1:
> - 2 x Pentium III 800MHz
> - CUV266-D Asus motherboard (VIA VT8633/8233)
> - 1 GB DDR RAM
> - SCSI storage controller: Adaptec 7892A
> - 2 x IBM HDs (Model: DDYS-T09170N)
>
> machine2:
> - the same CPUs, motherboard and RAM as in machine1
> - FUJITSU MPF3153AH, ATA DISK drive
>
>
> Both machine1 and machine2 run Linux 2.4.17 SMP
>
> machine1 runs Debian potato + Adrian Bunk's packages needed to run
> a 2.4.x kernel + a slightly patched version of qmail
>
> machine2 runs Debian woody + the same version of qmail
>
>
>
> machine2 runs well (we did some stress tests like injecting a few
> thousand messages into qmail and compiling the kernel with -j 2)
>
> As for machine1, it boots nicely, switches into runlevel 2 and then,
> about four or five seconds after qmail starts, freezes completely (not
> even the keyboard LEDs blink).
>
> This is the weirdest thing about it. We started it in single user mode,
> it fsck'd all filesystems, we deleted all links in /etc/rc2.d, removed
> /etc/nologin*, proceeded to runlevel 2 and then manually started the
> services one by one, waiting a minute or so after each one started to
> check whether the machine was still responsive. And again it froze
> a few seconds after starting qmail (while the disks were still churning as it
> processed its queue).
>
> The SCSI controller and disks _are_ ok, since they have run flawlessly on a
> non-SMP system for a year or so (and actually still run as I type
> these words).
>
>
> What might be causing this? I don't think the software version
> difference is relevant, since only a hardware or kernel malfunction
> should be able to freeze a system, right?
>
> Then again, the kernel is the same (from the same package).
>
> Ideas on what might be wrong or how to further isolate the problem
> are very welcome.
>
> Marcin
> --
> Marcin Owsiany
> porridge@expro.pl
>
>
Hi,
If I were you, I would try the following:
1) swap the disks between the two machines,
2) remove one processor,
3) use both processors but recompile the kernel with SMP switched off,
4) disconnect one disk,
5) run qmail with top (-d 1) on screen,
6) start without swap,
7) use a different controller (even if the old one seems good).
Have you tried any of these? Did you notice anything?
Think about what else you can change in your hardware configuration (anything:
slow down the processors or use slower ones) to change the environment a bit
for the software, and watch your system carefully.
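Since the screen output is lost once the box locks up, it may also help to
leave a trace on disk of the last moment the machine was alive. Below is only
a rough sketch of that idea in Python, not something tested on your setup: the
names format_sample and log_heartbeat are made up for illustration, and it
assumes a Unix system where os.getloadavg() is available. Each line is flushed
and fsync'ed, so the last timestamp in the file should be close to the moment
of the freeze.

```python
import os
import time


def format_sample(now, load):
    """Return one log line: a timestamp plus the 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load=%.2f %.2f %.2f\n" % ((stamp,) + tuple(load))


def log_heartbeat(path, samples, interval=1.0):
    """Append `samples` lines to `path`, forcing each one out to disk."""
    with open(path, "a") as log:
        for _ in range(samples):
            log.write(format_sample(time.time(), os.getloadavg()))
            log.flush()
            # Ask the kernel to write the line out now, so it is less
            # likely to be lost in a page cache that never gets flushed.
            os.fsync(log.fileno())
            time.sleep(interval)
```

You could start it just before qmail, for example with
log_heartbeat("/var/tmp/heartbeat.log", samples=3600) (the path is only an
example), and read the file after the reboot. Keep in mind that the log goes
through the same disk subsystem you are trying to debug, so an abruptly
missing last line would itself be a hint.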
Regards,
Krzysztof