[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: ARAnyM VMs with Debian hanging at 100% CPU usage



Andreas Schwab dixit:

>Also, it looks like c663600584a596b5e66258cc10716fb781a5c2c9 is
>missing, that may be another thing to try.

Hm no, that wasn't it. But Geert, if you have such patches that
may make sense to apply, please forward them to linux-stable with
possibly a Cc to Ben Hutchings, as that's the easiest way for me
to get them. (I’ve attached a version of it rebased against the
Debian 3.2.23 kernel for convenience.)

I can reliably reproduce it on two VMs with 'apt-get update' now,
but 'strace -f apt-get update' doesn't seem to be affected (it’s
very slow) and since it shows four processes, I still believe it
to be some kind of threading issue. (We have some with Qt already,
remember?)

Console log shows:

[  284.170000] BUG: soft lockup - CPU#0 stuck for 22s! [https:1590]
[  284.170000] Modules linked in: ipv6
[  284.170000]
[  284.170000] Format 00  Vector: 0114  PC: 0000273c  Status: 2404    Not tainted
[  284.170000] ORIG_D0: ffffffff  D0: 27ecf039  A2: 30fb9600  A1: 28832000
[  284.170000] A0: 80020b90  D5: 00000000  D4: 2882ffcc
[  284.170000] D3: 00000100  D2: 80020b90  D1: 00000001

Is PC supposed to be that low, anyway?

bye,
//mirabilos
-- 
22:20⎜<asarch> The crazy that persists in his craziness becomes a master
22:21⎜<asarch> And the distance between the craziness and geniality is
only measured by the success 18:35⎜<asarch> "Psychotics are consistently
inconsistent. The essence of sanity is to be inconsistently inconsistent
From c663600584a596b5e66258cc10716fb781a5c2c9 Mon Sep 17 00:00:00 2001
From: Mikael Pettersson <mikpe@it.uu.se>
Date: Thu, 19 Apr 2012 00:53:36 +0200
Subject: [PATCH] m68k: Correct the Atari ALLOWINT definition

Booting a 3.2, 3.3, or 3.4-rc4 kernel on an Atari using the
`nfeth' ethernet device triggers a WARN_ONCE() in generic irq
handling code on the first irq for that device:

WARNING: at kernel/irq/handle.c:146 handle_irq_event_percpu+0x134/0x142()
irq 3 handler nfeth_interrupt+0x0/0x194 enabled interrupts
Modules linked in:
Call Trace: [<000299b2>] warn_slowpath_common+0x48/0x6a
 [<000299c0>] warn_slowpath_common+0x56/0x6a
 [<00029a4c>] warn_slowpath_fmt+0x2a/0x32
 [<0005b34c>] handle_irq_event_percpu+0x134/0x142
 [<0005b34c>] handle_irq_event_percpu+0x134/0x142
 [<0000a584>] nfeth_interrupt+0x0/0x194
 [<001ba0a8>] schedule_preempt_disabled+0x0/0xc
 [<0005b37a>] handle_irq_event+0x20/0x2c
 [<0005add4>] generic_handle_irq+0x2c/0x3a
 [<00002ab6>] do_IRQ+0x20/0x32
 [<0000289e>] auto_irqhandler_fixup+0x4/0x6
 [<00003144>] cpu_idle+0x22/0x2e
 [<001b8a78>] printk+0x0/0x18
 [<0024d112>] start_kernel+0x37a/0x386
 [<0003021d>] __do_proc_dointvec+0xb1/0x366
 [<0003021d>] __do_proc_dointvec+0xb1/0x366
 [<0024c31e>] _sinittext+0x31e/0x9c0

After invoking the irq's handler the kernel sees !irqs_disabled()
and concludes that the handler erroneously enabled interrupts.

However, debugging shows that !irqs_disabled() is true even before
the handler is invoked, which indicates a problem in the platform
code rather than the specific driver.

The warning does not occur in 3.1 or older kernels.

It turns out that the ALLOWINT definition for Atari is incorrect.

The Atari definition of ALLOWINT is ~0x400, the stated purpose of
that is to avoid taking HSYNC interrupts.  irqs_disabled() returns
true if the 3-bit ipl & 4 is non-zero.  The nfeth interrupt runs at
ipl 3 (it's autovector 3), but 3 & 4 is zero so irqs_disabled() is
false, and the warning above is generated.

When interrupts are explicitly disabled, ipl is set to 7.  When they
are enabled, ipl is masked with ALLOWINT.  On Atari this will result
in ipl = 3, which blocks interrupts at ipl 3 and below.  So how come
nfeth interrupts at ipl 3 are received at all?  That's because ipl
is reset to 2 by Atari-specific code in default_idle(), again with
the stated purpose of blocking HSYNC interrupts.  This discrepancy
means that ipl 3 can remain blocked for longer than intended.

Both default_idle() and falcon_hblhandler() identify HSYNC with
ipl 2, and the "Atari ST/.../F030 Hardware Register Listing" agrees,
but ALLOWINT is defined as if HSYNC was ipl 3.

[As an experiment I modified default_idle() to reset ipl to 3, and
as expected that resulted in all nfeth interrupts being blocked.]

The fix is simple: define ALLOWINT as ~0x500 instead.  This makes
arch_local_irq_enable() consistent with default_idle(), and prevents
the !irqs_disabled() problems for ipl 3 interrupts.

Tested on Atari running in an Aranym VM.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Michael Schmitz <schmitzmic@googlemail.com> (on Falcon/CT60)
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/m68k/include/asm/entry.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

--- a/arch/m68k/include/asm/entry.h
+++ b/arch/m68k/include/asm/entry.h
@@ -33,8 +33,8 @@
 
 /* the following macro is used when enabling interrupts */
 #if defined(MACH_ATARI_ONLY)
-	/* block out HSYNC on the atari */
-#define ALLOWINT	(~0x400)
+	/* block out HSYNC = ipl 2 on the atari */
+#define ALLOWINT	(~0x500)
 #define	MAX_NOINT_IPL	3
 #else
 	/* portable version */

Reply to: