
Bug#572442: sparc 2.6.29+ NMI watchdog deadlock on Sun Fire V240 etc



Package: linux-2.6
Severity: serious
Tags: upstream patch

Hi there,

Ever since kernel 2.6.29 came out, several classes of sparc machines have
been unable to upgrade, because they hang while initializing the new NMI
watchdog code.

The debugging effort is mostly documented in this mailing list thread,
which ran from August to December 2009:
http://lists.debian.org/debian-sparc/2009/08/msg00005.html
http://lists.debian.org/debian-sparc/2009/09/msg00018.html
http://lists.debian.org/debian-sparc/2009/10/msg00015.html
http://lists.debian.org/debian-sparc/2009/11/msg00034.html
http://lists.debian.org/debian-sparc/2009/12/msg00000.html

Had this gone unattended, sparc release requalification might have been in
trouble, because the bug affects the Fire V240 sparc buildd machines as well
as Jurij Smakov's test machine, and that's a lot in our little universe :)

Fortunately, David Miller came to the rescue: he personally debugged the
problem on one of the buildds and fixed it. His solution, which we are
currently running on schroeder.debian.org, is attached.

Please include the patch in the sparc kernel package so that we can test
it widely, preferably ASAP. TIA.

----- Forwarded message from David Miller <davem@davemloft.net> -----

Date: Wed, 03 Mar 2010 09:11:41 -0800 (PST)
Subject: Re: Sparc release requalification


Ok, I think I fixed it.

Attached are two versions of the fix, the first attachment is
for 2.6.33 and the second one is for any kernel 2.6.32 and
previous.

Give it a good test on any machine you've seen this problem on
and let me know how it goes.

Thanks.

From 8a4fd1e4922413cfdfa6c51a59efb720d904a5eb Mon Sep 17 00:00:00 2001
From: David S. Miller <davem@davemloft.net>
Date: Wed, 3 Mar 2010 09:06:03 -0800
Subject: [PATCH] sparc64: Make prom entry spinlock NMI safe.

If we do something like try to print to the OF console from an NMI
while we're already in OpenFirmware, we'll deadlock on the spinlock.

Use a raw spinlock and disable NMIs when we take it.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/prom/p1275.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/prom/p1275.c b/arch/sparc/prom/p1275.c
index 4b7c937..2d8b70d 100644
--- a/arch/sparc/prom/p1275.c
+++ b/arch/sparc/prom/p1275.c
@@ -32,10 +32,9 @@ extern void prom_cif_interface(void);
 extern void prom_cif_callback(void);
 
 /*
- * This provides SMP safety on the p1275buf. prom_callback() drops this lock
- * to allow recursuve acquisition.
+ * This provides SMP safety on the p1275buf.
  */
-DEFINE_SPINLOCK(prom_entry_lock);
+DEFINE_RAW_SPINLOCK(prom_entry_lock);
 
 long p1275_cmd(const char *service, long fmt, ...)
 {
@@ -47,7 +46,9 @@ long p1275_cmd(const char *service, long fmt, ...)
 	
 	p = p1275buf.prom_buffer;
 
-	spin_lock_irqsave(&prom_entry_lock, flags);
+	raw_local_save_flags(flags);
+	raw_local_irq_restore(PIL_NMI);
+	raw_spin_lock(&prom_entry_lock);
 
 	p1275buf.prom_args[0] = (unsigned long)p;		/* service */
 	strcpy (p, service);
@@ -139,7 +140,8 @@ long p1275_cmd(const char *service, long fmt, ...)
 	va_end(list);
 	x = p1275buf.prom_args [nargs + 3];
 
-	spin_unlock_irqrestore(&prom_entry_lock, flags);
+	raw_spin_unlock(&prom_entry_lock);
+	raw_local_irq_restore(flags);
 
 	return x;
 }
-- 
1.6.6.1


sparc64: Make prom entry spinlock NMI safe.

If we do something like try to print to the OF console from an NMI
while we're already in OpenFirmware, we'll deadlock on the spinlock.

Disable NMIs when we take it.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/sparc/prom/p1275.c b/arch/sparc/prom/p1275.c
index 4b7c937..815cab6 100644
--- a/arch/sparc/prom/p1275.c
+++ b/arch/sparc/prom/p1275.c
@@ -32,8 +32,7 @@ extern void prom_cif_interface(void);
 extern void prom_cif_callback(void);
 
 /*
- * This provides SMP safety on the p1275buf. prom_callback() drops this lock
- * to allow recursuve acquisition.
+ * This provides SMP safety on the p1275buf.
  */
 DEFINE_SPINLOCK(prom_entry_lock);
 
@@ -47,7 +46,9 @@ long p1275_cmd(const char *service, long fmt, ...)
 	
 	p = p1275buf.prom_buffer;
 
-	spin_lock_irqsave(&prom_entry_lock, flags);
+	raw_local_save_flags(flags);
+	raw_local_irq_restore(PIL_NMI);
+	spin_lock(&prom_entry_lock);
 
 	p1275buf.prom_args[0] = (unsigned long)p;		/* service */
 	strcpy (p, service);
@@ -139,7 +140,8 @@ long p1275_cmd(const char *service, long fmt, ...)
 	va_end(list);
 	x = p1275buf.prom_args [nargs + 3];
 
-	spin_unlock_irqrestore(&prom_entry_lock, flags);
+	spin_unlock(&prom_entry_lock);
+	raw_local_irq_restore(flags);
 
 	return x;
 }


----- End forwarded message -----
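
For anyone who wants to see the failure mode without a V240 at hand, here
is a minimal userland sketch of the deadlock described in the commit
message. It is not kernel code: prom_lock, nmi_masked, nmi_pending,
nmi_handler(), raise_nmi() and firmware_call() are all hypothetical
stand-ins, the handler uses a try-lock so the model can report the problem
instead of hanging, and delivering a deferred NMI once the mask is lifted
is an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none of this is kernel API. */
static atomic_flag prom_lock = ATOMIC_FLAG_INIT; /* models prom_entry_lock */
static bool nmi_masked;         /* models the PIL being raised to PIL_NMI */
static bool nmi_pending;        /* an NMI arrived while it was masked */
static int  nmi_would_deadlock; /* times the handler found the lock held */
static int  nmi_printed;        /* times the handler got through */

/* Models the watchdog NMI handler printing through the firmware console. */
static void nmi_handler(void)
{
    /*
     * A real spin_lock() would spin forever here, because the lock holder
     * is the very context this handler interrupted. The try-lock lets the
     * model count the event instead of hanging.
     */
    if (atomic_flag_test_and_set(&prom_lock)) {
        nmi_would_deadlock++;
        return;
    }
    nmi_printed++;
    atomic_flag_clear(&prom_lock);
}

/* Models the hardware raising an NMI: deferred if masked, else delivered. */
static void raise_nmi(void)
{
    if (nmi_masked) {
        nmi_pending = true;
        return;
    }
    nmi_handler();
}

/* Models p1275_cmd(): a firmware call under the lock, hit by an NMI. */
static void firmware_call(bool mask_nmi)
{
    bool saved = nmi_masked;        /* like raw_local_save_flags(flags) */
    bool was_held;

    if (mask_nmi)
        nmi_masked = true;          /* like raising the PIL to PIL_NMI */

    was_held = atomic_flag_test_and_set(&prom_lock);
    assert(!was_held);              /* single-CPU model: lock is uncontended */

    raise_nmi();                    /* the NMI fires while "in firmware" */

    atomic_flag_clear(&prom_lock);
    nmi_masked = saved;             /* like raw_local_irq_restore(flags) */

    if (!nmi_masked && nmi_pending) {
        nmi_pending = false;
        nmi_handler();              /* deferred NMI now runs, lock is free */
    }
}
```

Calling firmware_call(false) models the old behaviour: the handler runs
while the lock is held, which in the kernel means a hang. Calling
firmware_call(true) models the patched ordering: the NMI is held off until
the lock has been dropped, and the handler then gets through.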

-- 
     2. That which causes joy or happiness.


