Bug#494365: 2.6.26 hangs on opteron CPUs



adding stable@kernel.org on cc for
8004dd965b13b01a96def054d420f6df7ff22d53

On Mon, Aug 11, 2008 at 11:17:34PM -0600, dann frazier wrote:
> On Fri, Aug 08, 2008 at 08:51:33PM +0200, Peter Palfrader wrote:
> > Package: linux-image-2.6.26-1-amd64
> > Version: 2.6.26-1
> > Severity: important
> > 
> > Hi,
> > 
> > it seems that 2.6.26 (whether the debian package or the kernel.org
> > kernel) locks up after a while on Debian's DL385G1 systems.
> > 
> > After a while, sooner with more disk IO/filesystem load, the system
> > hangs: it continues to do stuff but everything involving disk hangs
> > forever.
> > 
> > The systems work just fine on a 2.6.25.10 kernel.
> > 
> > The servers have Opterons like this:
> > cpu family      : 15
> > model           : 33
> > 
> > so http://www.uwsg.iu.edu/hypermail/linux/kernel/0808.0/0882.html might
> > explain it.
> 
> hey Peter,
>  This is readily reproducible - a simple kernel compile was all it
> took. git bisecting suggests that this issue was introduced by [1]
> and unmasked by [2] during 2.6.26 development. It was later fixed
> during 2.6.27 development by [3].
> 
> Can you confirm that the attached backport of [3] fixes the problem
> for you?
> 
> [1] 35605a1027ac630f85a1b95684f7e86b82498cd6
> [2] 8d539108560ec121d59eee05160236488266221c
> [3] 8004dd965b13b01a96def054d420f6df7ff22d53
> 
> 
> -- 
> dann frazier
> 

> commit 8004dd965b13b01a96def054d420f6df7ff22d53
> Author: Yinghai Lu <yhlu.kernel@gmail.com>
> Date:   Mon May 12 17:40:39 2008 -0700
> 
>     x86: amd opteron TOM2 mask val fix
>     
>     there is a typo in the mask value, need to remove that extra 0,
>     to avoid 4bit clearing.
>     
>     Signed-off-by: Yinghal Lu <yhlu.kernel@gmail.com>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> Backported to Debian's 2.6.26 by dann frazier <dannf@hp.com>
> 
> diff -urpN linux-source-2.6.26.orig/arch/x86/kernel/cpu/mtrr/generic.c linux-source-2.6.26/arch/x86/kernel/cpu/mtrr/generic.c
> --- linux-source-2.6.26.orig/arch/x86/kernel/cpu/mtrr/generic.c	2008-08-11 22:55:59.000000000 -0600
> +++ linux-source-2.6.26/arch/x86/kernel/cpu/mtrr/generic.c	2008-08-11 22:57:13.000000000 -0600
> @@ -219,7 +219,7 @@ void __init get_mtrr_state(void)
>  		tom2 = hi;
>  		tom2 <<= 32;
>  		tom2 |= lo;
> -		tom2 &= 0xffffff8000000ULL;
> +		tom2 &= 0xffffff800000ULL;
>  	}
>  	if (mtrr_show) {
>  		int high_width;
> diff -urpN linux-source-2.6.26.orig/arch/x86/pci/k8-bus_64.c linux-source-2.6.26/arch/x86/pci/k8-bus_64.c
> --- linux-source-2.6.26.orig/arch/x86/pci/k8-bus_64.c	2008-08-11 22:55:59.000000000 -0600
> +++ linux-source-2.6.26/arch/x86/pci/k8-bus_64.c	2008-08-11 22:57:13.000000000 -0600
> @@ -384,7 +384,7 @@ static int __init early_fill_mp_bus_info
>  	/* need to take out [0, TOM) for RAM*/
>  	address = MSR_K8_TOP_MEM1;
>  	rdmsrl(address, val);
> -	end = (val & 0xffffff8000000ULL);
> +	end = (val & 0xffffff800000ULL);
>  	printk(KERN_INFO "TOM: %016lx aka %ldM\n", end, end>>20);
>  	if (end < (1ULL<<32))
>  		update_range(range, 0, end - 1);
> @@ -478,7 +478,7 @@ static int __init early_fill_mp_bus_info
>  		/* TOP_MEM2 */
>  		address = MSR_K8_TOP_MEM2;
>  		rdmsrl(address, val);
> -		end = (val & 0xffffff8000000ULL);
> +		end = (val & 0xffffff800000ULL);
>  		printk(KERN_INFO "TOM2: %016lx aka %ldM\n", end, end>>20);
>  		update_range(range, 1ULL<<32, end - 1);
>  	}