Bug#493479: linux-image-2.6.26-1-amd64: Problem possibly traced to RCU-related programming error

To: Debian Bug Tracking System <493479@bugs.debian.org>
Subject: Bug#493479: linux-image-2.6.26-1-amd64: Problem possibly traced to RCU-related programming error
From: Dave Witbrodt <dawitbro@sbcglobal.net>
Date: Fri, 08 Aug 2008 22:54:43 -0400
Message-id: <20080809025443.2165.38363.reportbug@localhost.localdomain>
Reply-to: Dave Witbrodt <dawitbro@sbcglobal.net>, 493479@bugs.debian.org
Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-1
Followup-For: Bug #493479


I need a Debian Kernel Team member to go to bat for me here!

After bisecting the kernel using git, as advised here on Monday, I have
spent some time trying to locate the source code in the kernel that causes
the freeze.

I submitted a post to the LKML on Monday, and received a couple of
responses on Tuesday.  I have continued to post more information there as I
have discovered it, but have gotten no more replies since those first two
on Tuesday.

Here's the rundown on what I've found:

1.  All 2.6.25* kernels work for me -- stock Debian kernels and custom
kernels alike -- but all 2.6.26* (and 2.6.27* from git) kernels freeze
fairly early during the boot process, before framebuffer drivers (like
VESA FB or UVESA FB) kick in.


2.  The 2.6.2[67]* kernels can be made to boot if kernel parameters are
used that disable the High Precision Event Timer ("hpet=disabled" or
"nohpet").


3.  Booting linux-image-2.6.26-1-amd64 with "debug initcall_debug" reveals
that the last function called before the freeze is inet_init().


4.  Bisecting kernels, beginning with 2.6.25 as the first "good" kernel
and 2.6.26-rc4 as the first "bad" kernel, revealed the commit ID that
introduces the problem with freezes:

    commit 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f


5.  A diff showing the changes which introduced the problem can easily be
obtained by cloning Linus Torvalds' git tree and running:

    git diff 700efc1b9f6afe34caae231b87d129ad8ffb559f 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f

The results should be:

    diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
    index a8694a3..8b914a8 100644
    --- a/arch/x86/kernel/e820_64.c
    +++ b/arch/x86/kernel/e820_64.c
    @@ -229,8 +229,7 @@ unsigned long __init e820_end_of_ram(void)
     /*
      * Mark e820 reserved areas as busy for the resource manager.
      */
    -void __init e820_reserve_resources(struct resource *code_resource,
    -		struct resource *data_resource, struct resource *bss_resource)
    +void __init e820_reserve_resources(void)
     {
     	int i;
     	for (i = 0; i < e820.nr_map; i++) {
    @@ -245,21 +244,7 @@ void __init e820_reserve_resources(struct resource *code_resource,
     		res->start = e820.map[i].addr;
     		res->end = res->start + e820.map[i].size - 1;
     		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
    -		request_resource(&iomem_resource, res);
    -		if (e820.map[i].type == E820_RAM) {
    -			/*
    -			 * We don't know which RAM region contains kernel data,
    -			 * so we try it repeatedly and let the resource manager
    -			 * test it.
    -			 */
    -			request_resource(res, code_resource);
    -			request_resource(res, data_resource);
    -			request_resource(res, bss_resource);
    -#ifdef CONFIG_KEXEC
    -			if (crashk_res.start != crashk_res.end)
    -				request_resource(res, &crashk_res);
    -#endif
    -		}
    +		insert_resource(&iomem_resource, res);
     	}
     }

    diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
    index 187f084..e3cb3ea 100644
    --- a/arch/x86/kernel/setup_64.c
    +++ b/arch/x86/kernel/setup_64.c
    @@ -248,6 +248,7 @@ static void __init reserve_crashkernel(void)
     				(unsigned long)(total_mem >> 20));
     		crashk_res.start = crash_base;
     		crashk_res.end   = crash_base + crash_size - 1;
    +		insert_resource(&iomem_resource, &crashk_res);
     	}
     }
     #else
    @@ -322,6 +323,11 @@ void __init setup_arch(char **cmdline_p)

     	finish_e820_parsing();

    +	/* after parse_early_param, so could debug it */
    +	insert_resource(&iomem_resource, &code_resource);
    +	insert_resource(&iomem_resource, &data_resource);
    +	insert_resource(&iomem_resource, &bss_resource);
    +
     	early_gart_iommu_check();

     	e820_register_active_regions(0, 0, -1UL);
    @@ -454,7 +460,7 @@ void __init setup_arch(char **cmdline_p)
     	/*
     	 * We trust e820 completely. No explicit ROM probing in memory.
     	 */
    -	e820_reserve_resources(&code_resource, &data_resource, &bss_resource);
    +	e820_reserve_resources();
     	e820_mark_nosave_regions();

     	/* request I/O space for devices used on all i[345]86 PCs */
    diff --git a/include/asm-x86/e820_64.h b/include/asm-x86/e820_64.h
    index 9e06c6e..ef653a4 100644
    --- a/include/asm-x86/e820_64.h
    +++ b/include/asm-x86/e820_64.h
    @@ -23,8 +23,7 @@ extern void update_memory_range(u64 start, u64 size, unsigned old_type,
     extern void setup_memory_region(void);
     extern void contig_e820_setup(void); 
     extern unsigned long e820_end_of_ram(void);
    -extern void e820_reserve_resources(struct resource *code_resource,
    -		struct resource *data_resource, struct resource *bss_resource);
    +extern void e820_reserve_resources(void);
     extern void e820_mark_nosave_regions(void);
     extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type);
     extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type);


6.  Inserting printk() function calls in the last function that does not
return at boot allows one to trace the failure deeper into the kernel
sources.  As mentioned above, the last function called when
linux-image-2.6.26-1-amd64 freezes is inet_init().  This function is found
in

    net/ipv4/af_inet.c

By placing informative printk() calls before each line of that function
that calls another function, I was able to narrow the problem down to one
of 2 loops:

    /* Register the socket-side information for inet_create. */
    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
	    INIT_LIST_HEAD(r);

    for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
	    inet_register_protosw(q);

Further printk() calls showed that the first loop completed successfully,
but the second loop hangs inside the inet_register_protosw() call on its
very first iteration.


7.  The inet_register_protosw() function is located in the same file
(af_inet.c), and it hangs in its last function call before the "return;"
statement: synchronize_net().  Please note that RCU features are involved
in this function, and the freeze seems to be caused by some mishandling of
RCU, as will be seen below.  I do not understand how the commit mentioned
above triggered the regression, but there appears to be some connection
between the changes in that commit (3def3d...) and mishandling of RCU
synchronization.


8.  The synchronize_net() function is located in

    net/core/dev.c

and is very small, containing only 2 function calls.  Adding printk() calls
reveals that the kernel freezes in the second function call,
synchronize_rcu().


9.  I am not a kernel hacker, and after reading some of the documentation
in

    Documentation/RCU

I decided against trying to trace the bug deeper into the kernel sources.
For one thing, the definition of synchronize_rcu() is somewhat masked using
preprocessor #define directives, which I took as a warning not to go
deeper.  Also, the documentation mentions that RCU issues very commonly
trip up kernel hackers themselves, so I quickly realized that there was
little chance of my figuring out anything useful by going deeper.

But I don't think the problem is deeper in the kernel sources, unless the
RCU functions themselves are broken.  (However, I have seen no mention of
any HPET-related function calls at the depths I have reached, and there is
some connection to HPET being enabled/disabled, so maybe the problem IS
deeper in the sources after all.)  It looks to me like some part of the
kernel code is misusing RCU -- and, according to the Documentation, those
kinds of mistakes are easy to make.  Maybe necessary calls to
rcu_read_lock()/rcu_read_unlock() are missing, and something about my
hardware is triggering a freeze that doesn't occur on most hardware.


10.  The freeze is somehow fixed by disabling HPET.  I was already confused
by the connection between commit 3def3d... and RCU, but the connection to
HPET is completely obscure to me.  Possibly disabling HPET causes RCU to
work differently internally, avoiding the freeze.  As I understand the
Documentation, synchronize_rcu() deliberately waits (for a "grace period")
until every reader that entered an rcu_read_lock()/rcu_read_unlock()
section before the update has left it, so that old versions of internal
resources are not destroyed while something may still be accessing them.


11.  I have 2 machines with ECS AMD690GM-M2 motherboards which exhibit the
freeze with 2.6.2[67] kernels, while another machine I own, with a Gigabyte
GA-M59SLI-S5 motherboard, has no problem with those kernels at all.  This
apparently is a hardware-related issue, or at least a BIOS issue.


It is possible that my posts to LKML were seen as useless noise, so that I
am now being ignored over there.  Maybe someone on the Debian Kernel Team
could pass this along directly to a Linux kernel maintainer who understands
how RCU works, and how HPET affects its functioning, and why that commit
might have triggered this regression.

I simply lack the knowledge to pursue this any further, but the new kernels
certainly will not work (without extra boot parameters) on any machine with
these ECS AMD690GM-M2 motherboards, and if there _is_ an RCU-related error
in the kernel sources, then it could lead to all sorts of other problems
down the road.

Thanks,
Dave Witbrodt