Bug#502849: Possible fix



Executive summary: please try the r12453 sid snapshot once it appears.
The apt repo lines are available on http://wiki.debian.org/DebianKernel

In the later report (Paul Hedderly's) the faulting rip
(0xffffffff80227fe2), which corresponds to set_cpus_allowed_ptr+31 (or
0x1f out of 0xe0), is:

0xffffffff80227fda <set_cpus_allowed_ptr+23>:	callq  0xffffffff8022493b <task_rq_lock>
0xffffffff80227fdf <set_cpus_allowed_ptr+28>:	mov    %rax,%r13
0xffffffff80227fe2 <set_cpus_allowed_ptr+31>:	mov    (%rbx),%rax
0xffffffff80227fe5 <set_cpus_allowed_ptr+34>:	and    $0xffffffffffffffff,%eax
0xffffffff80227fe8 <set_cpus_allowed_ptr+37>:	test   %rax,0x3e5819(%rip)        # 0xffffffff8060d808 <cpu_online_map>

The fault is on the address in %rbx (0xffffffffff5f7000).

0xffffffff80227fe2 is in set_cpus_allowed_ptr (kernel/sched.c:5628).
5623		unsigned long flags;
5624		struct rq *rq;
5625		int ret = 0;
5626	
5627		rq = task_rq_lock(p, &flags);
5628		if (!cpus_intersects(*new_mask, cpu_online_map)) {
5629			ret = -EINVAL;
5630			goto out;
5631		}
5632	

I believe %rbx is new_mask. In the earlier two reports (both Andrea
Janna's) the erroring rip (0xffffffff80228045) doesn't precisely match
this, but it is reported as set_cpus_allowed_ptr+0x1f/0xe0, the same as
in Paul's report, so I think it is safe to say the two kernels were
simply linked slightly differently and it's the same instruction.

The caller of set_cpus_allowed_ptr was
":processor:acpi_processor_get_throttling+0x45/0x6a" which is
0x00000000000004fa <acpi_processor_get_throttling+64>:	callq  0x4ff <acpi_processor_get_throttling+69>
0x00000000000004ff <acpi_processor_get_throttling+69>:	mov    %rbx,%rdi
(odd address since this is an unlinked .ko file)

0x4fa is in acpi_processor_get_throttling (drivers/acpi/processor_throttling.c:841).
836			return -ENODEV;
837		/*
838		 * Migrate task to the cpu pointed by pr.
839		 */
840		saved_mask = current->cpus_allowed;
841		set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
842		ret = pr->throttling.acpi_processor_get_throttling(pr);
843		/* restore the previous state */
844		set_cpus_allowed_ptr(current, &saved_mask);
845	

So this suggests that &cpumask_of_cpu(pr->id) is somehow bogus.

pr came from acpi_processor_start() via
acpi_processor_get_throttling_info(). Just before the call to
acpi_processor_get_throttling_info() in acpi_processor_start() we see:

        #ifdef CONFIG_XEN
                BUG_ON(pr->acpi_id >= NR_ACPI_CPUS);
                if (processor_device_array[pr->acpi_id] != NULL &&
                    processor_device_array[pr->acpi_id] != device) {
        #else
                if (processor_device_array[pr->id] != NULL &&
                    processor_device_array[pr->id] != device) {
        #endif /* CONFIG_XEN */
                        printk(KERN_WARNING "BIOS reported wrong ACPI id "
                                "for the processor\n");
                        return -ENODEV;
                }
        #ifdef CONFIG_XEN
                processor_device_array[pr->acpi_id] = device;
                if (pr->id != -1)
                        processors[pr->id] = pr;
        #else
                processor_device_array[pr->id] = device;
        
                processors[pr->id] = pr;
        #endif /* CONFIG_XEN */
        
This code is fairly recent in the linux-2.6.18-xen.hg tree and comes
from a combination of two changesets (one adds the feature, the other
unbreaks native build resulting in the ifdef'ery seen above):

http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/d62d60eaba6e
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/e39cf97647af

There are a bunch of changes subsequent to these, but
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/de7f94bd650b looks
pretty interesting:

        changeset:   713:de7f94bd650b
        user:        Keir Fraser <keir.fraser@citrix.com>
        date:        Tue Oct 28 10:39:11 2008 +0000
        files:       drivers/acpi/processor_core.c
        description:
        dom0: Fix for throttling while pr->id == -1
        
        Signed-off-by: Wei Gang <gang.wei@intel.com>

This changeset is not present in our current kernel tree. I have added
it and it will show up in the snapshot builds shortly.

Ian.

-- 
Ian Campbell

No passing.
