Executive summary: please try the r12453 sid snapshot once it appears.
The apt repository lines are available at http://wiki.debian.org/DebianKernel
In the later report (Paul Hedderly) the erroring rip
(0xffffffff80227fe2) which corresponds to set_cpus_allowed_ptr+31 (or
0x1f out of 0xe0) is:
0xffffffff80227fda <set_cpus_allowed_ptr+23>: callq 0xffffffff8022493b <task_rq_lock>
0xffffffff80227fdf <set_cpus_allowed_ptr+28>: mov %rax,%r13
0xffffffff80227fe2 <set_cpus_allowed_ptr+31>: mov (%rbx),%rax
0xffffffff80227fe5 <set_cpus_allowed_ptr+34>: and $0xffffffffffffffff,%eax
0xffffffff80227fe8 <set_cpus_allowed_ptr+37>: test %rax,0x3e5819(%rip) # 0xffffffff8060d808 <cpu_online_map>
The fault is on the address in %rbx (0xffffffffff5f7000).
0xffffffff80227fe2 is in set_cpus_allowed_ptr (kernel/sched.c:5628).
5623 unsigned long flags;
5624 struct rq *rq;
5625 int ret = 0;
5626
5627 rq = task_rq_lock(p, &flags);
5628 if (!cpus_intersects(*new_mask, cpu_online_map)) {
5629 ret = -EINVAL;
5630 goto out;
5631 }
5632
I believe %rbx is new_mask. In the earlier two reports (both Andrea
Janna's) the erroring rip (0xffffffff80228045) doesn't precisely match
this, but it is reported as set_cpus_allowed_ptr+0x1f/0xe0, which is the
same as in Paul's report, so I think it is safe to say the versions were
simply linked slightly differently and it's the same instruction.
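The comparison of the two rips can be checked with a little arithmetic.
The following is only a sketch: the helper names are made up for
illustration, and the only inputs are the rip values and the +0x1f offset
quoted in the reports.

```c
#include <stdint.h>

/* Hypothetical helper: start address of a function, given the faulting
 * rip and the offset the oops printed after the symbol name. */
static uint64_t symbol_base(uint64_t rip, uint64_t offset)
{
    return rip - offset;
}

/* Hypothetical helper: nonzero if two rips, taken from two differently
 * linked builds, lie at the same offset from their function starts. */
static int same_offset(uint64_t rip_a, uint64_t base_a,
                       uint64_t rip_b, uint64_t base_b)
{
    return (rip_a - base_a) == (rip_b - base_b);
}
```

With Paul's rip, symbol_base(0xffffffff80227fe2, 0x1f) gives
0xffffffff80227fc3, which agrees with the gdb listing (0xffffffff80227fda
is shown as +23). With Andrea's rip the same subtraction gives
0xffffffff80228026, so the function starts 0x63 bytes later in that build
while the offset within it is unchanged.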
The caller of set_cpus_allowed_ptr was
":processor:acpi_processor_get_throttling+0x45/0x6a" which is
0x00000000000004fa <acpi_processor_get_throttling+64>: callq 0x4ff <acpi_processor_get_throttling+69>
0x00000000000004ff <acpi_processor_get_throttling+69>: mov %rbx,%rdi
(the call target looks odd because this is an unlinked .ko file, so the
relocation has not been applied yet)
0x4fa is in acpi_processor_get_throttling (drivers/acpi/processor_throttling.c:841).
836 return -ENODEV;
837 /*
838 * Migrate task to the cpu pointed by pr.
839 */
840 saved_mask = current->cpus_allowed;
841 set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
842 ret = pr->throttling.acpi_processor_get_throttling(pr);
843 /* restore the previous state */
844 set_cpus_allowed_ptr(current, &saved_mask);
845
So this suggests that &cpumask_of_cpu(pr->id) is somehow bogus.
pr came from acpi_processor_start() via
acpi_processor_get_throttling_info(). Just before the call to
acpi_processor_get_throttling_info() in acpi_processor_start() we see:
#ifdef CONFIG_XEN
BUG_ON(pr->acpi_id >= NR_ACPI_CPUS);
if (processor_device_array[pr->acpi_id] != NULL &&
processor_device_array[pr->acpi_id] != device) {
#else
if (processor_device_array[pr->id] != NULL &&
processor_device_array[pr->id] != device) {
#endif /* CONFIG_XEN */
printk(KERN_WARNING "BIOS reported wrong ACPI id "
"for the processor\n");
return -ENODEV;
}
#ifdef CONFIG_XEN
processor_device_array[pr->acpi_id] = device;
if (pr->id != -1)
processors[pr->id] = pr;
#else
processor_device_array[pr->id] = device;
processors[pr->id] = pr;
#endif /* CONFIG_XEN */
This code is fairly recent in the linux-2.6.18-xen.hg tree and comes
from a combination of two changesets (one adds the feature, the other
unbreaks native build resulting in the ifdef'ery seen above):
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/d62d60eaba6e
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/e39cf97647af
There are a bunch of changes subsequent to these, but
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/de7f94bd650b looks
pretty interesting:
changeset: 713:de7f94bd650b
user: Keir Fraser <keir.fraser@citrix.com>
date: Tue Oct 28 10:39:11 2008 +0000
files: drivers/acpi/processor_core.c
description:
dom0: Fix for throttling while pr->id == -1
Signed-off-by: Wei Gang <gang.wei@intel.com>
This changeset is not present in our current kernel tree. I have added
it and it will show up in the snapshot builds shortly.
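To illustrate why pr->id == -1 would matter here, the sketch below assumes
that cpumask_of_cpu() amounts to indexing a table of masks by cpu id. That
is an assumption about this kernel, not something taken from the reports,
and the code is not the content of the changeset above. All names in it
(mask_sketch_t, NR_CPUS_SKETCH, mask_for and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8            /* hypothetical table size */

/* Hypothetical stand-in for a cpumask: one word of bits. */
typedef struct { unsigned long bits[1]; } mask_sketch_t;

/* Byte offset, from the start of the table, of the entry for this id.
 * For id == -1 the result is negative, i.e. before the table. */
static ptrdiff_t mask_entry_offset(int id)
{
    return (ptrdiff_t)id * (ptrdiff_t)sizeof(mask_sketch_t);
}

/* An id is only usable as an index if it is inside the table. */
static int cpu_id_is_valid(int id)
{
    return id >= 0 && id < NR_CPUS_SKETCH;
}

/* Guarded lookup: returns NULL instead of an out-of-bounds pointer. */
static const mask_sketch_t *mask_for(const mask_sketch_t *table, int id)
{
    return cpu_id_is_valid(id) ? &table[id] : NULL;
}
```

Under that assumption an id of -1 yields an address just before the table,
and dereferencing it reads whatever happens to be there, or faults. A
caller that checks the id first, as the "if (pr->id != -1)" test in
acpi_processor_start() does for the processors[] array, never forms that
pointer.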
Ian.
--
Ian Campbell
No passing.