Executive summary: please try the r12453 sid snapshot once it appears.
apt repo lines are available on http://wiki.debian.org/DebianKernel

In the later report (Paul Hedderly) the erroring rip (0xffffffff80227fe2),
which corresponds to set_cpus_allowed_ptr+31 (or 0x1f out of 0xe0), is:

0xffffffff80227fda <set_cpus_allowed_ptr+23>:   callq  0xffffffff8022493b <task_rq_lock>
0xffffffff80227fdf <set_cpus_allowed_ptr+28>:   mov    %rax,%r13
0xffffffff80227fe2 <set_cpus_allowed_ptr+31>:   mov    (%rbx),%rax
0xffffffff80227fe5 <set_cpus_allowed_ptr+34>:   and    $0xffffffffffffffff,%eax
0xffffffff80227fe8 <set_cpus_allowed_ptr+37>:   test   %rax,0x3e5819(%rip)        # 0xffffffff8060d808 <cpu_online_map>

The fault is on the address in %rbx (0xffffffffff5f7000).

0xffffffff80227fe2 is in set_cpus_allowed_ptr (kernel/sched.c:5628).
5623            unsigned long flags;
5624            struct rq *rq;
5625            int ret = 0;
5626
5627            rq = task_rq_lock(p, &flags);
5628            if (!cpus_intersects(*new_mask, cpu_online_map)) {
5629                    ret = -EINVAL;
5630                    goto out;
5631            }
5632

I believe %rbx is new_mask.

In the earlier two reports (both Andrea Janna's) the erroring rip
(0xffffffff80228045) doesn't precisely match this, but the trace says it is
set_cpus_allowed_ptr+0x1f/0xe0, which is the same offset as in Paul's report.
So I think it is safe to say the two kernels were simply linked slightly
differently and it's the same instruction.

The caller of set_cpus_allowed_ptr was
":processor:acpi_processor_get_throttling+0x45/0x6a", which is:

0x00000000000004fa <acpi_processor_get_throttling+64>:  callq  0x4ff <acpi_processor_get_throttling+69>
0x00000000000004ff <acpi_processor_get_throttling+69>:  mov    %rbx,%rdi

(odd address since this is an unlinked .ko file)

0x4fa is in acpi_processor_get_throttling (drivers/acpi/processor_throttling.c:841).
836                     return -ENODEV;
837             /*
838              * Migrate task to the cpu pointed by pr.
839              */
840             saved_mask = current->cpus_allowed;
841             set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
842             ret = pr->throttling.acpi_processor_get_throttling(pr);
843             /* restore the previous state */
844             set_cpus_allowed_ptr(current, &saved_mask);
845

So this suggests that &cpumask_of_cpu(pr->id) is somehow bogus.

pr came from acpi_processor_start() via acpi_processor_get_throttling_info().
Just before the call to acpi_processor_get_throttling_info() in
acpi_processor_start() we see:

#ifdef CONFIG_XEN
        BUG_ON(pr->acpi_id >= NR_ACPI_CPUS);
        if (processor_device_array[pr->acpi_id] != NULL &&
            processor_device_array[pr->acpi_id] != device) {
#else
        if (processor_device_array[pr->id] != NULL &&
            processor_device_array[pr->id] != device) {
#endif /* CONFIG_XEN */
                printk(KERN_WARNING "BIOS reported wrong ACPI id "
                        "for the processor\n");
                return -ENODEV;
        }
#ifdef CONFIG_XEN
        processor_device_array[pr->acpi_id] = device;
        if (pr->id != -1)
                processors[pr->id] = pr;
#else
        processor_device_array[pr->id] = device;

        processors[pr->id] = pr;
#endif /* CONFIG_XEN */

This code is fairly recent in the linux-2.6.18-xen.hg tree and comes from a
combination of two changesets (one adds the feature, the other unbreaks the
native build, resulting in the ifdef'ery seen above):

http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/d62d60eaba6e
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/e39cf97647af

There are a bunch of changes subsequent to these, but
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/de7f94bd650b looks
pretty interesting:

        changeset:   713:de7f94bd650b
        user:        Keir Fraser <keir.fraser@citrix.com>
        date:        Tue Oct 28 10:39:11 2008 +0000
        files:       drivers/acpi/processor_core.c
        description:
        dom0: Fix for throttling while pr->id == -1

        Signed-off-by: Wei Gang <gang.wei@intel.com>

This changeset is not present in our current kernel tree. I have added it and
it will show up in the snapshot builds shortly.

Ian.

-- 
Ian Campbell

No passing.
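
For illustration only, the failure mode suspected above can be sketched in userspace C. This assumes that a pr->id of -1 ("no CPU assigned") ends up being used to look up a per-CPU mask, so the resulting pointer lands outside any valid table. Every name below (NR_CPUS_SKETCH, mask_sketch_t, mask_table, mask_for_cpu, get_throttling_sketch) is hypothetical, and this is not the content of changeset de7f94bd650b; it only shows the kind of guard that keeps an invalid id from turning into a bogus mask pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table size; not the kernel's NR_CPUS. */
#define NR_CPUS_SKETCH 4

/* Stand-in for the kernel's cpumask_t: one bit per CPU. */
typedef struct {
    unsigned long bits;
} mask_sketch_t;

/* Stand-in for a per-CPU mask table indexed by CPU id. */
static const mask_sketch_t mask_table[NR_CPUS_SKETCH] = {
    { 1UL << 0 }, { 1UL << 1 }, { 1UL << 2 }, { 1UL << 3 },
};

/*
 * Return the mask for a CPU id, or NULL when the id is not a valid index.
 * An id of -1 is rejected here instead of being used to index the table.
 */
static const mask_sketch_t *mask_for_cpu(int id)
{
    if (id < 0 || id >= NR_CPUS_SKETCH)
        return NULL;
    return &mask_table[id];
}

/*
 * Hypothetical caller: refuse to "migrate" when there is no valid CPU id,
 * rather than passing a bogus mask pointer further down.
 */
static int get_throttling_sketch(int id, unsigned long *mask_out)
{
    const mask_sketch_t *mask;

    assert(mask_out != NULL);

    mask = mask_for_cpu(id);
    if (mask == NULL)
        return -ENODEV;

    *mask_out = mask->bits;
    return 0;
}
```

With this sketch, calling get_throttling_sketch(-1, &m) returns -ENODEV without touching the table, while a valid id such as 2 returns 0 and stores that CPU's single-bit mask in m.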