Bug#580889: linux-image-2.6.32-3-amd64: tasks get stuck at pvclock_clocksource_read under xen
- To: 580889@bugs.debian.org
- Subject: Bug#580889: linux-image-2.6.32-3-amd64: tasks get stuck at pvclock_clocksource_read under xen
- From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
- Date: Fri, 08 Oct 2010 16:49:03 +0300
- Message-id: <8439sghk1c.fsf_-_@sauna.l.org>
- Reply-to: Timo Juhani Lindfors <timo.lindfors@iki.fi>, 580889@bugs.debian.org
- In-reply-to: <handler.580889.B580889.128582944032542.ackinfo@bugs.debian.org> (Debian Bug Tracking System's message of "Thu, 30 Sep 2010 06:51:24 +0000")
- References: <8462xnrah1.fsf_-_@sauna.l.org> <handler.580889.B580889.128582944032542.ackinfo@bugs.debian.org>
Hi,
Here's even more info on the issue. This time I discovered a
workaround, but I still don't know the real root cause of the problem.
I looked at the blkfront_info structure at 0xffff8800029e8000, in
particular:
info->ring.rsp_cons = 96031
info->ring.sring->rsp_prod = 96047
=> The ring buffer holds 16 responses that the backend has completed but
the frontend has not yet consumed.
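The count of 16 comes from the way Xen shared rings index their slots. As I understand it, rsp_cons and rsp_prod are free-running 32-bit unsigned counters (RING_IDX) that are never reduced modulo the ring size. The number of unconsumed responses is therefore simply their difference, and unsigned arithmetic keeps that correct across wraparound. Below is a minimal user-space sketch. The names ring_idx_t and pending_responses are hypothetical and do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's RING_IDX: a free-running 32-bit
 * unsigned index that is never reduced modulo the ring size. */
typedef uint32_t ring_idx_t;

/* Number of responses the backend has published (rsp_prod) that the
 * frontend has not yet consumed (rsp_cons). Unsigned subtraction stays
 * correct after the indices wrap around 2^32. */
static ring_idx_t pending_responses(ring_idx_t rsp_cons, ring_idx_t rsp_prod,
                                    uint32_t ring_size)
{
    ring_idx_t n = rsp_prod - rsp_cons;

    /* The backend can never be more than one full ring ahead. */
    assert(n <= ring_size);
    return n;
}
```

With the values above, pending_responses(96031, 96047, 32) gives 16.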
If I let i run from 96031 up to (but not including) 96047 and look at
ring.sring->ring[i & 31].rsp.id
I see the shadow[] indexes of these entries:
14 2 15 3 13 9 1 11 5 16 4 10 12 0 6 7
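The walk above can be modelled outside the debugger. A free-running index i maps to slot i & (size - 1), which requires the ring size to be a power of two (32 here, going by the "& 31"). The struct below is a hypothetical, heavily simplified stand-in for the real Xen response slot. It keeps only the id field, and collect_ids is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;

/* Hypothetical, simplified response slot: only the id is modelled. */
struct resp_slot {
    uint64_t id; /* index into the frontend's shadow[] array */
};

/* Copy the ids of the unconsumed responses, i.e. indexes rsp_cons up to
 * (but not including) rsp_prod, into out[]. ring_size must be a power of
 * two so that "i & (ring_size - 1)" maps a free-running index to a slot.
 * Returns the number of ids written. */
static size_t collect_ids(const struct resp_slot *ring, uint32_t ring_size,
                          ring_idx_t rsp_cons, ring_idx_t rsp_prod,
                          uint64_t *out, size_t out_len)
{
    size_t n = 0;

    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    for (ring_idx_t i = rsp_cons; i != rsp_prod && n < out_len; i++)
        out[n++] = ring[i & (ring_size - 1)].id;
    return n;
}
```

Index 96031 lands in slot 31 and the following indexes wrap around to slots 0 to 14, which is why the mask is needed.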
If I then let j range over these indexes and look at
((struct request*)(((struct blkfront_info*)0xffff8800029e8000)->shadow[j].request))->__sector
((struct request*)(((struct blkfront_info*)0xffff8800029e8000)->shadow[j].request))->buffer
I get the sector number and data buffer of each request. If I then run
sudo dd if=/dev/xendisk/orbit_root bs=512 skip=$sector count=1 2> /dev/null | less
for each sector listed above, I see that the data in every buffer
matches what is already on disk.
=> All the data has already been written to the disk.
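The manual dd-and-less comparison could also be scripted. Below is a minimal sketch that assumes the 512-byte buffer of each request has already been copied out of the crash session into memory. The function name sector_matches is hypothetical, and it checks only one sector, whereas a request may span several.

```c
#define _XOPEN_SOURCE 700     /* for pread() */
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t on 32-bit systems */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/* Compare one 512-byte sector of `path` (a disk image or block device)
 * with `buf`, like the dd check above. Returns 1 if equal, 0 if they
 * differ, -1 if the sector could not be opened or read in full. */
static int sector_matches(const char *path, unsigned long long sector,
                          const unsigned char *buf)
{
    unsigned char disk[SECTOR_SIZE];
    ssize_t got;
    int fd;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    got = pread(fd, disk, SECTOR_SIZE, (off_t)(sector * SECTOR_SIZE));
    close(fd);
    if (got != SECTOR_SIZE)
        return -1;
    return memcmp(disk, buf, SECTOR_SIZE) == 0;
}
```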
So, where's the interrupt to notify the frontend about this?
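Whether the backend raises that interrupt at all depends on a test against sring->rsp_event. As I understand the RING_PUSH_RESPONSES_AND_CHECK_NOTIFY macro in Xen's public io/ring.h, the backend moves rsp_prod from an old value to a new one. It then sends an event only if the frontend's rsp_event lies in the half-open range (old, new]. Below is a minimal model of just that comparison. The function name and the 32-slot constant are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;

/* Ring size implied by the "i & 31" indexing above (an assumption). */
#define RING_SLOTS 32u

/* Model of the notify check made after the backend moves sring->rsp_prod
 * from old_prod to new_prod: notify only if rsp_event lies in
 * (old_prod, new_prod]. Unsigned arithmetic handles index wraparound.
 * Returns 1 if the frontend should be notified, 0 otherwise. */
static int backend_should_notify(ring_idx_t old_prod, ring_idx_t new_prod,
                                 ring_idx_t rsp_event)
{
    assert((ring_idx_t)(new_prod - old_prod) <= RING_SLOTS);
    return (ring_idx_t)(new_prod - rsp_event) <
           (ring_idx_t)(new_prod - old_prod);
}
```

If rsp_event did not fall inside the newly pushed range at the moment of the check, no event is sent. Reading sring->rsp_event in the same crash session could therefore help narrow down where the interrupt went.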
The following kernel module calls the blkfront interrupt handler
(blkif_interrupt) directly. It lets me recover from the stuck state.
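#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/irqreturn.h>

/* Address of blkif_interrupt inside the loaded blkfront module. It is
 * specific to this guest and this boot; look it up again (e.g. in
 * /proc/kallsyms) before reusing the module anywhere else. */
static irqreturn_t (*blkif_interrupt_copy)(int irq, void *dev_id) = (void *)0xffffffffa00007ba;

static int __init init_crash(void)
{
	printk(KERN_EMERG "crash: calling blkif_interrupt\n");
	/* IRQ number 22 and the struct blkfront_info * of the stuck disk
	 * (0xffff8800029e8000) were both taken from this system. */
	blkif_interrupt_copy(22, (void *)0xffff8800029e8000);
	printk(KERN_EMERG "crash: blkif_interrupt returned\n");
	return 0;
}

static void __exit cleanup_crash(void)
{
}

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("crash the system");
MODULE_AUTHOR("Timo Juhani Lindfors <timo.lindfors@iki.fi>");
module_init(init_crash);
module_exit(cleanup_crash);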
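Calling the handler by hand presumably works because it consumes the pending range itself. It completes the request behind each id, advances rsp_cons to rsp_prod and re-arms rsp_event, so further events can arrive. Below is a simplified single-threaded model of that consume-then-recheck pattern, in the spirit of Xen's RING_FINAL_CHECK_FOR_RESPONSES. The structs and model_interrupt are hypothetical, and the model is my assumption rather than the actual blkfront code. Memory barriers are left out because there is no concurrent backend here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;

#define RING_SLOTS 32u /* ring size implied by "i & 31" above (assumption) */

/* Hypothetical, simplified shared ring page: only the fields used here. */
struct model_sring {
    ring_idx_t rsp_prod;      /* advanced by the backend */
    ring_idx_t rsp_event;     /* set by the frontend: notify me at this index */
    uint64_t ids[RING_SLOTS]; /* response ids (shadow[] indexes) */
};

/* Hypothetical frontend-private state. */
struct model_front {
    ring_idx_t rsp_cons;      /* next response the frontend will consume */
    struct model_sring *sring;
};

/* Consume all pending responses, passing each id to complete() (may be
 * NULL), then re-arm rsp_event and re-check for late arrivals.
 * Returns the number of responses consumed. */
static size_t model_interrupt(struct model_front *f,
                              void (*complete)(uint64_t id))
{
    size_t done = 0;

    for (;;) {
        ring_idx_t prod = f->sring->rsp_prod;

        assert((ring_idx_t)(prod - f->rsp_cons) <= RING_SLOTS);
        for (ring_idx_t i = f->rsp_cons; i != prod; i++) {
            if (complete != NULL)
                complete(f->sring->ids[i & (RING_SLOTS - 1)]);
            done++;
        }
        f->rsp_cons = prod;

        /* Ask for an event at the next response, then look once more in
         * case a response was published before that took effect. */
        f->sring->rsp_event = f->rsp_cons + 1;
        if (f->sring->rsp_prod == f->rsp_cons)
            return done;
    }
}
```

With the values observed here, one call consumes 16 responses, moves rsp_cons from 96031 to 96047 and sets rsp_event to 96048.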