
Bug#580889: linux-image-2.6.32-3-amd64: tasks get stuck at pvclock_clocksource_read under xen


Here's even more info on the issue. This time I discovered a
workaround, but I still don't know the real root cause of the problem.

I looked at the blkfront_info structure at 0xffff8800029e8000. It contains:

info->ring.rsp_cons = 96031
info->ring.sring->rsp_prod = 96047

=> The ring buffer contains 16 entries that are done but have not yet
been handled by the frontend.
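The count of 16 is simply rsp_prod - rsp_cons. A minimal sketch of that arithmetic, roughly what the RING_HAS_UNCONSUMED_RESPONSES macro in the Xen ring.h header computes; the name unconsumed_responses and the ring_idx_t typedef are my own, assuming the ring indices are 32-bit unsigned counters:

```c
#include <stdint.h>

/* Assumption: ring indices are free-running 32-bit unsigned counters
 * that are only masked when used to address a slot. */
typedef uint32_t ring_idx_t;

/* Responses the backend has published but the frontend has not consumed.
 * Unsigned subtraction stays correct after the counters wrap past 2^32. */
static ring_idx_t unconsumed_responses(ring_idx_t rsp_prod, ring_idx_t rsp_cons)
{
    return rsp_prod - rsp_cons;
}
```

With the values above, unconsumed_responses(96047, 96031) gives 16.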

If I let i range from 96031 up to (but not including) 96047 and then look at

    info->ring.sring->ring[i & 31].rsp.id

I see the shadow[] indexes of these entries:

14 2 15 3 13 9 1 11 5 16 4 10 12 0 6 7
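The walk above can be sketched in C as follows. This is a simplified model, not the real driver types: struct rsp_slot and collect_pending_ids are hypothetical, and the 32-slot ring size is taken from the "& 31" mask. The free-running index is masked to find the slot, and the id stored there is the shadow[] index.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 32u /* matches the "& 31" mask used above */

/* Hypothetical stand-in for a ring response; only the id field matters here. */
struct rsp_slot {
    uint64_t id; /* index into the frontend's shadow[] table */
};

/* Collect the shadow[] index of every response from rsp_cons up to (but not
 * including) rsp_prod. Returns how many ids were written to ids_out. */
static size_t collect_pending_ids(const struct rsp_slot ring[RING_SLOTS],
                                  uint32_t rsp_cons, uint32_t rsp_prod,
                                  uint64_t *ids_out, size_t max_ids)
{
    size_t n = 0;
    /* "!=" rather than "<" so the loop also works across index wraparound. */
    for (uint32_t i = rsp_cons; i != rsp_prod && n < max_ids; i++) {
        /* Power-of-two ring: masking with RING_SLOTS - 1 selects the slot. */
        ids_out[n++] = ring[i & (RING_SLOTS - 1)].id;
    }
    return n;
}
```

For rsp_cons = 96031 the first slot read is 31 (96031 & 31), then the walk wraps to slots 0 through 14.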

If I then let j range over these indexes and look at

    ((struct request*)(((struct blkfront_info*)0xffff8800029e8000)->shadow[j].request))->__sector
    ((struct request*)(((struct blkfront_info*)0xffff8800029e8000)->shadow[j].request))->buffer

I get sector numbers and data buffer contents. If I then use

sudo dd if=/dev/xendisk/orbit_root bs=512 skip=$sector count=1 2> /dev/null |less

for each of these sectors, I see that the data in every buffer
matches what is already on disk.

=> All the data has already been written to the disk.
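The dd comparison above can also be done from a small userspace program. A minimal sketch, assuming 512-byte sectors as with bs=512 (Linux counts __sector in 512-byte units); sector_matches and sector_matches_path are hypothetical helpers, and the expected buffer would still have to be copied out of the dump by hand:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SECTOR_SIZE 512 /* same unit as dd bs=512 above */

/* Compare the SECTOR_SIZE bytes at sector * SECTOR_SIZE in fd with buf.
 * Returns 1 if identical, 0 if different, -1 on a read error or short read.
 * Note: on 32-bit builds, large offsets need _FILE_OFFSET_BITS=64. */
static int sector_matches(int fd, unsigned long long sector, const unsigned char *buf)
{
    unsigned char disk[SECTOR_SIZE];
    off_t offset = (off_t)(sector * SECTOR_SIZE);
    ssize_t n = pread(fd, disk, sizeof disk, offset);

    if (n != (ssize_t)sizeof disk)
        return -1;
    return memcmp(disk, buf, sizeof disk) == 0;
}

/* Open a device or image read-only and check a single sector. */
static int sector_matches_path(const char *path, unsigned long long sector,
                               const unsigned char *buf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    int result = sector_matches(fd, sector, buf);
    close(fd);
    return result;
}
```

Using pread avoids a separate seek, and the -1 return keeps "could not read" distinct from "data differs".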

So, where's the interrupt to notify the frontend about this?
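For reference, the backend does not raise an event for every response. It compares the frontend's rsp_event with the range of indices it just published. The sketch below is modeled on the check in the Xen ring.h RING_PUSH_RESPONSES_AND_CHECK_NOTIFY macro as I understand it; backend_should_notify is a hypothetical name, and nothing above shows that this check is where the notification got lost.

```c
#include <stdint.h>

typedef uint32_t ring_idx_t; /* assumption: 32-bit free-running indices */

/* After moving the shared producer index from old_prod to new_prod, the
 * backend notifies only if the frontend's rsp_event lies in
 * (old_prod, new_prod], i.e. the producer passed the point the frontend
 * asked to be woken at. Unsigned arithmetic handles wraparound. */
static int backend_should_notify(ring_idx_t old_prod, ring_idx_t new_prod,
                                 ring_idx_t rsp_event)
{
    return (ring_idx_t)(new_prod - rsp_event) < (ring_idx_t)(new_prod - old_prod);
}
```

For example, with a hypothetical rsp_event of 96032 and the producer moving from 96031 to 96047, a notification is due; with rsp_event at 96031 or 96048 it is not.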

The following kernel module calls the interrupt handler directly. It
lets me recover from the stuck state.

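#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/irqreturn.h>

/* Address of blkif_interrupt() on this particular system; it has to be
 * looked up again for every kernel build and module load. */
static irqreturn_t (*blkif_interrupt_copy)(int irq, void *dev_id) = (void*)0xffffffffa00007ba;

static int __init init_crash(void) {
    printk(KERN_EMERG "crash: calling blkif_interrupt\n");
    /* 22 is passed as the irq argument; dev_id is the stuck blkfront_info */
    blkif_interrupt_copy(22, (void*)0xffff8800029e8000 /* struct blkfront_info* */);
    printk(KERN_EMERG "crash: blkif_interrupt returned\n");

    return 0;
}

static void __exit cleanup_crash(void) {
}

module_init(init_crash);
module_exit(cleanup_crash);

MODULE_DESCRIPTION("crash the system");
MODULE_AUTHOR("Timo Juhani Lindfors <timo.lindfors@iki.fi>");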
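The address 0xffffffffa00007ba hardcoded in the module only applies to one kernel and one module load. The message does not say how it was obtained; one common source, assumed here, is /proc/kallsyms. A minimal userspace sketch for finding it (kallsyms_line_matches and lookup_kallsyms are hypothetical helpers):

```c
#include <stdio.h>
#include <string.h>

/* Parse one line in /proc/kallsyms format ("<addr> <type> <name> [module]")
 * and store the address if the symbol name equals want. Returns 1 on match. */
static int kallsyms_line_matches(const char *line, const char *want,
                                 unsigned long long *addr_out)
{
    unsigned long long addr;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &addr, &type, name) != 3)
        return 0;
    if (strcmp(name, want) != 0)
        return 0;
    *addr_out = addr;
    return 1;
}

/* Scan /proc/kallsyms for a symbol. Run as root: on newer kernels with
 * kptr_restrict set, unprivileged readers see all-zero addresses. */
static int lookup_kallsyms(const char *want, unsigned long long *addr_out)
{
    char line[512];
    FILE *f = fopen("/proc/kallsyms", "r");
    if (f == NULL)
        return 0;

    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = kallsyms_line_matches(line, want, addr_out);
    fclose(f);
    return found;
}
```

Calling lookup_kallsyms("blkif_interrupt", &addr) before building the module would avoid reusing a stale address.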
