Bug#580889: linux-image-2.6.32-3-amd64: tasks get stuck at pvclock_clocksource_read under xen

To: 580889@bugs.debian.org
Subject: Bug#580889: linux-image-2.6.32-3-amd64: tasks get stuck at pvclock_clocksource_read under xen
From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Fri, 08 Oct 2010 16:49:03 +0300
Message-id: <8439sghk1c.fsf_-_@sauna.l.org>
Reply-to: Timo Juhani Lindfors <timo.lindfors@iki.fi>, 580889@bugs.debian.org
In-reply-to: <handler.580889.B580889.128582944032542.ackinfo@bugs.debian.org> (Debian Bug Tracking System's message of "Thu\, 30 Sep 2010 06\:51\:24 +0000")
References: <8462xnrah1.fsf_-_@sauna.l.org> <handler.580889.B580889.128582944032542.ackinfo@bugs.debian.org>

Hi,

here's even more info on the issue. This time I discovered a
workaround but still don't know the real root cause of the problem.

I looked at the blkfront_info structure at 0xffff8800029e8000. In
particular

info->ring.rsp_cons = 96031
info->ring.sring->rsp_prod = 96047

=> The ringbuffer contains 16 entries that are done but have not been
handled by the frontend.
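
As a small illustration of the arithmetic above: the two ring indices are
free-running unsigned counters, so the number of unhandled responses is their
unsigned difference. The helper name below is hypothetical and not part of
blkfront; this is only a sketch of the calculation.

```c
#include <stdint.h>

/* Hypothetical helper: number of responses the backend has produced but the
 * frontend has not yet consumed. Both indices are free-running unsigned
 * 32-bit counters, so unsigned subtraction stays correct even after the
 * counters wrap around. */
static uint32_t pending_responses(uint32_t rsp_cons, uint32_t rsp_prod)
{
    return rsp_prod - rsp_cons;
}
```

With the values quoted above, pending_responses(96031, 96047) gives 16.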

If I let i range from 96031 to 96046 (rsp_cons up to, but not including,
rsp_prod) and then look at

    info->ring.sring->ring[i & 31].rsp.id

I see the shadow[] indexes of these entries:

14 2 15 3 13 9 1 11 5 16 4 10 12 0 6 7
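
The walk over the unhandled slots can be sketched in user space as follows.
The struct and function names are hypothetical stand-ins, and the ring size
of 32 is an assumption taken from the "i & 31" mask; the real ring entries
carry more fields than just the id.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 32u  /* assumed from the "i & 31" mask in the message */

/* Hypothetical, simplified stand-in for one response slot. */
struct sim_response {
    uint64_t id;  /* index into shadow[] */
};

/* Copy the ids of all unconsumed responses, oldest first, into out[].
 * The free-running index i is reduced to a slot number by masking.
 * Returns the number of ids written (at most max_out). */
static size_t collect_pending_ids(const struct sim_response *ring,
                                  uint32_t rsp_cons, uint32_t rsp_prod,
                                  uint64_t *out, size_t max_out)
{
    size_t n = 0;

    for (uint32_t i = rsp_cons; i != rsp_prod && n < max_out; i++)
        out[n++] = ring[i & (RING_SIZE - 1)].id;
    return n;
}
```

Since 96031 & 31 is 31, the first unhandled response sits in the last slot
and the remaining fifteen occupy slots 0 to 14.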

If I then let j range over these indexes and look at

((struct request*)(((struct blkfront_info*)0xffff8800029e8000)->shadow[j].request))->__sector
((struct request*)(((struct blkfront_info*)0xffff8800029e8000)->shadow[j].request))->buffer

I get sector numbers and data buffer contents. If I then use

sudo dd if=/dev/xendisk/orbit_root bs=512 skip=$sector count=1 2> /dev/null |less

for each sector listed above I see that all the data in the buffers
matches what is already on disk.

=> All the data has already been written to the disk.
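
The comparison above was done by eye with dd and less. A minimal user-space
sketch of the same check is shown here, assuming the buffer contents have
already been dumped from the debugger into memory; the function name and the
fixed 512-byte sector size (matching bs=512 above) are assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512  /* matches bs=512 in the dd command */

/* Hypothetical helper: returns 1 if sector number `sector` of the already
 * opened file or block device `fd` equals the SECTOR_SIZE bytes at `buf`,
 * 0 if the contents differ, and -1 on a read error or short read. */
static int sector_matches(int fd, uint64_t sector, const void *buf)
{
    unsigned char disk[SECTOR_SIZE];
    ssize_t got = pread(fd, disk, sizeof disk, (off_t)(sector * SECTOR_SIZE));

    if (got != (ssize_t)sizeof disk)
        return -1;
    return memcmp(disk, buf, sizeof disk) == 0;
}
```

Calling this once per sector number found through shadow[j] would replace
the manual inspection of each dd output.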

So, where's the interrupt to notify the frontend about this?

The following kernel module calls the interrupt handler directly. It
lets me recover from the stuck state.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/irqreturn.h>

/*
 * Hard-coded address of blkif_interrupt() on this particular running
 * system. It is only valid for this boot and must be looked up again
 * on any other system.
 */
static irqreturn_t (*blkif_interrupt_copy)(int irq, void *dev_id) =
    (void *)0xffffffffa00007ba;

static int __init init_crash(void)
{
    printk(KERN_EMERG "crash: calling blkif_interrupt\n");
    /* dev_id is the struct blkfront_info* examined above. */
    blkif_interrupt_copy(22, (void *)0xffff8800029e8000);
    printk(KERN_EMERG "crash: blkif_interrupt returned\n");

    return 0;
}

/* Nothing to undo; present only so that the module can be unloaded. */
static void __exit cleanup_crash(void)
{
}

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("crash the system");
MODULE_AUTHOR("Timo Juhani Lindfors <timo.lindfors@iki.fi>");
module_init(init_crash);
module_exit(cleanup_crash);
