
Bug#691665: [squeeze] Under CentOS 6.3 as a host OS Debian kernel hangs as a guest OS (KVM-QEMU)



found 691665 linux-2.6/3.3.4-1~experimental.1
fixed 691665 linux-2.6/3.4.1-1~experimental.1
quit

Zoltan Frombach wrote:

> I still experience the same problem with all 3.3.x kernels ( up to
> linux-image-3.3.0-trunk-amd64 amd64 3.3.4-1~experimental.1 )
>
> The problem is first resolved in this kernel: linux-image-3.4-trunk-amd64
> amd64 3.4.1-1~experimental.1

That was fast.

| $ git log --oneline --no-merges --grep=KVM v3.3.4..v3.4.1 -- arch/x86 | wc -l
| 55

Please test the attached patch against a 3.2.y-based kernel, for example
using the following instructions:

 0. prerequisites
	apt-get install git build-essential

 1. get the kernel history, if you don't already have it
	git clone \
	  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

 2. fetch point releases
	cd linux
	git remote add stable \
	  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
	git fetch stable

 3. configure, build, test
	git checkout stable/linux-3.2.y
	cp /boot/config-$(uname -r) .config; # current configuration
	scripts/config --disable DEBUG_INFO
	make localmodconfig; # optional: minimize configuration
	make deb-pkg; # optionally with -j<num> for parallel build
	dpkg -i ../<name of package>; # as root
	reboot

    Hopefully it reproduces the bug, so

 4. try the patch
	cd linux
	git am -3sc /path/to/the/patch
	make deb-pkg; # maybe with -j4
	dpkg -i ../<name of package>; # as root
	reboot

I don't expect this patch to work, but it's a place to start.
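
Before spending time on a full build, it can be worth checking that a
patch applies at all.  The following is only a sketch in a throwaway
repository: the file name, commit messages and patch are made up for
illustration and have nothing to do with the kernel tree.  It uses
"git apply --check" as a dry run, then the same "git am -3sc" call as
in step 4.

```shell
#!/bin/sh
# Toy example: dry-run a mailbox-format patch, then apply it with git am.
# Everything here (file.c, "toy fix") is hypothetical scaffolding.
set -e
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git config user.email demo@example.invalid
git config user.name Demo

printf 'line one\nline two\n' > file.c
git add file.c
git commit -q -m "base"

# Create a commit, export it as a patch, then rewind so it can be re-applied.
printf 'line one\nline 2\n' > file.c
git commit -q -a -m "toy fix"
git format-patch -1 --stdout > "$patch"
git reset -q --hard HEAD~1

# Dry run: exits non-zero, without touching the tree, if the patch does not apply.
git apply --check "$patch"

# Summarize which files the patch would change.
git apply --stat "$patch"

# Apply for real: -3 falls back to a three-way merge, -s adds a sign-off,
# -c strips anything before a scissors line in the message.
git am -q -3sc "$patch"
applied_subject=$(git log -1 --format=%s)
echo "applied: $applied_subject"
```

If the "git apply --check" step fails on the real tree, "git am -3"
may still succeed through its three-way fallback, so a failed dry run
is a hint rather than a verdict.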

If you find yourself with nothing to do before someone looks more
carefully at the range with the fix, here's how to bisect to narrow it
further.  This might be a little confusing: here, a "good" kernel is
one exhibiting the bug and a "bad" one is a fixed one, so the "first
bad commit" is the fix.

  4. tell git about experiments so far
	git bisect start -- '*kvm*'
	git bisect bad v3.4.1; # boots ok
	git bisect good v3.3.4; # hangs

     git checks out a revision halfway between to test, so

  5. test
	make deb-pkg; # maybe with -j4
	dpkg -i ../<name of package>; # as root
	reboot

	git bisect bad; # if it boots ok
	git bisect good; # if it hangs in a similar way
	git bisect skip; # if some other bug makes it hard to test

  6. repeat step 5 until bored or until git shows the "first bad commit"

  7. at any step, if the gitk package is installed you can run "git
     bisect visualize" to watch the range with the fix narrowing,
     or "git bisect log" to summarize the tests you've already done
     and allow someone else to pick up where you left off
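
To make the hand-off mentioned in step 7 concrete, here is a minimal
sketch in a throwaway repository (the commits are made up).  It saves
the output of "git bisect log" to a file and restores the same state
with "git bisect replay", which is what someone else would do with a
log pasted into the bug report.

```shell
#!/bin/sh
# Toy example: save a bisection with "git bisect log" and resume it
# with "git bisect replay".  The repository contents are hypothetical.
set -e
repo=$(mktemp -d)
log=$(mktemp)
cd "$repo"
git init -q
git config user.email demo@example.invalid
git config user.name Demo

# Eight trivial commits to bisect over.
i=1
while [ "$i" -le 8 ]; do
    echo "$i" > counter
    git add counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done
tip=$(git rev-parse HEAD)

# Start a bisection; git checks out a commit between the two endpoints.
git bisect start > /dev/null
git bisect bad HEAD > /dev/null
git bisect good HEAD~7 > /dev/null
before=$(git rev-parse HEAD)

# Save the tests done so far, then abandon the bisection.
git bisect log > "$log"
git bisect reset > /dev/null 2>&1

# Later (or on another machine with the same history): pick up again.
git bisect replay "$log" > /dev/null
after=$(git rev-parse HEAD)
echo "resumed at $(git log -1 --format=%s)"
```

The log file is plain text made of "git bisect" commands, so it can
also be edited by hand to drop a mistaken "good" or "bad" before
replaying.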

Sometimes this process is called a "reverse bisect", since we are
trying to find the patch that introduces a fix instead of a
regression.  If curious, you can read more about the more ordinary
kind of "git bisect" at [1].
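
As an aside, newer Git releases (2.7 and later, as far as I know; not
the 1.8.0 used for the patch here) let you rename the terms so the
good/bad inversion goes away.  The sketch below is a toy repository,
not the kernel: the "status" file and the terms "broken" and "fixed"
are chosen just for illustration.  It also uses "git bisect run",
where an exit status of 0 from the test command means the old state
(bug still present) and 1 means the new state (fixed).

```shell
#!/bin/sh
# Toy "reverse bisect" with custom terms.  Assumes a Git new enough to
# support --term-old/--term-new.  Repository contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.invalid
git config user.name Demo

# Ten commits; the pretend bug ("hangs") is fixed by commit 7.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo "boots ok" > status
    else
        echo "hangs" > status
    fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Old state = broken, new state = fixed.
git bisect start --term-old broken --term-new fixed > /dev/null
git bisect fixed HEAD > /dev/null
git bisect broken HEAD~9 > /dev/null

# grep exits 0 while the bug is present (old) and 1 once it is gone (new).
git bisect run sh -c 'grep -q hangs status' > /dev/null

# The first commit in the new state is recorded under refs/bisect/<term-new>.
first_fixed=$(git log -1 --format=%s refs/bisect/fixed)
echo "first fixed commit: $first_fixed"
```

For a kernel that has to be built, installed and booted between
tests, "git bisect run" is usually not practical, so the manual
"git bisect fixed" / "git bisect broken" calls are the relevant part.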

Hope that helps,
Jonathan

[1] http://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html
see also http://kernel-handbook.alioth.debian.org/ch-bugs.html#s9.2.1

From: Gleb Natapov <gleb@redhat.com>
Date: Wed, 2 May 2012 15:04:02 +0300
Subject: KVM: Do not take reference to mm during async #PF

commit 62c49cc976af84cb0ffcb5ec07ee88da1a94e222 upstream.

It turned to be totally unneeded. The reason the code was introduced is
so that KVM can prefault swapped in page, but prefault can fail even
if mm is pinned since page table can change anyway. KVM handles this
situation correctly though and does not inject spurious page faults.

Fixes:
 "INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected" warning while
 running LTP inside a KVM guest using the recent -next kernel.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 arch/x86/kernel/kvm.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a9c2116001d6..a516c9bbc7e0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -91,7 +91,6 @@ struct kvm_task_sleep_node {
 	u32 token;
 	int cpu;
 	bool halted;
-	struct mm_struct *mm;
 };
 
 static struct kvm_task_sleep_head {
@@ -138,9 +137,7 @@ void kvm_async_pf_task_wait(u32 token)
 
 	n.token = token;
 	n.cpu = smp_processor_id();
-	n.mm = current->active_mm;
 	n.halted = idle || preempt_count() > 1;
-	atomic_inc(&n.mm->mm_count);
 	init_waitqueue_head(&n.wq);
 	hlist_add_head(&n.link, &b->list);
 	spin_unlock(&b->lock);
@@ -173,9 +170,6 @@ EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
 static void apf_task_wake_one(struct kvm_task_sleep_node *n)
 {
 	hlist_del_init(&n->link);
-	if (!n->mm)
-		return;
-	mmdrop(n->mm);
 	if (n->halted)
 		smp_send_reschedule(n->cpu);
 	else if (waitqueue_active(&n->wq))
@@ -219,7 +213,7 @@ again:
 		 * async PF was not yet handled.
 		 * Add dummy entry for the token.
 		 */
-		n = kmalloc(sizeof(*n), GFP_ATOMIC);
+		n = kzalloc(sizeof(*n), GFP_ATOMIC);
 		if (!n) {
 			/*
 			 * Allocation failed! Busy wait while other cpu
@@ -231,7 +225,6 @@ again:
 		}
 		n->token = token;
 		n->cpu = smp_processor_id();
-		n->mm = NULL;
 		init_waitqueue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
 	} else
-- 
1.8.0

