
Bug#667434: lvcreate / lvremove snapshot under Xen causes Kernel OOPs



Hi Ian,

On 05/04/12 01:00, Ian Campbell wrote:
Hi Quintin,

Thanks for your report.

On Wed, 2012-04-04 at 13:54 +1200, Quintin Russ wrote:
Package: linux-image-2.6.32-5-xen-amd64
Version: 2.6.32-39
Severity: important

We have observed an issue where a Xen dom0 is removing a snapshot of a
logical volume and another process comes along to create a snapshot
of that same device (under a different name), causing the server to
kernel oops. According to my logs, removal of the snapshot can sometimes
pause or take a while, which contributes to the issue. Attempts to add
locking code (using dotlockfile) have so far not been successful in
mitigating this bug, but we are still exploring this option.
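
For illustration only, the kind of serialization described above could be sketched as follows. This is a minimal sketch, not the reporter's actual locking code: it swaps dotlockfile for an advisory flock() lock, the helper names (exclusive_lock, run_serialized) are hypothetical, and it only helps if every process that creates or removes snapshots goes through the same lock file. It does not address the underlying kernel bug.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory flock() on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until any other holder of the lock releases it.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def run_serialized(cmd, lock_path, runner=subprocess.run):
    """Run cmd while holding the lock, so only one LVM operation runs at a time."""
    with exclusive_lock(lock_path):
        # check=True raises CalledProcessError if the command fails.
        return runner(cmd, check=True)
```

A caller might wrap both operations with the same lock file, for example run_serialized(["lvremove", "-f", "vg0/snap-old"], "/var/lock/lvm-snapshot.lock") and run_serialized(["lvcreate", "-s", "-L", "1G", "-n", "snap-new", "vg0/data"], "/var/lock/lvm-snapshot.lock"). The volume group, volume names, size, and lock path here are placeholders.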

The nodes are affected intermittently, and we have been unable to
reproduce this issue in the lab (on either the same model of hardware
or hardware that has crashed in production). From our logs we can see
that every time this issue occurs, one process has been removing a
snapshot while another has started creating a snapshot shortly after
(normally within seconds). We are currently seeing about a 5% chance of a
crash per month (assuming our nodes are equal).

This bug looks similar to a number of bugs that have already been
filed for this issue:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=614400
A quick Google search shows many more (which have mostly been merged):
https://www.google.co.nz/webhp?q=site%3Abugs.debian.org%20xen%20snapshot%20kernel%20oops%20squeeze

Those issues were believed to be fixed in 2.6.32-34 and you are running
2.6.32-39 so either this is a different issue (perhaps with similar
symptoms) or the issue isn't really fixed. Either way I think we need to
see your kernel logs containing the actual oops in order to make any
progress.

Yes, we have been having this problem since before 2.6.32-34 and were very hopeful that change would fix it. This sadly was not the case. Unfortunately there isn't anything in the logs for this, but I have a screenshot from the console, which I have attached.

I also had an idle shell at the time the server crashed and this is what I saw:

Message from syslogd@dom0 at Apr  4 01:37:22 ...
 kernel:[4805213.000629] Oops: 0000 [#1] SMP

Message from syslogd@dom0 at Apr  4 01:37:22 ...
kernel:[4805213.000661] last sysfs file: /sys/devices/virtual/block/dm-49/removable

Message from syslogd@dom0 at Apr  4 01:37:22 ...
 kernel:[4805213.001891] Stack:

Message from syslogd@dom0 at Apr  4 01:37:22 ...
 kernel:[4805213.002101] Call Trace:

Message from syslogd@dom0 at Apr  4 01:37:22 ...
kernel:[4805213.002540] Code: 66 ff 05 c9 83 58 00 48 89 ef e8 db 7a f7 ff 48 89 df e8 7f fe ff ff e8 51 b0 21 00 48 c7 c7 e0 99 67 81 e8 3b c0 21 00 48 8b 1b <48> 8b 03 48 81 fb 90 d1 48 81 0f 18 08 0f 85 64 ff ff ff 66 ff

Message from syslogd@dom0 at Apr  4 01:37:22 ...
 kernel:[4805213.002901] CR2: 0000000000000000
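
As a reading aid for the fragments above: assuming this oops is a page fault (which the CR2 line suggests), the hex digits after "Oops:" are the x86 page-fault error code and CR2 holds the faulting address. The sketch below decodes both using the standard x86 bit layout. The function names and the one-page threshold are illustrative choices, not part of any kernel tooling.

```python
def decode_page_fault_error(code):
    """Decode the hex error code printed after 'Oops:' for an x86 page fault."""
    bits = int(code, 16)
    return {
        # Bit 0: 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if bits & 0x1 else "page not present",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if bits & 0x2 else "read",
        # Bit 2: 0 = fault happened in kernel mode, 1 = user mode.
        "mode": "user" if bits & 0x4 else "kernel",
        # Bit 3: a reserved bit was set in a page-table entry.
        "reserved_bit_set": bool(bits & 0x8),
        # Bit 4: the fault was caused by an instruction fetch.
        "instruction_fetch": bool(bits & 0x10),
    }


def looks_like_null_deref(cr2, limit=0x1000):
    """Heuristic: a faulting address within the first page suggests a NULL pointer dereference."""
    return int(cr2, 16) < limit
```

Applied to the values above, decode_page_fault_error("0000") reports a kernel-mode read of a page that is not present, and looks_like_null_deref("0000000000000000") is true, which is consistent with a NULL pointer dereference in the kernel.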

Please let me know if there is anything further I can provide.

Attachment: kerneloops.png
Description: PNG image

