Bug#517586: Found in 2.6.26-2-amd64



Just to chip in with my experience of this bug. I have two physically
identical machines (HP DL160 G5s) each running a few kvm instances. One
of them experiences this bug repeatedly (daily, usually), the other one
never. It's really annoying me.

Linux version 2.6.26-2-amd64 (Debian 2.6.26-19) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Wed Aug 19 22:33:18 UTC 2009

INFO: task kvm:4559 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kvm           D 0000000000000082     0  4559      1
 ffff8101c6c9bc58 0000000000000082 ffff81021c002210 ffffffff8031b528
 ffff81021c8ef7b0 ffff81021d1f0ad0 ffff81021c8efa38 00000003a0083c0d
 0000000000000086 ffffffff802125eb 00003042a87d1d5c ffffffff8024acb6
Call Trace:
 [<ffffffff8031b528>] kobject_get+0x12/0x17
 [<ffffffff802125eb>] read_tsc+0x9/0x20
 [<ffffffff8024acb6>] getnstimeofday+0x39/0x98
 [<ffffffff80272934>] sync_page_killable+0x0/0x31
 [<ffffffff80429087>] io_schedule+0x5c/0x9e
 [<ffffffff80271116>] sync_page+0x3c/0x41
 [<ffffffff8027293d>] sync_page_killable+0x9/0x31
 [<ffffffff804291fa>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff80271063>] __lock_page_killable+0x5e/0x64
 [<ffffffff8024624f>] wake_bit_function+0x0/0x23
 [<ffffffff80272c5f>] generic_file_aio_read+0x2fa/0x4ae
 [<ffffffff8029ae47>] do_sync_read+0xc9/0x10c
 [<ffffffff80246221>] autoremove_wake_function+0x0/0x2e
 [<ffffffff802125eb>] read_tsc+0x9/0x20
 [<ffffffff80248dc2>] ktime_get_ts+0x22/0x4b
 [<ffffffff8029b638>] vfs_read+0xaa/0x152
 [<ffffffff8029bb00>] sys_pread64+0x50/0x70
 [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

I can supply many more stack traces if you like, not all of them from
kvm. The one thing they all have in common is that read_tsc+0x9 is on
the stack, sometimes at the top and sometimes (as in this example)
appearing twice. This seems implausible to me from reading the code
(http://lxr.linux.no/#linux+v2.6.26/arch/x86/kernel/tsc_64.c#L306), so
the only explanations I can think of are that either pv_cpu_ops has
become corrupt or the code that generates stack traces has made a
mistake. Is either of these likely?
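
For anyone who wants to check the same pattern in their own logs, here is
a rough sketch of how the frames could be tabulated. It assumes the trace
format shown above (a "Call Trace:" line followed by "[<addr>] sym+0xoff/0xlen"
lines); the function names are made up for illustration and are not part
of any existing tool.

```python
import re
from collections import Counter

# Matches frames such as " [<ffffffff802125eb>] read_tsc+0x9/0x20" and
# captures "read_tsc+0x9". An optional "? " marker before the symbol is
# tolerated in case the log contains one.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+\+0x[0-9a-f]+)/0x[0-9a-f]+"
)


def split_traces(log):
    """Return one list of 'symbol+offset' frames per 'Call Trace:' block."""
    blocks = log.split("Call Trace:")[1:]  # text before the first trace is dropped
    return [FRAME_RE.findall(block) for block in blocks]


def frames_common_to_all(traces):
    """Frames that appear at least once in every trace."""
    if not traces:
        return set()
    return set.intersection(*(set(trace) for trace in traces))


def repeated_frames(trace):
    """Frames that appear more than once within a single trace."""
    counts = Counter(trace)
    return {frame: n for frame, n in counts.items() if n > 1}
```

Feeding saved kernel log text to split_traces() and then calling
frames_common_to_all() on the result lists the frames shared by every
trace; repeated_frames() on a single trace shows duplicates within it.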

I really want to get this fixed because a machine that crashes daily is
useless. So far I've tried flashing the BIOS on the machine and altering
the loading pattern (e.g. number of CPUs allocated to each VM), but I
haven't yet tried any of the kernel patches mentioned or installing a
new one from backports. Are they worth a go?

Richard.



