
Bug#638631: Crash following repeated 'page allocation failure' & 'BUG: soft lockup'



On 20/08/2011 15:13, Ben Hutchings wrote:
> On Sat, 2011-08-20 at 12:44 +0100, Richard Kettlewell wrote:
>> The emulated screen was blank and
>> did not respond to any input.  The kernel log is full of messages as
>> shown below.  They almost all name the same executable, apache_accesses
>> (from package munin-node) and have the same backtraces, but are
>> different PIDs (it is not a long-running process).
>>
>> The exception is a "BUG: soft lockup - CPU#0 stuck for 35s!" message,
>> also below.
>>
>> The guest is a news server, with around a kilobyte/second of network
>> traffic going into it at all times.
>>
>> The guest has 512MB RAM assigned.  It has a gigabyte of swap available
>> and doesn't seem to have been using much of it.
> [...]
>
> That does sound wrong.
>
> Could it be that the host is slow to service the guest's disk I/O?

In summary, it does seem to be.

Average latency on /dev/vda over the last week is reported as around 200ms. There are occasional spikes to multiple whole seconds.
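
For anyone wanting to check a figure like this by hand, here is a minimal sketch in Python. It assumes a Linux guest exposing /proc/diskstats with the usual field layout; the message does not say which tool reported the number above, so this is not necessarily the same measurement, and the function names are made up for illustration.

```python
import time


def parse_diskstats(text, device):
    """Return (completed I/Os, total ms spent on I/O) for one device.

    Assumes the usual /proc/diskstats layout: major, minor, name,
    reads completed, reads merged, sectors read, ms reading,
    writes completed, writes merged, sectors written, ms writing, ...
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 11 and fields[2] == device:
            ios = int(fields[3]) + int(fields[7])
            ms = int(fields[6]) + int(fields[10])
            return ios, ms
    raise KeyError(device)


def average_latency_ms(before, after):
    """Average ms per completed I/O between two samples (0.0 if idle)."""
    delta_ios = after[0] - before[0]
    delta_ms = after[1] - before[1]
    return delta_ms / delta_ios if delta_ios else 0.0


def sample_latency_ms(device="vda", interval=5.0):
    """Take two samples `interval` seconds apart and compare them."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read(), device)
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read(), device)
    return average_latency_ms(before, after)
```

Calling sample_latency_ms("vda") in the guest and the equivalent on the host's backing device gives two numbers that can be compared directly. A short interval will show the spikes better than a weekly average does.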

A couple of those spikes are at approximately the right time for the 'BUG: soft lockup' messages (but the resolution is not very high), and there are other spikes with no corresponding messages in the kernel logs.

That said, there are similar huge latency spikes visible on the host too. I think I need some faster storage l-(

As for throughput, a quick test with dd reveals the guest has about 10% of the write performance of the host, which seems pretty poor to me.
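
The exact dd invocation is not given above, so the following is only a rough Python sketch of the same kind of test: write a fixed amount of data sequentially, fsync it so the page cache does not flatter the result, and report bytes per second. The function name, the 64MB default and the 1MB block size are arbitrary choices for illustration.

```python
import os
import time


def write_throughput(path, total_bytes=64 * 1024 * 1024, block_size=1024 * 1024):
    """Write total_bytes of zeros to path and return bytes per second.

    The timing includes the final fsync, so the data has been handed
    to the storage layer rather than just to the page cache.
    """
    block = b"\0" * block_size
    written = 0
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid dividing by zero
    return written / elapsed
```

Running the same call against a scratch file in the guest and on the host, then dividing one result by the other, gives a ratio comparable to the 10% quoted above. The scratch file should be deleted afterwards.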

ttfn/rjk


