
Bug#704767: linux-image-2.6.32-5-amd64: kernel crashes when task block for more than 120 seconds



On Mon, 2013-04-08 at 11:15 -0400, Olivier Diotte wrote:
> Hi Ben,
> 
> I just arrived at the machine this morning and it seems the
> system is responsive: gdm, my gnome-session, and even
> gnome-screensaver were still running (the other two times, I had
> been kicked back to tty1 and the system was totally unresponsive).
> 
> On the other hand, the doxygen process terminated (with a
> segfault, though).
> The logs seem to indicate "page allocation failure" crashes

A page allocation failure is not a crash.

> and there are also errors that may indicate my USB dongle is at
> fault (at some point during the weekend, exim was unable to connect
> to localhost, and there are a lot of errors related to
> wifi/networking).
> 
> It also seems weird to me that /var/log/debug
>  skips from April 6th 19h02 to April 8th 09h41

I don't think that's so weird.

> The crashes also seem to say my kernel is tainted, and I am not
> sure why, as the closest thing I have to a taint would be
> firmware-realtek (for the USB dongle).

No, the wireless driver taints the kernel because it comes from the
staging area of the kernel source.  The staging drivers have not been
thoroughly reviewed and are assumed (usually correctly) to be quite
buggy.
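
To see which taint flags are actually set, the bitmask in
/proc/sys/kernel/tainted can be decoded.  The following is a minimal
Python sketch, not an official tool: the decode_taint helper and its
table are made up for illustration and list only a few of the flags.
Bit 10 (shown as 'C' in oops output) is the one set when a staging
driver is loaded.

```python
# Illustrative helper, not part of any kernel tool.
# Only a subset of the kernel's taint bits is listed here.
TAINT_BITS = {
    0: "P: proprietary module loaded",
    7: "D: kernel has oopsed (died) before",
    9: "W: a kernel warning was issued",
    10: "C: staging driver loaded",
}


def decode_taint(mask):
    """Return descriptions of the known taint bits set in mask, lowest bit first."""
    return [desc for bit, desc in sorted(TAINT_BITS.items()) if mask & (1 << bit)]


# Usage on a Linux system:
#   with open("/proc/sys/kernel/tainted") as f:
#       print(decode_taint(int(f.read())))
```

A mask with only bit 10 set would point at a staging driver alone,
with no proprietary module involved.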

> I am unsure what to try next other than another doxygen run on
> Friday, this time deactivating/removing the USB dongle and
> rebooting beforehand. Let me know if you have a better idea for
> tests or would like more information.
> 
> Attached are all logs that seemed relevant (as a .tar.bz2
> archive), edited to remove my MAC address (replaced with
> MY:MA:Ca:dr:es:s0, yeah, I forgot a d, but I am too lazy to edit
> now). I also removed all entries predating April 5th at around
> 17h00.
> I also attached /proc/meminfo from this morning, in case that is
> relevant.

OK, there's nothing weird in meminfo.

I think the basic problem is that doxygen is allocating more memory than
can be provided on this computer.  If the working set (the set of data
that's regularly accessed) for all running programs adds up to more than
the size of physical memory then the kernel will be continuously
swapping data to and from the swap partition, and the larger programs
will become unresponsive.
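
One rough way to check for this while doxygen is running is to look
at how much swap is in use.  The sketch below is only an
illustration under that assumption: parse_meminfo and swap_summary
are hypothetical helper names, and a single snapshot cannot measure
the working set itself.  High swap usage is a hint, not proof, of
thrashing; confirming continuous swapping would need repeated
sampling, for example of the pswpin and pswpout counters in
/proc/vmstat.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Summarise swap usage from parsed meminfo fields (hypothetical helper)."""
    total = fields.get("SwapTotal", 0)
    used = total - fields.get("SwapFree", 0)
    return {
        "mem_free_kb": fields.get("MemFree", 0),
        "swap_used_kb": used,
        "swap_used_fraction": used / total if total else 0.0,
    }


# Usage on a Linux system:
#   with open("/proc/meminfo") as f:
#       print(swap_summary(parse_meminfo(f.read())))
```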

The wireless networking failure is just a symptom of the shortage of
free memory.

Changing the I/O class, as you originally attempted, doesn't affect
swapping, so far as I know.

Are you running doxygen over a particularly large set of sources?  I ask
because I want to know whether this could be a bug in doxygen (use of
excessive memory).

Ben.

-- 
Ben Hutchings
The first rule of tautology club is the first rule of tautology club.
