
Bug#637190: Random kernel panics & general protection faults




Jonathan Nieder <jrnieder@gmail.com> wrote:

>Ben Hutchings wrote:
>
>> So, new theory required.
>>
>> Given you said you're not using ECC memory, can you test it with
>> memtest86+ for a few hours?
>
>I assume you tried this?

I ran memtest86+ for several days with all RAM modules on the board. It showed only one error. I then decided to run the same test on each module individually, again for a couple of days each. No errors at all.

Then I (desperately) started to look at the BIOS settings and found some references on the web to instability issues related to the AMD ganged/unganged memory mode.

In my case, I switched from unganged to ganged and have had no more issues since then (several months of uptime now, on a system that used to freeze after less than a day of being up).



>
>I would also (selfishly) be interested in whether the kernel from sid
>behaves any differently.  The only packages from outside squeeze one
>would need in order to test are the kernel image itself,
>initramfs-tools, and linux-base.  If it is reproducible with a 3.1.y
>kernel, we can try pursuing this upstream, and if not, we can try to
>look for the patch that fixed it.

I definitely hear and understand your concern, but the box is now in production and I can no longer afford any testing window (plus it's a big storage system on which my company is heavily dependent).

I'm not sure whether the issue is solely hardware related or whether there is something kernel related with that memory management mode on AMD platforms, but I tend to think it's the former.

If you want more specific details, let me know.

>
>Sincerely,
>Jonathan

-- 
Simon


