Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition



On Fri, Oct 22, 2010 at 10:32:24AM +0400, Proskurin Kirill wrote:
> Hello!
>
> Sorry for such a big delay - I was ill and then on vacation.
> If you still have an interest in this problem, I have new results.
>
> On 01/10/10 06:49, Ben Hutchings wrote:
>> On Thu, 2010-09-30 at 19:10 +0400, Proskurin Kirill wrote:
>> [...]
>>> Summary:
>>>
>>> Kernel: 2.6.36-rc5 SMP x86_64 (from experimental)
>>> DRBD-utils-8.3.8(from experimental)
>>> OCFS2-1.4.4-3(from testing)
>>
>> ocfs2 is already included in the kernel package and you should use that.
> OCFS2-1.4.4-3 (from testing) is a userspace utility package, like
> mkfs.ocfs2. Of course I use the driver from the kernel.

OK, good.

>>> While updating (aptitude safe-upgrade) the first node I got a kernel panic.
>>> Screenshot in attachment.
>> [...]
>>
>> This panic shows "Tainted: G D" which means there was a previous "oops"
>> message.  You need to record the first one.
> Well, I did not get it twice.
>
> I can confirm that on the configuration above (all testing + kernel
> 2.6.36-rc5) I did not get a reboot. iozone completed successfully without
> any problems, so yes, it is a kernel-related problem. I retested on the
> latest 2.6.32 from testing and got a reboot.
>
> So... what should I do now?

I'm sorry but I don't have any idea where the problem is.  So far as I
can see, there are no bug fixes to drbd or ocfs2 in 2.6.36-rc5 that are
not also in 2.6.35.6.  Maybe the bug is elsewhere and just triggered by
this combination of storage driver and filesystem.  Or, given that you
said that even 2.6.36-rc5 did crash once, it could be that the hardware
is unreliable.

So there are two things you could try, but I am not very hopeful:
1. Run a RAM test such as memtest86+.
2. Use 'git bisect' to find the change that makes the difference.
   Normally you would use this to find when a bug was introduced, but
   you can also use it to find when a bug was fixed if you reverse the
   'good' and 'bad' labels.
   See <http://book.git-scm.com/5_finding_issues_-_git_bisect.html>.
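
   As a rough illustration of the reversed labels, here is a toy sketch
   that runs in a throwaway repository rather than the kernel tree.  The
   file names, the eight commits and the 'grep' check are all invented
   for the demonstration; in the real case the check would be "build,
   boot and run iozone", and the two endpoints would be whichever
   versions you have actually seen crash and survive.

```shell
#!/bin/sh
# Toy demonstration of a reversed bisect (hypothetical repo, not the kernel).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Eight commits; the pretend fix lands in commit 6.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 6 ]; then echo fixed > state; else echo broken > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
done

first=$(git rev-list --max-parents=0 HEAD)

# Reversed labels: "bad" = newer commit where the bug is gone,
#                  "good" = older commit where the bug is still present.
git bisect start > /dev/null
git bisect bad HEAD > /dev/null
git bisect good "$first" > /dev/null

# The check exits 0 ("good") while the bug is present and non-zero ("bad")
# once it is fixed, matching the reversed meaning of the labels.
git bisect run sh -c 'grep -q broken state' > /dev/null

# refs/bisect/bad now names the first "bad" commit, i.e. the fix.
subject=$(git log -1 --format=%s refs/bisect/bad)
echo "$subject"

git bisect reset > /dev/null 2>&1
```

   On the kernel tree you would answer 'git bisect good' by hand after
   each kernel that still reboots under load and 'git bisect bad' after
   each one that survives, instead of using 'git bisect run'.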

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus
