
Bug#598323: linux-image-2.6.35.6: Servers reboot on heavy load on DRBD+OCFS2 partition



On 30/09/10 04:49, Ben Hutchings wrote:
On Wed, 2010-09-29 at 18:17 +0400, Proskurin Kirill wrote:
On 29/09/10 01:08, Ben Hutchings wrote:
On Tue, 2010-09-28 at 09:47 +0100, Proskurin Kirill wrote:
Package: linux-image-2.6.35.6
Version: 2.6.35.6-10.00.Custom
Severity: important


Hello.

First of all, this is my first bug report to Debian, so I am sorry if I
have done something wrong; just tell me what needs fixing in it.

I have two Dell 2950 servers that I am trying to use as an email cluster.
I run OCFS2 on top of DRBD. Both nodes reboot under heavy load
every time.

I filed this bug against the package linux-image-2.6.35.6, but that is not
strictly accurate: I have the same problem on 2.6.26 (stable) and 2.6.32
(testing). I only tried the latest kernel to be sure.
I tried ocfs2-tools from stable and from testing: the nodes reboot. I tried
DRBD8 from backports, then the native DRBD on 2.6.32, and then DRBD-8.3.8
compiled from source with 2.6.35-6: the nodes reboot.
So I think it is kernel related, but I could be quite wrong. I am not
sure what causes these reboots.

Can you reproduce this in 2.6.35 or 2.6.36-rc5 (current version in
experimental) using the version of drbd that is included in it rather
than a separately built version?

OK, I am working on it. I have a problem getting the bnx2 driver to work in 2.6.36-rc5:

update-initramfs: Generating /boot/initrd.img-2.6.36-rc5
W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09ax-5.0.0.j10.fw for module bnx2
W: Possible missing firmware /lib/firmware/bnx2/bnx2-rv2p-09-5.0.0.j10.fw for module bnx2
W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-09-5.0.0.j15.fw for module bnx2
W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-5.0.0.j6.fw for module bnx2
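
The warnings above name firmware files that the bnx2 module expects under /lib/firmware. A minimal sketch for checking this by hand follows; `missing_firmware` is a hypothetical helper written for illustration, not part of initramfs-tools. It reads firmware names (relative paths, one per line) on standard input and prints the ones that do not exist under the given directory.

```shell
# Hypothetical helper: print every firmware file from stdin that is
# missing under the directory given as the first argument.
missing_firmware() {
    fw_dir=$1
    while IFS= read -r name; do
        # Skip empty lines.
        [ -n "$name" ] || continue
        # Report the full path of each file that does not exist.
        [ -e "$fw_dir/$name" ] || printf '%s\n' "$fw_dir/$name"
    done
}

# Typical use (assumes the module is installed for the running kernel):
#   modinfo -F firmware bnx2 | missing_firmware /lib/firmware
```

An empty result would suggest that every firmware file the module declares is present.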

Oops.  I've added the new firmware here:
<http://svn.debian.org/wsvn/kernel/dists/trunk/firmware-nonfree/bnx2/bnx2/>

The latest firmware-bnx2 does not help. Building from source fails with many errors.
On 2.6.35 it seems to work OK. Is testing 2.6.36 mandatory?

No, it's OK to test 2.6.35.

Ben.


Something strange here:
http://packages.debian.org/experimental/linux-source-2.6.35

The link goes to http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5.orig.tar.gz

36, not 35.

Anyway, your firmware helped, so I am going with 2.6.36-rc5.

# cd /usr/src
# wget http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5.orig.tar.gz
# wget http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5-1~experimental.1.dsc
# wget http://ftp.de.debian.org/debian/pool/main/l/linux-2.6/linux-2.6_2.6.36~rc5-1~experimental.1.diff.gz
# tar xf linux-2.6_2.6.36~rc5.orig.tar.gz
# gzip -dc linux-2.6_2.6.36~rc5-1~experimental.1.diff.gz > linux-2.6_2.6.36~rc5-1~experimental.1.diff
# cd linux-2.6-2.6.36~rc5
# patch -p1 < ../linux-2.6_2.6.36~rc5-1~experimental.1.diff
# cp /boot/config-2.6.32-5-amd64 config-2.6.32-5-amd64.config
# make-kpkg --rootcmd fakeroot  --initrd --us --uc kernel_image

*answer all questions with the default*

dpkg -i ../linux-image-2.6.36-rc5_2.6.36-rc5-10.00.Custom_amd64.deb

reboot

DRBD recommends using 8.3.8 with 2.6.35+, so I will build it from experimental.

wget, patch, build with:
dpkg-buildpackage -us -uc -sa -rfakeroot

dpkg -i drbd8-utils_8.3.8.1-1_amd64.deb

and install the maintainer's global_common.conf on both nodes, but add:

net {
                allow-two-primaries;
}

on both to make it usable with OCFS2. And:

syncer {
                rate 30M;
}

to make the sync fast; the nodes are connected via 1 Gbit/s.
(DRBD recommends setting this attribute to bandwidth/3.)
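
The bandwidth/3 rule mentioned above can be worked through with a little arithmetic. This is only a sketch: `sync_rate_mb` is a hypothetical helper, not a DRBD tool, and it treats 1 Gbit/s as a raw line rate of 125 MB/s, which real replication traffic will not reach.

```shell
# Hypothetical helper: suggest a syncer rate in MB/s for a link speed
# given in Mbit/s, using the bandwidth/3 rule of thumb.
sync_rate_mb() {
    link_mbit=$1
    # Mbit/s -> MB/s (divide by 8), then take one third of it.
    echo $(( link_mbit / 8 / 3 ))
}

sync_rate_mb 1000   # prints 41
```

The theoretical figure of 41 is an upper bound; since usable throughput is below line rate, a lower setting such as the 30M used here stays within the rule.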

So I get:
# drbd-overview
0:drbd0  Connected Primary/Primary UpToDate/UpToDate C r----
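
For scripted checks, the status line above can be tested before mounting OCFS2. The sketch below assumes the column layout shown in that line (resource, connection state, roles, disk states); `drbd_dual_primary_ok` is a hypothetical name, not part of drbd8-utils.

```shell
# Hypothetical helper: read drbd-overview output on stdin and succeed
# only if every resource is Connected, Primary on both nodes, and
# UpToDate on both nodes.
drbd_dual_primary_ok() {
    awk '
        NF {
            total++
            if ($2 == "Connected" && $3 == "Primary/Primary" && $4 == "UpToDate/UpToDate")
                ok++
        }
        END {
            # Fail when there were no resources at all or any was not ready.
            if (total > 0 && ok == total) exit 0
            exit 1
        }
    '
}

# Typical use:
#   drbd-overview | drbd_dual_primary_ok && echo "ready for OCFS2"
```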

Summary:

Kernel: 2.6.36-rc5 SMP x86_64 (from experimental)
DRBD-utils-8.3.8 (from experimental)
OCFS2-1.4.4-3 (from testing)
iozone3-308-1 (from testing)

While updating the first node (aptitude safe-upgrade) I got a kernel panic. A screenshot is attached.

Reboot.

I mounted the OCFS2 partition and... got another hang. See the attachment.

Hm, it seems it is not stable enough for testing, but I will try one more time.

NB: Most of the time, in previous tests and now, I see the panic on the first node; the second just reboots.

reboot.

Now I am able to mount OCFS2 and start the iozone test.
It has been running for a few hours and looks like it will finish fine. I will report how it ends tomorrow.


--
Best regards,
Proskurin Kirill

Attachment: 36-bug.png
Description: PNG image

Attachment: 36-bug2.png
Description: PNG image

