
Bug#579212: marked as done (firmware-bnx2: Kernel panic or hardware shutdown on HP ProLiant ML370 G6)



Your message dated Mon, 26 Apr 2010 15:13:27 +0100
with message-id <20100426141327.GL16821@decadent.org.uk>
and subject line Re: Bug#579212: firmware-bnx2: Kernel panic or hardware shutdown on HP ProLiant ML370  G6
has caused the Debian Bug report #579212,
regarding firmware-bnx2: Kernel panic or hardware shutdown on HP ProLiant ML370  G6
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
579212: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579212
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: firmware-bnx2
Version: 0.23
Severity: important
Tags: squeeze


When I bring up network interfaces eth1-eth3, I see kernel messages
like this:

Message from syslogd@localhost at Apr 2 09:29:57 ... kernel:[ 227.352979] Uhhuh. NMI received for unknown reason b1 on CPU 0.
Message from syslogd@localhost at Apr 2 09:29:57 ... kernel:[ 227.353044] You have some hardware problem, likely on the PCI bus.
Message from syslogd@localhost at Apr 2 09:29:57 ... kernel:[ 227.353106] Dazed and confused, but trying to continue

and my network card becomes unavailable. Sometimes I see a kernel panic
message instead.

When I use only interface eth0, after some hours I see these kernel messages:

[121030.457885] eth0: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
[121030.457939] eth1: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
[121030.457990] eth2: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
[121062.437873] netxen_nic: card response timeout.
[121062.437902] netxen_nic: Failed to destroy rx ctx in firmware
[121094.417869] netxen_nic: card response timeout.
[121094.417898] netxen_nic: Failed to destroy tx ctx in firmware
[121094.460703] eth3: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.

But I am sure that this temperature cannot be real.
After this error the network card does not work either.

I cannot tell exactly which model of network card is used, but the
server is an HP ProLiant ML370 G6.

$ lspci -vs 06:00.0
06:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Subsystem: Hewlett-Packard Company NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 24
        Memory at fae00000 (64-bit, non-prefetchable) [size=2M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: <access denied>
        Kernel driver in use: netxen_nic
        Kernel modules: netxen_nic

$ dmesg |grep eth
[    1.796876] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_._OSC] (Node ffff880426c14f20), AE_AML_BUFFER_LIMIT
[    2.432767] netxen_nic 0000:06:00.0: eth0: GbE port initialized
[    2.434177] netxen_nic 0000:06:00.1: eth1: GbE port initialized
[    2.435628] netxen_nic 0000:06:00.2: eth2: GbE port initialized
[    2.437115] netxen_nic 0000:06:00.3: eth3: GbE port initialized
[    8.754656] netxen_nic: eth0 NIC Link is up
[    8.755739] ADDRCONF(NETDEV_UP): eth0: link is not ready
[    8.760126] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   19.319590] eth0: no IPv6 routers present


-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-3-amd64 (SMP w/16 CPU cores)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

firmware-bnx2 depends on no packages.

firmware-bnx2 recommends no packages.

Versions of packages firmware-bnx2 suggests:
ii  initramfs-tools               0.93.4     tools for generating an initramfs
ii  linux-image-2.6.26-2-amd64 [l 2.6.26-21  Linux 2.6.26 image on AMD64
ii  linux-image-2.6.32-3-amd64 [l 2.6.32-9   Linux 2.6.32 for 64-bit PCs

-- no debconf information



--- End Message ---
--- Begin Message ---
On Mon, Apr 26, 2010 at 04:14:02PM +0700, Алексей Годицкий wrote:
> Package: firmware-bnx2

How can you blame Broadcom firmware for problems with a NetXen NIC?

> Version: 0.23
> Severity: important
> Tags: squeeze
> 
> 
> When I bring up network interfaces eth1-eth3, I see kernel messages
> like this:
> 
> Message from syslogd@localhost at Apr 2 09:29:57 ... kernel:[ 227.352979] Uhhuh. NMI received for unknown reason b1 on CPU 0.
> Message from syslogd@localhost at Apr 2 09:29:57 ... kernel:[ 227.353044] You have some hardware problem, likely on the PCI bus.
> Message from syslogd@localhost at Apr 2 09:29:57 ... kernel:[ 227.353106] Dazed and confused, but trying to continue

This indicates a hardware problem.

> and my network card becomes unavailable. Sometimes I see a kernel panic
> message instead.
> 
> When I use only interface eth0, after some hours I see these kernel messages:
> 
> [121030.457885] eth0: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
> [121030.457939] eth1: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
> [121030.457990] eth2: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
> [121062.437873] netxen_nic: card response timeout.
> [121062.437902] netxen_nic: Failed to destroy rx ctx in firmware
> [121094.417869] netxen_nic: card response timeout.
> [121094.417898] netxen_nic: Failed to destroy tx ctx in firmware
> [121094.460703] eth3: Device temperature 100 degrees C exceeds maximum allowed. Hardware has been shut down.
> 
> But I am sure that this temperature cannot be real.
[...]

I think that's an entirely plausible temperature.  You need to talk to
the hardware vendor (HP and/or NetXen).

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


--- End Message ---
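
The lspci output quoted in the report already names the kernel driver bound to the card (netxen_nic), which is the basis for Ben's question about blaming Broadcom firmware for a NetXen NIC. The following is a minimal Python sketch of reading that line before choosing a package to report against. It assumes the text comes from `lspci -v` run with enough privileges to show the "Kernel driver in use" line; the helper name `driver_in_use` is hypothetical and not part of any Debian tool, and the sample text is an abbreviated copy of the output in the report.

```python
import re

# Abbreviated copy of the `lspci -vs 06:00.0` output from the report.
LSPCI_OUTPUT = """\
06:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Subsystem: Hewlett-Packard Company NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 24
        Kernel driver in use: netxen_nic
        Kernel modules: netxen_nic
"""


def driver_in_use(lspci_text):
    """Return the driver named on the 'Kernel driver in use:' line, or None.

    Hypothetical helper for illustration only.
    """
    match = re.search(
        r"^\s*Kernel driver in use:\s*(\S+)", lspci_text, re.MULTILINE
    )
    return match.group(1) if match else None


driver = driver_in_use(LSPCI_OUTPUT)
print(driver)
```

If the driver printed here is not the one the suspected firmware package serves, the report most likely belongs elsewhere, or, as in this thread, with the hardware vendor.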
