
Bug#789770: linux-image-3.16.0-4-amd64: Dell R310 server (Xen 4.4 Dom0) periodically crashing after upgrade to Jessie from Wheezy



Hi,

I am seeing the same behaviour on 8 servers, but with different hardware:
* NEC Express5800/R120a-2 [N8100-1501F]
* 8 or 16 GB of registered ECC RAM
* Processor: Intel Xeon E5503 or E5504, single or dual socket.
* Network controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
* Storage: 3ware Inc 9650SE SATA-II RAID PCIe, or software RAID (md)
* FS: ext3
* Disks: mostly WD Caviar Blue, Caviar Black or VelociRaptor series.
* Virtual machine disks are stored on LVM logical volumes.

Upgrade paths:
* lenny -> squeeze -> wheezy -> jessie
    OR
* squeeze -> wheezy -> jessie

Upgrades were done in a single step. The old Xen packages were not cleaned up before starting the virtual machines.

Cleaning up the old Xen packages does not seem to help.

Package firmware-linux-free is installed on all servers.

To see if it helps, I reinstalled one server from scratch with Wheezy and another with Jessie, keeping the VM configurations and disks (LVM).

We also have Dell servers (R420 and R620) upgraded from Wheezy to Jessie, but they do not seem to be affected.
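The kernel messages in this report were collected with netconsole. netconsole sends console output as plain UDP datagrams to a remote host, so messages keep arriving even after the Dom0's local disk is offlined. For anyone trying to capture the same trace, here is a minimal receiver sketch. It assumes the target port is 6666 (the netconsole default) and that nothing else on the receiving host, such as a syslog daemon, is already bound to that port. The script and its names are illustrative and are not part of any Debian package.

```python
import socket
import sys
from typing import BinaryIO, Optional

# Assumption: the Dom0 loads netconsole with this host as target and UDP
# port 6666 (the netconsole default); change PORT to match your setup.
PORT = 6666


def receive(sock: socket.socket, out: BinaryIO, max_packets: Optional[int] = None) -> int:
    """Append every UDP payload received on `sock` to `out`.

    netconsole payloads are plain kernel log text, so they are written
    unmodified. Returns the number of datagrams written.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, _sender = sock.recvfrom(65535)
        out.write(data)
        out.flush()  # keep the file current in case the Dom0 dies mid-burst
        count += 1
    return count


def main(log_path: str) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "ab") as out:
        sock.bind(("0.0.0.0", PORT))
        receive(sock, out)  # runs until interrupted (Ctrl-C)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

To use it, run something like `python3 netconsole_rx.py /var/log/dom0-netconsole.log` on the receiving host. The file name is hypothetical. Messages are flushed to disk one datagram at a time, so the last lines before a crash are not lost in a buffer.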


The error captured using netconsole (the SCSI errors do not seem relevant):
[60083.367483] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[60083.368945] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[60083.409478] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[62299.128350] ------------[ cut here ]------------
[62299.128760] WARNING: CPU: 0 PID: 0 at /build/linux-QZaPpC/linux-3.16.7-ckt11/net/sched/sch_generic.c:264 dev_watchdog+0x236/0x240()
[62299.128810] NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
[62299.128839] Modules linked in: netconsole xt_tcpudp xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev binfmt_misc xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge 8021q garp stp mrp llc bonding psmouse ttm drm_kms_helper drm coretemp evdev pcspkr serio_raw i2c_i801 ipmi_si lpc_ich mfd_core ipmi_msghandler tpm_tis tpm button ioatdma processor i7core_edac edac_core thermal_sys shpchp configfs loop autofs4 ext4 crc16 mbcache jbd2 dm_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_common hid_generic usbhid hid crc32c_intel ahci uhci_hcd ehci_pci libahci ehci_hcd libata igb usbcore i2c_algo_bit 3w_9xxx usb_common i2c_core dca ptp scsi_mod pps_core [last unloaded: netconsole]
[62299.129637] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[62299.129703] Hardware name: NEC NEC Express5800/R120a-2 [N8100-1501F]/MS-9197-01S, BIOS 1.0.1C10 03/24/2009
[62299.129799] 0000000000000009 ffffffff8150b405 ffff88001f203e28 ffffffff81067797
[62299.129906] 0000000000000002 ffff88001f203e78 0000000000000008 0000000000000000
[62299.130014] ffff88000226e000 ffffffff810677fc ffffffff81777fb8 0000000000000030
[62299.130121] Call Trace:
[62299.130166]  <IRQ>  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[62299.130244]  [<ffffffff81067797>] ? warn_slowpath_common+0x77/0x90
[62299.130302]  [<ffffffff810677fc>] ? warn_slowpath_fmt+0x4c/0x50
[62299.130364]  [<ffffffff8100a0c1>] ? xen_timer_interrupt+0x111/0x150
[62299.130425]  [<ffffffff8143eb96>] ? dev_watchdog+0x236/0x240
[62299.130482]  [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[62299.130541]  [<ffffffff81072ae1>] ? call_timer_fn+0x31/0x100
[62299.130597]  [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[62299.130654]  [<ffffffff81074119>] ? run_timer_softirq+0x209/0x2f0
[62299.130712]  [<ffffffff8106c641>] ? __do_softirq+0xf1/0x290
[62299.130768]  [<ffffffff8106ca15>] ? irq_exit+0x95/0xa0
[62299.130901]  [<ffffffff81358495>] ? xen_evtchn_do_upcall+0x35/0x50
[62299.130964]  [<ffffffff8151325e>] ? xen_do_hypervisor_callback+0x1e/0x30
[62299.131022] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[62299.131094]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[62299.131153]  [<ffffffff81009e0c>] ? xen_safe_halt+0xc/0x20
[62299.131211]  [<ffffffff8101c999>] ? default_idle+0x19/0xb0
[62299.131270]  [<ffffffff810a7ff0>] ? cpu_startup_entry+0x340/0x400
[62299.131329]  [<ffffffff81903071>] ? start_kernel+0x492/0x49d
[62299.131385]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[62299.131442]  [<ffffffff81904f64>] ? xen_start_kernel+0x569/0x573
[62299.131498] ---[ end trace 93cc57d7dca442f8 ]---
[62299.133025] igb 0000:01:00.0 eth0: Reset adapter
[62299.176937] bonding: bond0: making interface eth1 the new active one
[62299.181054] device eth0 left promiscuous mode
[62299.181320] device eth1 entered promiscuous mode
[62302.120815] igb 0000:01:00.0 eth0: Reset adapter
[62305.261445] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62305.280904] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[62310.121470] igb 0000:01:00.1 eth1: Reset adapter
[62310.185794] bonding: bond0: making interface eth0 the new active one
[62310.189887] device eth1 left promiscuous mode
[62310.190138] device eth0 entered promiscuous mode
[62313.121750] igb 0000:01:00.1 eth1: Reset adapter
[62316.342434] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62316.389986] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[62340.124133] igb 0000:01:00.0 eth0: Reset adapter
[62340.124295] igb 0000:01:00.0: Detected Tx Unit Hang
[62340.124295]   Tx Queue             <2>
[62340.124295]   TDH                  <eb>
[62340.124295]   TDT                  <eb>
[62340.124295]   next_to_use          <eb>
[62340.124295]   next_to_clean        <e2>
[62340.124295] buffer_info[next_to_clean]
[62340.124295]   time_stamp           <100ec9490>
[62340.124295]   next_to_watch        <ffff8800108c7e20>
[62340.124295]   jiffies              <100ec9e0c>
[62340.124295]   desc.status          <d8001>
[62340.201532] bonding: bond0: making interface eth1 the new active one
[62340.205626] device eth0 left promiscuous mode
[62340.205875] device eth1 entered promiscuous mode
[62343.124467] igb 0000:01:00.0 eth0: Reset adapter
[62346.245186] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62346.304754] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[62356.125561] igb 0000:01:00.1 eth1: Reset adapter
[62356.210009] bonding: bond0: making interface eth0 the new active one
[62356.214118] device eth1 left promiscuous mode
[62356.214361] device eth0 entered promiscuous mode
[62359.125973] igb 0000:01:00.1 eth1: Reset adapter
[62360.117969] sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
[62362.414672] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[62364.122257] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62381.087874] igb 0000:01:00.0 eth0: Reset adapter
[62381.220302] bonding: bond0: making interface eth1 the new active one
[62381.224405] device eth0 left promiscuous mode
[62381.224645] device eth1 entered promiscuous mode
[62384.080265] igb 0000:01:00.0 eth0: Reset adapter
[62387.324875] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[62389.076598] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62397.861349] sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
[62414.130842] igb 0000:01:00.1 eth1: Reset adapter
[62414.131017] igb 0000:01:00.1: Detected Tx Unit Hang
[62414.131017]   Tx Queue             <2>
[62414.131017]   TDH                  <ec>
[62414.131017]   TDT                  <ec>
[62414.131017]   next_to_use          <ec>
[62414.131017]   next_to_clean        <d8>
[62414.131017] buffer_info[next_to_clean]
[62414.131017]   time_stamp           <100ecdd8c>
[62414.131017]   next_to_watch        <ffff8800108d9da0>
[62414.131017]   jiffies              <100ece650>
[62414.131017]   desc.status          <1>
[62414.235238] bonding: bond0: making interface eth0 the new active one
[62414.239283] device eth1 left promiscuous mode
[62414.239526] device eth0 entered promiscuous mode
[62417.131239] igb 0000:01:00.1 eth1: Reset adapter
[62420.439830] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex
[62422.127613] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[62425.491896] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.491980] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.492039] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.492096] sd 0:0:0:0: Device offlined - not ready after error recovery
[62425.495879] sd 0:0:0:0: rejecting I/O to offline device
[62425.495946] sd 0:0:0:0: [sda] killing request
[62425.496004] sd 0:0:0:0: rejecting I/O to offline device
[62425.496057] sd 0:0:0:0: [sda] killing request
[62425.496107] sd 0:0:0:0: rejecting I/O to offline device
[62425.496159] sd 0:0:0:0: [sda] killing request
[62425.496209] sd 0:0:0:0: rejecting I/O to offline device
[62425.496261] sd 0:0:0:0: [sda] killing request
[62425.496315] sd 0:0:0:0: rejecting I/O to offline device
[62425.496429] sd 0:0:0:0: [sda] killing request
[62425.496482] sd 0:0:0:0: rejecting I/O to offline device
[62425.496486] sd 0:0:0:0: [sda] Unhandled error code
[62425.496488] sd 0:0:0:0: [sda]
[62425.496490] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[62425.496491] sd 0:0:0:0: [sda] CDB:
[62425.496495] Write(10): 2a 00 2b ad d0 60 00 00 08 00
[62425.496497] end_request: I/O error, dev sda, sector 732811360
[62425.496831] sd 0:0:0:0: [sda] killing request

[...]
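A few numeric fields in the log above can be decoded to show how long things were stuck. The sketch below does three things:

* It computes how long the oldest Tx descriptor had been pending when igb reported "Detected Tx Unit Hang" (jiffies minus time_stamp).
* It counts how many descriptors were outstanding (next_to_use minus next_to_clean).
* It decodes the failed Write(10) CDB.

The sketch rests on two assumptions. First, CONFIG_HZ=250, which is believed to be the Debian amd64 default; check /boot/config-3.16.0-4-amd64 before trusting the seconds. Second, a 256-descriptor Tx ring, the igb default. The function names are illustrative only.

```python
from typing import Tuple

# Assumptions: CONFIG_HZ=250 (verify in /boot/config-*) and the default
# 256-descriptor igb Tx ring. Names below are illustrative, not a real tool.
HZ = 250
IGB_TX_RING_SIZE = 256

# SCSI opcodes that appear in this log.
SCSI_OPCODES = {
    0x00: "TEST UNIT READY",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x85: "ATA PASS-THROUGH(16)",
}


def _hex(field: str) -> int:
    """Parse a value as printed by igb, e.g. '<100ec9490>'."""
    return int(field.strip("<>"), 16)


def tx_hang_age(time_stamp: str, jiffies: str, hz: int = HZ) -> float:
    """Seconds between buffer_info time_stamp and jiffies at detection."""
    return (_hex(jiffies) - _hex(time_stamp)) / hz


def tx_backlog(next_to_use: str, next_to_clean: str,
               ring_size: int = IGB_TX_RING_SIZE) -> int:
    """Descriptors queued by the driver but not yet cleaned (ring arithmetic)."""
    return (_hex(next_to_use) - _hex(next_to_clean)) % ring_size


def decode_cdb10(cdb: str) -> Tuple[str, int, int]:
    """Decode a 10-byte READ/WRITE CDB into (opcode name, LBA, block count)."""
    raw = bytes.fromhex(cdb)
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    name = SCSI_OPCODES.get(raw[0], f"0x{raw[0]:02x}")
    lba = int.from_bytes(raw[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(raw[7:9], "big")  # bytes 7..8: transfer length
    return name, lba, blocks


if __name__ == "__main__":
    # First Tx Unit Hang (eth0, queue 2), then second (eth1, queue 2).
    print(tx_hang_age("<100ec9490>", "<100ec9e0c>"), tx_backlog("<eb>", "<e2>"))
    print(tx_hang_age("<100ecdd8c>", "<100ece650>"), tx_backlog("<ec>", "<d8>"))
    # Failed write reported just before "end_request: I/O error".
    print(decode_cdb10("2a 00 2b ad d0 60 00 00 08 00"))
```

Under these assumptions, the two hangs had been pending about 9.7 s (9 descriptors) and 9.0 s (20 descriptors). The CDB decodes to WRITE(10) of 8 blocks at LBA 732811360, which matches the sector reported in the "end_request: I/O error" line.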

Loaded modules:
Module                  Size  Used by
netconsole             13318  0
configfs               31664  2 netconsole
xt_tcpudp              12527  6
xt_physdev             12468  26
iptable_filter         12536  1
ip_tables              26011  1 iptable_filter
x_tables               27111  4 xt_physdev,ip_tables,xt_tcpudp,iptable_filter
xen_netback            43986  7
xen_blkback            34328  0
binfmt_misc            16949  1
xen_gntdev             17032  2
xen_evtchn             12783  8
xenfs                  12687  1
xen_privcmd            12868  17 xenfs
bridge                106102  0
8021q                  27844  0
garp                   13117  1 8021q
stp                    12437  2 garp,bridge
mrp                    17343  1 8021q
llc                    12745  3 stp,garp,bridge
bonding               124989  0
psmouse                99249  0
serio_raw              12849  0
ttm                    77862  0
drm_kms_helper         49210  0
drm                   249955  2 ttm,drm_kms_helper
coretemp               12820  0
pcspkr                 12595  0
evdev                  17445  6
lpc_ich                20768  0
mfd_core               12601  1 lpc_ich
i2c_i801               16965  0
ipmi_si                48709  0
ipmi_msghandler        39917  1 ipmi_si
tpm_tis                17231  0
tpm                    31511  1 tpm_tis
ioatdma                57654  0
button                 12944  0
shpchp                 31121  0
i7core_edac            22278  0
edac_core              51465  2 i7core_edac
processor              28221  0
thermal_sys            27642  1 processor
loop                   26605  0
autofs4                35529  2
hid_generic            12393  0
usbhid                 44460  0
hid                   102264  2 hid_generic,usbhid
ext4                  473802  1
crc16                  12343  1 ext4
mbcache                17171  1 ext4
jbd2                   82413  1 ext4
dm_mod                 89405  35
raid1                  34596  1
md_mod                107672  2 raid1
sg                     29973  0
sd_mod                 44356  8
crc_t10dif             12431  1 sd_mod
crct10dif_generic      12581  1
crct10dif_common       12356  2 crct10dif_generic,crc_t10dif
crc32c_intel           21809  0
ahci                   33291  5
libahci                27158  1 ahci
libata                177457  2 ahci,libahci
scsi_mod              191405  3 sg,libata,sd_mod
ehci_pci               12512  0
uhci_hcd               43499  0
ehci_hcd               69837  1 ehci_pci
usbcore               195340  4 uhci_hcd,ehci_hcd,ehci_pci,usbhid
igb                   171872  0
usb_common             12440  1 usbcore
i2c_algo_bit           12751  1 igb
i2c_core               46012  5 drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit
dca                    13168  2 igb,ioatdma
ptp                    17692  1 igb
pps_core               17225  1 ptp


-- Package-specific info:
** Version:
Linux servername 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux

** Command line:
placeholder root=UUID=b1b3521d-d4a0-4fc6-abd7-cd85edf64758 ro quiet {,splash}

--
Tristan Charbonneau
Domisys

