
Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode



Hi everybody,

I'm really grateful to have stumbled upon this bug report, because it describes exactly the issue I've been experiencing for a while now.

Basically, whenever I upload files via rsync/Firefox/Chromium, my entire Linux system crashes within several seconds. I've experienced this issue on Debian 10, but it also shows up on Arch Linux. In my case the modem involved is an M.2 Huawei ME906s module (USB ID 12d1:15c1).
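The modem's USB ID (12d1:15c1) can be used to confirm which device is actually enumerated before testing. A minimal Python sketch, assuming the standard sysfs layout in which each USB device directory under `/sys/bus/usb/devices` carries `idVendor`/`idProduct` files; `find_usb_device` is a hypothetical helper, not part of any tool mentioned in this report.

```python
from pathlib import Path

# Vendor/product IDs from the report: Huawei ME906s (12d1:15c1).
HUAWEI_VENDOR = "12d1"
ME906S_PRODUCT = "15c1"


def find_usb_device(vendor, product, sysfs_root="/sys/bus/usb/devices"):
    """Return sysfs USB device names whose IDs match vendor:product.

    Assumes each device directory exposes idVendor/idProduct text files,
    as the Linux USB core does; interface entries lack them and are skipped.
    """
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        vid_file = dev / "idVendor"
        pid_file = dev / "idProduct"
        if not (vid_file.is_file() and pid_file.is_file()):
            continue  # e.g. interface entries such as "1-2:1.0"
        vid = vid_file.read_text().strip().lower()
        pid = pid_file.read_text().strip().lower()
        if vid == vendor and pid == product:
            matches.append(dev.name)
    return matches
```

On the affected laptop, `find_usb_device(HUAWEI_VENDOR, ME906S_PRODUCT)` would be expected to return the bus/port name of the ME906s, or an empty list if it is not enumerated; `lsusb -d 12d1:15c1` answers the same question from the shell.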

I've also tried debugging with kdump, got different kernel errors across multiple crashes, and logged my debugging attempts in this gist [0].

It doesn't matter if I'm uploading files from a ramfs (/tmp/) or my SATA SSD.

I'm also using modemmanager and network-manager.

I switched ISPs and thought the issue was resolved, but I've just tried uploading a file again and it still crashes my Linux 4.17.2-1-ARCH kernel (so I guess this is a general Linux issue rather than a Debian-only one).

[0]: https://gist.github.com/norpol/d5b043d6082ace9fc232527d4835f045 (also attached)
# Debugging Linux Kernel Crash

## Error description:
Almost every time I upload a larger file (65 MB in this case) via my browser (Firefox, the build provided by mozilla.org as `.tar.gz`), my system crashes.
The issue happens especially when I'm doing several things at the same time (watching a video, reading email and uploading a file). The system uses an SSD, but the bug also appears when the file is served from `/tmp`.

Component | Details
--- | ---
OS | Debian testing (Buster / 10)
Kernel | `Linux 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux`
CPU | `Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz`
Machine | Thinkpad T560
EFI | `EFI v2.40 by Lenovo, efi:  SMBIOS=0xb705e000  ACPI=0xb7ffe000  ACPI 2.0=0xb7ffe014  MPS=0xb7f48000  ESRT=0xb6aa8000`
Boot method | efistub
Storage | Samsung SSD 840 EVO (256GB), LUKS (with LVM), `rootfs=btrfs`, `homefs=ext4`, cryptswap in LVM

The issue has persisted across multiple kernel upgrades. It also showed up back when Debian testing was Stretch.
The issue mostly appears on file uploads via the LTE modem.

## Actions

- [ ] Intel uCode upgrade didn't help.
- [ ] Vendor BIOS/uEFI upgrade didn't help.
- [ ] Disabling apparmor didn't help.
- [ ] Disabling/changing the IO scheduler didn't help.
- [ ] Switching the operating system from Debian to Arch Linux didn't help.
- [ ] Disabling everything power-saving related in the BIOS didn't help.
  - See [Skylake/Kaby Lake hyper-threading crash bug, Ars Technica (2017)](https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/)
  - Limiting the maximum C-state to 1 (`intel_idle.max_cstate=1`) [might also work](https://askubuntu.com/questions/749349/how-to-set-intel-idle-max-cstate-1)
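To confirm that a C-state limit like the one in the linked answer actually took effect after rebooting, the value can be read back from sysfs. A minimal sketch, assuming the `intel_idle` driver is in use and exposes its `max_cstate` module parameter at `/sys/module/intel_idle/parameters/max_cstate`; `read_max_cstate` and `cstate_limited` are hypothetical helpers.

```python
from pathlib import Path

# Assumed location of the intel_idle max_cstate module parameter.
MAX_CSTATE_PATH = "/sys/module/intel_idle/parameters/max_cstate"


def read_max_cstate(path=MAX_CSTATE_PATH):
    """Return the effective intel_idle max_cstate, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (driver not in use) or unexpected content.
        return None


def cstate_limited(path=MAX_CSTATE_PATH, limit=1):
    """True if the parameter reads back as `limit` or lower."""
    value = read_max_cstate(path)
    return value is not None and value <= limit
```

After booting with `intel_idle.max_cstate=1`, `cstate_limited()` would be expected to return `True`; `False` suggests the parameter did not reach the kernel.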

Installed and set up `kdump-tools` (I had to add `nmi_watchdog=1` to the kernel command line, as visible in `/proc/cmdline`; otherwise kdump failed to load the kdump kernel on a crash).
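Whether the boot parameters kdump depended on here are actually present can be checked by parsing the kernel command line, i.e. the string in `/proc/cmdline`, which is also echoed in the boot log. A minimal sketch; `parse_cmdline` and `missing_kdump_params` are hypothetical helpers, and quoted values containing spaces are deliberately not handled.

```python
# Kernel command line copied from the boot log in this report; the raw
# string keeps the backslash in "initrd=\initrd.img".
CMDLINE = (r"initrd=\initrd.img root=/dev/mapper/system-root "
           r"resume=UUID=d9506118-b9e2-49db-9385-f731ef1c8615 "
           r"ro quiet splash crashkernel=384M nmi_watchdog=1")


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}.

    Bare flags such as "quiet" map to None; only the first '=' separates
    key and value, so "resume=UUID=..." keeps "UUID=..." as its value.
    Later duplicates overwrite earlier ones.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_kdump_params(params):
    """List the settings this setup needed for kdump that are absent."""
    missing = []
    if not params.get("crashkernel"):
        missing.append("crashkernel=<size>")
    if params.get("nmi_watchdog") != "1":
        missing.append("nmi_watchdog=1")
    return missing
```

On a live system, the contents of `/proc/cmdline` can be passed to `parse_cmdline`; an empty list from `missing_kdump_params` means both `crashkernel=` and `nmi_watchdog=1` were set.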

## Other

Early boot BIOS warning:
```
[  +0.000000] Kernel command line: initrd=\initrd.img root=/dev/mapper/system-root resume=UUID=d9506118-b9e2-49db-9385-f731ef1c8615 ro quiet splash crashkernel=384M nmi_watchdog=1
[  +0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[  +0.000000] Calgary: detecting Calgary via BIOS EBDA area
[  +0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
```

### kdump-tools dmesg error trace

Note: I have logs from multiple crashes; this is the only one containing a `[ cut here ]` section.

```
------------[ cut here ]------------
WARNING: CPU: 2 PID: 2206 at /build/linux-K4nuoe/linux-4.14.17/mm/vmacache.c:102 vmacache_find+0x96/0xa0
Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter cdc_mbim cdc_wdm cdc_ncm snd_hrtimer snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative wireguard(O) ip6_udp_tunnel udp_tunnel binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2 fscrypto ecb arc4 iwlmvm snd_soc_skl snd_hda_codec_hdmi snd_soc_skl_ipc intel_rapl snd_soc_sst_ipc btusb x86_pkg_temp_thermal snd_soc_sst_dsp intel_powerclamp btrtl mac80211 btbcm snd_hda_ext_core snd_hda_codec_realtek coretemp btintel snd_soc_sst_match efi_pstore snd_hda_codec_generic kvm_intel bluetooth snd_soc_core snd_compress kvm snd_hda_intel irqbypass uvcvideo videobuf2_vmalloc intel_cstate videobuf2_memops intel_uncore videobuf2_v4l2 iwlwifi intel_rapl_perf snd_hda_codec serio_raw wmi_bmof videobuf2_core snd_hda_core efivars rtsx_pci_ms drbg cfg80211 memstick ansi_cprng snd_hwdep cdc_ether option videodev snd_pcm usb_wwan thinkpad_acpi usbnet iTCO_wdt usbserial mei_me snd_timer ecdh_generic nvram mii iTCO_vendor_support media sg crc16 joydev shpchp mei snd soundcore intel_pch_thermal rfkill battery ac evdev nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_counter nft_ct nf_conntrack nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables nfnetlink sunrpc efivarfs ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash algif_skcipher af_alg dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc mmc_core aesni_intel ahci i915 libahci i2c_algo_bit aes_x86_64 e1000e rtsx_pci crypto_simd 
glue_helper xhci_pci ptp cryptd pps_core drm_kms_helper psmouse i2c_i801 mfd_core xhci_hcd libata usbcore scsi_mod drm usb_common thermal wmi video button
CPU: 2 PID: 2206 Comm: firefox Tainted: G           O    4.14.0-3-amd64 #1 Debian 4.14.17-1
Hardware name: LENOVO [REMOVED], BIOS N1KET21W (1.08 ) 04/20/2016
task: ffff92143f2de000 task.stack: ffffaf9d81dfc000
RIP: 0010:vmacache_find+0x96/0xa0
RSP: 0000:ffffaf9d81dffec0 EFLAGS: 00010207
RAX: ffff921404f23410 RBX: 00007fe232700008 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 00007fe232700008 RDI: ffff92146e447140
RBP: ffff92146e447140 R08: 00007fe250400018 R09: 00000000ffffffff
R10: 00000000ffffffff R11: 00007fe21b600000 R12: ffffaf9d81dfff58
R13: ffff92146e447140 R14: 0000000000000054 R15: ffff92143f2de000
FS:  00007fe251a64740(0000) GS:ffff921481500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe232700008 CR3: 00000001ff380002 CR4: 00000000003606e0
Call Trace:
 find_vma+0x16/0x70
 __do_page_fault+0x172/0x4e0
 ? SyS_read+0x76/0xc0
 ? page_fault+0x36/0x60
 page_fault+0x4c/0x60
RIP: 0033:0x41a0dc
RSP: 002b:00007ffc405b1730 EFLAGS: 00010206
Code: 01 00 48 8b 84 c8 80 04 00 00 48 85 c0 74 11 48 39 78 40 75 16 48 39 30 77 06 48 39 70 08 77 8e 83 c2 01 83 fa 04 75 ce 31 c0 c3 <0f> ff 31 c0 c3 f3 c3 90 90 90 0f 1f 44 00 00 41 54 55 ba ff ff 
---[ end trace 8a3827954d6da8d6 ]---
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP:           (null)
PGD 800000022ac5b067 P4D 800000022ac5b067 PUD 231192067 PMD 0 
SMP PTI
Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter cdc_mbim cdc_wdm cdc_ncm snd_hrtimer snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative wireguard(O) ip6_udp_tunnel udp_tunnel binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2 fscrypto ecb arc4 iwlmvm snd_soc_skl snd_hda_codec_hdmi snd_soc_skl_ipc intel_rapl snd_soc_sst_ipc btusb x86_pkg_temp_thermal snd_soc_sst_dsp intel_powerclamp btrtl mac80211 btbcm snd_hda_ext_core snd_hda_codec_realtek coretemp btintel snd_soc_sst_match efi_pstore snd_hda_codec_generic kvm_intel bluetooth snd_soc_core snd_compress
 kvm snd_hda_intel irqbypass uvcvideo videobuf2_vmalloc intel_cstate videobuf2_memops intel_uncore videobuf2_v4l2 iwlwifi intel_rapl_perf snd_hda_codec serio_raw wmi_bmof videobuf2_core snd_hda_core efivars rtsx_pci_ms drbg cfg80211 memstick ansi_cprng snd_hwdep cdc_ether option videodev snd_pcm usb_wwan thinkpad_acpi usbnet iTCO_wdt usbserial mei_me snd_timer ecdh_generic nvram mii iTCO_vendor_support media sg crc16 joydev shpchp mei snd soundcore intel_pch_thermal rfkill battery ac evdev nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_counter nft_ct nf_conntrack nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables nfnetlink sunrpc efivarfs ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash
 algif_skcipher af_alg dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc mmc_core aesni_intel ahci i915 libahci i2c_algo_bit aes_x86_64 e1000e rtsx_pci crypto_simd glue_helper xhci_pci ptp cryptd pps_core drm_kms_helper psmouse i2c_i801 mfd_core xhci_hcd libata usbcore scsi_mod drm usb_common thermal wmi video button
CPU: 1 PID: 3171 Comm: Chrome_~dThread Tainted: G        W  O    4.14.0-3-amd64 #1 Debian 4.14.17-1
Hardware name: LENOVO [REMOVED], BIOS N1KET21W (1.08 ) 04/20/2016
task: ffff92141d6d2040 task.stack: ffffaf9d882cc000
RIP: 0010:          (null)
RSP: 0000:ffff921481483f38 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff92148149cd00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff92148149cd80 RDI: ffffaf9d83b57d08
RBP: ffffaf9d83b57d08 R08: 00000000003d0900 R09: 00000062126e1800
R10: 0000000000000000 R11: 0000000000000001 R12: ffff92148149cd80
R13: 0000000000000000 R14: 0000000000000001 R15: ffff92148149ce28
FS:  00007f3646728700(0000) GS:ffff921481480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000022e71c004 CR4: 00000000003606e0
Call Trace:
 <IRQ>
 ? __hrtimer_run_queues+0xde/0x230
 ? hrtimer_interrupt+0xa6/0x1f0
 ? smp_apic_timer_interrupt+0x66/0x120
 ? apic_timer_interrupt+0x98/0xa0
 </IRQ>
 ? __clear_user+0xe/0x50
 ? copy_fpstate_to_sigframe+0x8e/0x1e0
 ? get_sigframe.isra.13.constprop.14+0x19f/0x1c0
 ? do_signal+0x1ba/0x6b0
 ? force_sig_info_fault+0x97/0xf0
 ? is_prefetch.isra.24+0x91/0x1a0
 ? __bad_area_nosemaphore+0x9b/0x1b0
 ? __do_page_fault+0x37b/0x4e0
 ? page_fault+0x36/0x60
 ? exit_to_usermode_loop+0x6e/0xc0
 ? prepare_exit_to_usermode+0x5e/0x60
 ? retint_user+0x8/0x8
Code:  Bad RIP value.
RIP:           (null) RSP: ffff921481483f38
CR2: 0000000000000000
```
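Because the crashes produced different errors, pulling the task name, taint flags and faulting symbol out of each saved dmesg makes them easier to compare side by side. A minimal Python sketch that only handles the header formats visible in the trace above; the helper names are hypothetical. In the taint string, `O` reflects the out-of-tree `wireguard(O)` module, and the `W` in the second oops records that a kernel warning (presumably the `vmacache_find` one) had already fired.

```python
import re

# "RIP: 0010:vmacache_find+0x96/0xa0" -> (symbol, offset, symbol size)
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def parse_rip(line):
    """Return (symbol, offset, size) for a symbolized RIP line, else None.

    Bare user-space addresses and "(null)" RIPs yield None.
    """
    m = RIP_RE.search(line)
    if not m:
        return None
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def parse_task_line(line):
    """Parse the 'CPU: .. PID: .. Comm: .. Tainted: ..' oops header."""
    m = TASK_RE.search(line)
    if not m:
        return None
    taint = []
    if "Tainted:" in line:
        for tok in line.split("Tainted:", 1)[1].split():
            if tok[0].isdigit():  # the kernel release follows the flags
                break
            taint.append(tok)
    return {"cpu": int(m.group(1)), "pid": int(m.group(2)),
            "comm": m.group(3), "taint": taint}


def summarize_oopses(text):
    """Pair each oops header with the first symbolized RIP after it."""
    events = []
    current = None
    for line in text.splitlines():
        task = parse_task_line(line)
        if task:
            current = dict(task, rip=None)
            events.append(current)
        elif current is not None and current["rip"] is None:
            current["rip"] = parse_rip(line)
    return events
```

Applied to the trace above, this would yield one `firefox` event with RIP `vmacache_find+0x96/0xa0` and one `Chrome_~dThread` event whose RIP is `None` (the jump to a NULL address).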

`echo "Code: [...]" | linux.git/scripts/decodecode` output:

```
Code: 01 00 48 8b 84 c8 80 04 00 00 48 85 c0 74 11 48 39 78 40 75 16 48 39 30 77 06 48 39 70 08 77 8e 83 c2 01 83 fa 04 75 ce 31 c0 c3 <0f> ff 31 c0 c3 f3 c3 90 90 90 0f 1f 44 00 00 41 54 55 ba ff ff
All code
========
   0:	01 00                	add    %eax,(%rax)
   2:	48 8b 84 c8 80 04 00 	mov    0x480(%rax,%rcx,8),%rax
   9:	00 
   a:	48 85 c0             	test   %rax,%rax
   d:	74 11                	je     0x20
   f:	48 39 78 40          	cmp    %rdi,0x40(%rax)
  13:	75 16                	jne    0x2b
  15:	48 39 30             	cmp    %rsi,(%rax)
  18:	77 06                	ja     0x20
  1a:	48 39 70 08          	cmp    %rsi,0x8(%rax)
  1e:	77 8e                	ja     0xffffffffffffffae
  20:	83 c2 01             	add    $0x1,%edx
  23:	83 fa 04             	cmp    $0x4,%edx
  26:	75 ce                	jne    0xfffffffffffffff6
  28:	31 c0                	xor    %eax,%eax
  2a:	c3                   	retq   
  2b:*	0f ff 31             	ud0    (%rcx),%esi		<-- trapping instruction
  2e:	c0 c3 f3             	rol    $0xf3,%bl
  31:	c3                   	retq   
  32:	90                   	nop
  33:	90                   	nop
  34:	90                   	nop
  35:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3a:	41 54                	push   %r12
  3c:	55                   	push   %rbp
  3d:	ba                   	.byte 0xba
  3e:	ff                   	(bad)  
  3f:	ff                   	.byte 0xff

Code starting with the faulting instruction
===========================================
   0:	0f ff 31             	ud0    (%rcx),%esi
   3:	c0 c3 f3             	rol    $0xf3,%bl
   6:	c3                   	retq   
   7:	90                   	nop
   8:	90                   	nop
   9:	90                   	nop
   a:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   f:	41 54                	push   %r12
  11:	55                   	push   %rbp
  12:	ba                   	.byte 0xba
  13:	ff                   	(bad)  
  14:	ff                   	.byte 0xff
```
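The `<0f>` marker in the `Code:` line is what decodecode uses to flag the trapping instruction: it is byte 0x2b of the 64-byte window, and the only branch in the window that targets it is the `jne 0x2b` at offset 0x13, right after `cmp %rdi,0x40(%rax)`. Short jumps with a negative 8-bit displacement land before the start of the window, which is why objdump prints targets such as `0xffffffffffffffae`. On x86 kernels of this era, `WARN()` was itself emitted as `ud0`, so this trap corresponds to the `WARNING` in the trace rather than to a separate fault. The Python sketch below reproduces both calculations; `code_bytes` and `rel8_target` are hypothetical helpers.

```python
# The "Code:" line from the WARNING above; "<..>" marks the byte at RIP.
CODE = ("Code: 01 00 48 8b 84 c8 80 04 00 00 48 85 c0 74 11 48 39 78 40 75 16 "
        "48 39 30 77 06 48 39 70 08 77 8e 83 c2 01 83 fa 04 75 ce 31 c0 c3 "
        "<0f> ff 31 c0 c3 f3 c3 90 90 90 0f 1f 44 00 00 41 54 55 ba ff ff")


def code_bytes(code_line):
    """Return (raw bytes, index of the <..>-marked byte) from a Code: line."""
    tokens = code_line.split(":", 1)[1].split()
    data = bytearray()
    marker = None
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marker = i
            tok = tok[1:-1]
        data.append(int(tok, 16))
    return bytes(data), marker


def rel8_target(insn_offset, insn_len, disp, width=64):
    """Target of a short (rel8) jump, wrapped to `width` bits like objdump."""
    signed = disp - 0x100 if disp >= 0x80 else disp  # sign-extend 8 bits
    return (insn_offset + insn_len + signed) & ((1 << width) - 1)
```

For example, `rel8_target(0x1e, 2, 0x8e)` reproduces the `ja 0xffffffffffffffae` line, and `rel8_target(0x13, 2, 0x16)` gives 0x2b, the `ud0`.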
