
Bug#764162: linux-image-3.16-2-kirkwood: [regression 3.14->3.16] file data corruption, via network



On Mon, 2014-10-06 at 01:21 +0200, Svenska wrote:
> Package: src:linux
> Version: 3.16.3-2
> Severity: serious
> Justification: may cause silent data corruption
> 
> Hello,
> 
> after upgrading the kernel of my NAS to 3.16-2-kirkwood, I noticed
> corrupt data on my files. The NAS works as a DHCP client on its single
> LAN port, the two hard drives run in a RAID1 configuration. Both
> drives are fine according to mdadm and smartctl. Kernel logs in both
> cases have not been eye-catching. Samba is configured with "unix
> charset = ISO-8859-1".
> 
> When copying data from the NAS to some other device using CIFS or
> Samba (the other devices run jessie/amd64 or wheezy/i686), the files
> are partly corrupt. The reported md5 checksums of files change every
> time the files are read. (And they differ from the checksums as stored
> on disk as well.) The files are not completely destroyed: ZIP archives
> show CRC errors only for some files in them, and video files remain
> generally playable, with lots of audio/video errors and crashing
> players.
> 
> Shell access using SSH generally works. Trying to copy files with scp
> or sftp breaks the connection ("invalid MAC packets" or similar). File
> listing and mounting with sshfs work. Trying to copy files via sshfs
> results in a "Socket not connected" error, and the mount point is
> gone.
> 
> As far as I can see, the on-disk data seems to be fine, as do files
> copied to the NAS while running the corrupt kernel. I am unable to
> check all new files, though - there might have been silent data
> corruption on new files.

That sounds like the corruption is happening on the networking path
rather than the disk path?
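
A quick way to tell the two paths apart, as a minimal sketch rather than
anything from the original report: hash the same file several times
directly on the NAS, then again on a client over the CIFS mount. If the
local digests agree and the remote ones keep changing, the disk path is
probably fine. The helper names md5_of and read_is_stable are made up for
this illustration, and repeated local reads may be served from the page
cache, so this only approximates a test of the disk path.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_is_stable(path, passes=3):
    """Hash `path` `passes` times; return (all_digests_equal, digests)."""
    digests = [md5_of(path) for _ in range(passes)]
    return len(set(digests)) == 1, digests
```

Running read_is_stable() on the same file on both sides and comparing the
digest lists should show which side disagrees.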

> Data corruption on NAS devices is a serious issue for me.
> Reverting back to 3.14-2-kirkwood fixed the problems.

Which exact version did you revert to? Looking at debian/changelog since
the first 3.14 version I don't see much specific to kirkwood. At some
point there were some changes with the network PHY but they predate 3.14
IIRC.

The only thing which jumps out is "[armel/kirkwood] mm: Enable HIGHMEM
(Closes: #760786)" in 3.16.2-1. If you were able to try 3.16-1~exp1[0]
from snapshot.debian.org that might help rule that out.

If 3.16-1~exp1 still has the issue then I think we are looking at an
upstream regression. It might be worth bisecting over the older kernels
from experimental, which can be found on snapshot.debian.org, e.g. to
narrow down whether the issue appeared in 3.15 or 3.16.
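
As a rough illustration of that bisection (a sketch, not part of any
Debian tooling; first_bad and is_bad are names invented here): given an
ordered list of kernel package versions whose first entry is known good
and whose last entry is known bad, a binary search needs only about
log2(n) boot-and-test rounds. It assumes every version after the first
bad one is also bad.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad(version) is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that every version after the first bad one is also bad.
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    good, bad = 0, len(versions) - 1  # indices of known good / known bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return versions[bad]
```

Here is_bad would be a manual step: boot the given kernel, copy a file
off the NAS, and check whether its checksum changes between reads.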

After that, could you please take this up with the upstream
developers[1], who are normally very responsive to kirkwood issues.

Thanks,
Ian.

[0] Should be at http://snapshot.debian.org/package/linux/3.16-1~exp1/
[1]
ARM/Marvell Dove/Kirkwood/MV78xx0/Orion SOC support
M:      Jason Cooper <jason@lakedaemon.net>
M:      Andrew Lunn <andrew@lunn.ch>
M:      Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
L:      linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S:      Maintained


> 
> Best Regards
> 
> -- Package-specific info:
> ** Kernel log: boot messages should be attached
> 
> ** Model information
> Hardware	: QNAP TS-119/TS-219
> Revision	: 0000
> 
> ** PCI devices:
> 00:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:6282] (rev 01)
> 	Subsystem: Marvell Technology Group Ltd. Device [11ab:11ab]
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 9
> 	Region 0: Memory at <ignored> (64-bit, prefetchable)
> 	Capabilities: <access denied>
> 
> 00:01.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03) (prog-if 01 [AHCI 1.0])
> 	Subsystem: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363]
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 9
> 	Region 5: Memory at e0010000 (32-bit, non-prefetchable) [size=8K]
> 	[virtual] Expansion ROM at e0000000 [disabled] [size=64K]
> 	Capabilities: <access denied>
> 	Kernel driver in use: ahci
> 
> 00:01.1 IDE interface [0101]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03) (prog-if 85 [Master SecO PriO])
> 	Subsystem: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363]
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Interrupt: pin B routed to IRQ 9
> 	Region 0: I/O ports at 1010 [disabled] [size=8]
> 	Region 1: I/O ports at 1020 [disabled] [size=4]
> 	Region 2: I/O ports at 1018 [disabled] [size=8]
> 	Region 3: I/O ports at 1024 [disabled] [size=4]
> 	Region 4: I/O ports at 1000 [disabled] [size=16]
> 	Capabilities: <access denied>
> 
> 01:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:6282] (rev 01)
> 	Subsystem: Marvell Technology Group Ltd. Device [11ab:11ab]
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 10
> 	Region 0: Memory at <ignored> (64-bit, prefetchable)
> 	Capabilities: <access denied>
> 
> 01:01.0 USB controller [0c03]: Etron Technology, Inc. EJ168 USB 3.0 Host Controller [1b6f:7023] (rev 01) (prog-if 30 [XHCI])
> 	Subsystem: Etron Technology, Inc. EJ168 USB 3.0 Host Controller [1b6f:7023]
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 10
> 	Region 0: Memory at e8000000 (64-bit, non-prefetchable) [size=32K]
> 	Capabilities: <access denied>
> 	Kernel driver in use: xhci_hcd
> 
> 
> ** USB devices:
> Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> 
> 
> -- System Information:
> Debian Release: jessie/sid
>    APT prefers testing
>    APT policy: (500, 'testing')
> Architecture: armel (armv5tel)
> 
> Kernel: Linux 3.14-2-kirkwood
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> 
> Versions of packages linux-image-3.16-2-kirkwood depends on:
> ii  debconf [debconf-2.0]                   1.5.53
> ii  initramfs-tools [linux-initramfs-tool]  0.116
> ii  kmod                                    18-3
> ii  linux-base                              3.5
> ii  module-init-tools                       18-3
> 
> Versions of packages linux-image-3.16-2-kirkwood recommends:
> ii  firmware-linux-free  3.3
> pn  uboot-mkimage        <none>
> 
> Versions of packages linux-image-3.16-2-kirkwood suggests:
> pn  debian-kernel-handbook  <none>
> pn  fdutils                 <none>
> pn  linux-doc-3.16          <none>
> 
> Versions of packages linux-image-3.16-2-kirkwood is related to:
> pn  firmware-atheros        <none>
> pn  firmware-bnx2           <none>
> pn  firmware-bnx2x          <none>
> pn  firmware-brcm80211      <none>
> pn  firmware-intelwimax     <none>
> pn  firmware-ipw2x00        <none>
> pn  firmware-ivtv           <none>
> pn  firmware-iwlwifi        <none>
> pn  firmware-libertas       <none>
> pn  firmware-linux          <none>
> pn  firmware-linux-nonfree  <none>
> pn  firmware-myricom        <none>
> pn  firmware-netxen         <none>
> pn  firmware-qlogic         <none>
> pn  firmware-ralink         <none>
> pn  firmware-realtek        <none>
> pn  xen-hypervisor          <none>
> 
> -- debconf information excluded
> 
> 

