
Bug#897561: marked as done (linux-image-4.15.0-3-amd64 - ext3/4 filesystem corruption)



Your message dated Sat, 2 Jun 2018 23:10:37 +0200
with message-id <CAJZTz9wc3SRnij2tdtGpu1y7u8OKkZW2SX2MrUaZWNYYHOtD2Q@mail.gmail.com>
and subject line linux-image-4.15.0-3-amd64 - ext3/4 filesystem corruption
has caused the Debian Bug report #897561,
regarding linux-image-4.15.0-3-amd64 - ext3/4 filesystem corruption
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
897561: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897561
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-4.15.0-3-amd64
Version: 4.15.17-1

It looks like, on a larger ext3 file system, when new files are written and some conditions are met, the ext4 subsystem thinks the fs is corrupted.
I haven't encountered any problems with smaller file systems yet ( / is usually less than 20 GB on my machines, and there have been no problems on such partitions so far).
I don't have similarly large "real" ext4 file systems, only this ext3 one, so I couldn't test whether the problem is really ext3-only.
The raw SMART values of my raid1 array's disks are OK. I also did a few passes with "echo check > /sys/block/md1/md/sync_action" just to be sure none of the disks had silently corrupted my data (each pass took around 4 hours).

I'll leave some sample error messages at the end of this report, as they are rather long and repetitive. Basically they say:
EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm <some program's name>: bg <smaller number>: block <larger number>: invalid block bitmap

On 4.15.0-3, while I was extracting some rar files whose extracted data should total less than 1 GB, the free space (hundreds of GB) disappeared quickly and dmesg showed the above error message repeating, so I had to kill the rar process. The same happens if I scp files to the file system; basically anything that results in writes can trigger it. If I umount and run e2fsck, it only finds that the free space count is wrong and fixes it. As long as the system is running 4.15.0-3, the error is reproducible across reboots and power-offs.

With the previous 4.15.0-2 (4.15.11-1) kernel, the repaired file system can be used without errors, so I'm using that for the time being.

The fs in question was created as ext3 some years ago; on boot the kernel prints the following, which is expected and normal:
EXT4-fs (md1): mounting ext3 file system using the ext4 subsystem
EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: acl

Information from /proc/mdstat :
md1 : active raid1 sda3[0] sdb3[1]
      1926249424 blocks super 1.2 [2/2] [UU]

df -T:
Filesystem     Type  1K-blocks       Used Available Use% Mounted on
/dev/md1       ext3 1910942232 1521788612 389153620  80% /storage


I think the error is in fs/ext4/balloc.c, but I may be mistaken - that is simply where the error message comes from. This file received a patch recently, ext4-add-validity-checks-for-bitmap-block-numbers.patch, but skimming over it I don't see what it could have changed to cause the errors - although I'm no kernel developer.


Information about the filesystem from the head of the dumpe2fs output (note that the inode size is 128 instead of the more common 256, and sparse_super was used; back then I liked to use these options for various reasons):

dumpe2fs 1.44.1 (24-Mar-2018)
Filesystem volume name:   HPg8storage
Last mounted on:          /storage
Filesystem UUID:          0275a7f2-53ea-48fb-8c32-a3a9b0c25406
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              120397824
Block count:              481562356
Reserved block count:     0
Free blocks:              97288373
Free inodes:              116239195
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      909
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   256
RAID stride:              1
RAID stripe width:        1
Filesystem created:       Fri Feb 13 19:39:29 2015
Last mount time:          Wed May  2 21:05:27 2018
Last write time:          Wed May  2 21:07:37 2018
Mount count:              1
Maximum mount count:      -1
Last checked:             Wed May  2 20:46:29 2018
Check interval:           15552000 (6 months)
Next check after:         Mon Oct 29 19:46:29 2018
Lifetime writes:          1867 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      7f682190-7662-445e-9e5d-ff0fc9ab3104
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x000211a9
Journal start:            0
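
As a quick sanity check of the geometry above, the following Python sketch derives the number of block groups from the dumpe2fs values and compares the file system size with the md1 size from /proc/mdstat. The constant and function names are illustrative only; the numbers are copied from the output quoted in this report.

```python
# Values copied from the dumpe2fs and /proc/mdstat output in this report.
BLOCK_COUNT = 481562356       # "Block count"
BLOCK_SIZE = 4096             # "Block size" in bytes
BLOCKS_PER_GROUP = 32768      # "Blocks per group"
INODES_PER_GROUP = 8192       # "Inodes per group"
INODE_COUNT = 120397824       # "Inode count"
MD1_KIB = 1926249424          # md1 size in 1K blocks from /proc/mdstat


def group_count(block_count, blocks_per_group):
    """Number of block groups, rounding up for a partial last group."""
    return -(-block_count // blocks_per_group)


groups = group_count(BLOCK_COUNT, BLOCKS_PER_GROUP)
fs_kib = BLOCK_COUNT * BLOCK_SIZE // 1024

print("block groups:", groups)
print("inode count matches groups:", groups * INODES_PER_GROUP == INODE_COUNT)
print("fs size equals md1 size:", fs_kib == MD1_KIB)
```

With these inputs the file system has 14697 block groups, the inode count is exactly groups times inodes per group, and the file system fills the whole md1 device, so the group numbers in the error messages below (5289 to 5591) are all within range.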

A sample of the error messages from kernel.log:

May  1 12:45:57 hpg8 kernel: [15952.265682] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5289: block 173315497: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.330241] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5290: block 173348266: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.363247] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5291: block 173381035: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.388234] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5292: block 173413804: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.427087] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5293: block 173446573: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.456162] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5294: block 173479342: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.494708] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5295: block 173512111: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.525483] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5296: block 173544880: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.558537] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5297: block 173577649: invalid block bitmap
May  1 12:45:57 hpg8 kernel: [15952.594656] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5298: block 173610418: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.283883] EXT4-fs error: 137 callbacks suppressed
May  1 12:46:02 hpg8 kernel: [15957.283886] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5436: block 178132540: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.313168] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5437: block 178165309: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.363793] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5438: block 178198078: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.413221] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5439: block 178230847: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.443555] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5440: block 178263616: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.477881] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5441: block 178296385: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.512082] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5442: block 178329154: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.552511] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5443: block 178361923: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.581757] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5444: block 178394692: invalid block bitmap
May  1 12:46:02 hpg8 kernel: [15957.619541] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5445: block 178427461: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.304781] EXT4-fs error: 136 callbacks suppressed
May  1 12:46:07 hpg8 kernel: [15962.304783] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5582: block 182916814: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.342431] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5583: block 182949583: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.383918] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5584: block 182982352: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.441696] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5585: block 183015121: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.476039] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5586: block 183047890: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.516458] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5587: block 183080659: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.544928] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5588: block 183113428: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.582115] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5589: block 183146197: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.612453] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5590: block 183178966: invalid block bitmap
May  1 12:46:07 hpg8 kernel: [15962.648108] EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5591: block 183211735: invalid block bitmap
...
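
To make the pattern in these messages easier to see, here is a small Python sketch that computes where each reported block sits inside its block group. It assumes that group N starts at block N * 32768 (first block 0 and blocks per group 32768, both from the dumpe2fs output above); the function name and the regular expression are illustrative, not taken from any existing tool.

```python
import re

BLOCKS_PER_GROUP = 32768  # "Blocks per group" from dumpe2fs; "First block" is 0

# Matches the "bg <group>: block <block>: invalid block bitmap" part of a line.
LOG_RE = re.compile(r"bg (\d+): block (\d+): invalid block bitmap")


def bitmap_offsets(log_text):
    """Return (group, block, offset of the block inside its group) per match."""
    rows = []
    for match in LOG_RE.finditer(log_text):
        group, block = int(match.group(1)), int(match.group(2))
        rows.append((group, block, block - group * BLOCKS_PER_GROUP))
    return rows


# Three lines taken from the kernel.log sample above (timestamps removed).
SAMPLE = """\
EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5289: block 173315497: invalid block bitmap
EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5290: block 173348266: invalid block bitmap
EXT4-fs error (device md1): ext4_validate_block_bitmap:401: comm rar: bg 5436: block 178132540: invalid block bitmap
"""

for group, block, offset in bitmap_offsets(SAMPLE):
    print(f"bg {group}: block {block} is at offset {offset} within its group")
```

For these three lines the offsets are 5545, 5546 and 5692: each reported block lies inside the group named in the message, and the offset grows by one per group. Every offset is also larger than 4096. Whether that is related to the position check changed by ext4-fix-bitmap-position-validation.patch is only a guess and is not confirmed anywhere in this report.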

--- End Message ---
--- Begin Message ---
After upgrading to linux-image-4.16.0-1-amd64 , the file system can be used without problems.

I think the fix was debian/patches/bugfix/all/ext4-fix-bitmap-position-validation.patch
Package: linux-image-4.16.0-1-amd64
Version: 4.16.5-1


--- End Message ---
