
Bug#717681: linux-image-3.10-1-amd64: reproducable Data loss with kernel linux-image-3.10-1-amd64 with md-raid devices



On Wed, 24 Jul 2013 04:02:15 +0100 Ben Hutchings <ben@decadent.org.uk> wrote:

> Neil, does the report below sound like the bug you fixed with:
> 
> commit 7bb23c4934059c64cbee2e41d5d24ce122285176
> Author: NeilBrown <neilb@suse.de>
> Date:   Tue Jul 16 16:50:47 2013 +1000
> 
>     md/raid10: fix two problems with RAID10 resync.
> 
> or any of the others you've recently fixed?

Yes.  The bug below sounds exactly like the one fixed by the commit above.

NeilBrown


> 
> On Tue, 2013-07-23 at 20:33 +0200, Thomas Rösch wrote:
> > Package: src:linux
> > Version: 3.10.1-1
> > Severity: normal
> > 
> > Dear Maintainer,
> > 
> > 
> > I have a system with some md raid devices using raid10. When I wanted to change
> > the partitioning of a hard disk, I set all of its partitions to failed in the raid
> > and removed them.
> > After the new partitioning was done, I re-added the devices and the raid synced
> > again. After successful syncing (nearly one day) everything looked fine and the
> > raid reported no errors.
> > 
> > On the next day, four of the raid filesystems were defective and could not be
> > repaired. The error was something like "illegal entry in ext bitmaps". I searched
> > for this error and all sources say: restore your data. One says: rewrite all
> > superblocks with mkfs.ext4 -S and then use fsck, which seems to work, but nearly
> > all data are corrupted and contain spots or areas of zeros.
> > 
> > 
> > But when I restore the data from backup (after creating a new ext4 filesystem
> > like before) everything looks fine again - until the next start on the next
> > day. All restored partitions have the same defect as before.
> > A memory check (memtest) did not find any problem, and the other filesystems are
> > still ok.
> > 
> > On the next try to restore my data, I saw that the kernel was still writing data
> > after reading from my backup medium had finished. Ok, it is flushing the
> > buffers. After a successful call of sync I saw that it was still writing
> > data. Not at the normal rate of 80-400 MB/s, only at 5-10 MB/s. Another sync
> > returned at once. So for a "full memory cache" of about 10 GBytes, it needs around
> > 30 min to write everything out, and then the disk writing stops.
> > 
> > If I don't wait, this data is *not written* if I shut down the computer
> > regularly. It looks like this data is not covered by a sync.
> > 
> > When I returned to linux-image-3.9-1-amd64 (which I currently use), I don't see
> > this behavior and all the data I restored are still healthy. Here, after a sync
> > all data are written and I have no problems after restoring data or powering
> > down.
> > 
> > Tom
> [...]
> 


