
Bug#624343: data corruption: md changes max_sector setting of running md devices



On Sun, 6 Jan 2013 12:31:46 +0100 (CET) bug556610@arcor.de wrote:

> Alan Woodland:
> > If I've understood this correctly one possible workaround for this
> > (for the time being) would be to add a boot parameter that lets you
> > artificially limit max_hw_sectors? In this case it seems forcing all
> > md devices down from 248 to 240 would probably avoid potential data
> > loss issues without large performance degradation or big intrusive
> > changes. Is that sane?
> 
> 
> In lieu of a proper upstream bug tracker: https://bugs.launchpad.net/mdadm/+bug/320638 :

The upstream bug tracker is
   mailto:linux-raid@vger.kernel.org
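Alan's suggested workaround (clamping all md devices down to 240 sectors) would, in sysfs terms, mean writing to the `max_sectors_kb` queue attribute. A minimal sketch of the conversion, assuming the standard sysfs path; the device name `md0` and the 240-sector target are illustrative, and the echo only prints the command rather than running it:

```shell
# Sketch of the proposed clamp via the standard sysfs queue attribute.
# The attribute is in KiB; one sector is 512 bytes, so 240 sectors = 120 KiB.
sectors=240
kb=$((sectors * 512 / 1024))
echo "would run: echo $kb > /sys/block/md0/queue/max_sectors_kb"
```

The hardware ceiling for each member can be read from the read-only `max_hw_sectors_kb` attribute in the same directory.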

> Problem: md changes max_sector setting of an already running and busy
> md device, when a (hotpluggable) device is added or removed. However, the
> device mapper and filesystem layer on top of the raid cannot (always?)
> cope with that.
> 
> Observations:
> * "bio too big device mdX (248 > 240)" messages in the syslog
> * read/write errors (some dropped silently, no noticeable errors 
> reported during operation, until things like dhcpclient loses its IP etc.)
> 
> Expected:
> Adding members to and removing them from running raids (hotplugging)
> should not change the raid device's characteristics. If the new member
> supports only smaller max_sector values, buffer and split the data
> stream, until the raid 
> device can be set up from a clean state with a more appropriate 
> max_sector value. To avoid buffering and splitting in the future, md could 
> save the smallest max_sector value of the known members in the 
> superblock, and use that when setting up the raid even if that member is 
> not present.
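The "bio too big" observation above is the block layer rejecting a bio that exceeds the queue's (now reduced) limit instead of splitting it. A paraphrase of that size check as a sketch — this is not the actual kernel source, just an illustration of the invariant that gets violated:

```shell
# Paraphrase (not kernel code) of the check behind the syslog line:
# a bio larger than the queue's current max_sectors is rejected,
# not split, which is how the 248 > 240 mismatch surfaces as errors.
check_bio() {
    bio_sectors=$1; max_sectors=$2; dev=$3
    if [ "$bio_sectors" -gt "$max_sectors" ]; then
        echo "bio too big device $dev ($bio_sectors > $max_sectors)"
    fi
}
check_bio 248 240 md0    # the observed case
check_bio 240 240 md0    # within the limit: silently accepted
```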

This really needs to be fixed by cleaning up the bio path so that big bios
are split by the device that needs the split, not by the fs sending the bio.

I would not be at all happy to have md do the extra buffering and splitting
that you suggest.
Maybe the best interim fix is to reject the added device if its limits are
too low.
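Neil's interim fix could be prototyped in userspace as a guard run before `mdadm --add`. A hedged sketch: the function name is hypothetical, the values are passed in as literals here, and in practice both numbers would be read from `/sys/block/<dev>/queue/max_hw_sectors_kb`:

```shell
# Hypothetical pre-add check (sketch only): refuse a new member whose
# hardware limit is below the running array's current limit, so the
# array's max_sectors never shrinks under a live filesystem.
can_add_member() {
    md_max_kb=$1; new_max_kb=$2
    if [ "$new_max_kb" -lt "$md_max_kb" ]; then
        echo reject
    else
        echo ok
    fi
}
can_add_member 124 120   # e.g. 248-sector array, 240-sector USB disk
can_add_member 120 124   # new member's limit is higher: safe to add
```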

NeilBrown


> 
> Note: This is reproducible in much more common scenarios than the one 
> the original reporter had (e.g. --add a USB (3.0 these days) drive to an 
> already running SATA raid1 and grow the number of devices).
