
Bug#624343: data corruption: md changes max_sector setting of running md devices



Alan Woodland:
> If I've understood this correctly one possible workaround for this
> (for the time being) would be to add a boot parameter that lets you
> artificially limit max_hw_sectors? In this case it seems forcing all
> md devices down from 248 to 240 would probably avoid potential data
> loss issues without large performance degradation or big intrusive
> changes. Is that sane?


In lieu of a proper upstream bug tracker, see https://bugs.launchpad.net/mdadm/+bug/320638 :
Problem: md changes the max_sector setting of an already running and busy
md device when a (hotpluggable) device is added or removed. However, the
device mapper and filesystem layers on top of the raid cannot (always?)
cope with that.
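
For reference, a minimal Python sketch of how the relevant limits can be
inspected; it assumes a hypothetical array named md0 and the standard sysfs
layout (the values reported there are in KiB), and is only meant to show
where the mismatch between the array and a new member can be seen:

#!/usr/bin/env python3
# Sketch only: compare the md device's request-size limit with those of its
# members. Assumes a hypothetical array md0; adjust the name as needed.
import os

MD = "md0"  # hypothetical array name

def read_kb(dev, attr):
    # Queue attributes such as max_sectors_kb are reported in KiB.
    with open(f"/sys/block/{dev}/queue/{attr}") as f:
        return int(f.read().strip())

md_limit = read_kb(MD, "max_sectors_kb")
print(f"{MD}: max_sectors_kb = {md_limit}")

# Members of the array show up as symlinks under /sys/block/md0/slaves/.
for member in sorted(os.listdir(f"/sys/block/{MD}/slaves")):
    if not os.path.exists(f"/sys/block/{member}/queue"):
        # Partition members (e.g. sdb1) keep their limits on the parent disk.
        continue
    cur = read_kb(member, "max_sectors_kb")
    hw = read_kb(member, "max_hw_sectors_kb")
    note = "  <-- below the array limit" if hw < md_limit else ""
    print(f"  {member}: max_sectors_kb = {cur}, max_hw_sectors_kb = {hw}{note}")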

Observations:
* "bio too big device mdX (248 > 240)" messages in the syslog
* read/write errors (some dropped silently; no noticeable errors
reported during operation until things like the DHCP client losing its IP, etc.)
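
The first symptom can at least be spotted mechanically; a small sketch
(assuming dmesg output is readable by the current user) that matches the
message format quoted above. If those numbers are 512-byte sectors, 248
vs. 240 corresponds to a 124 KiB bio hitting a 120 KiB limit:

#!/usr/bin/env python3
# Sketch only: scan the kernel log for the "bio too big" messages described
# above. Assumes dmesg is available to the current user.
import re
import subprocess

pattern = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")

log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
for line in log.splitlines():
    m = pattern.search(line)
    if m:
        dev, size, limit = m.group(1), int(m.group(2)), int(m.group(3))
        print(f"{dev}: {size}-sector bio rejected (device limit {limit})")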

Expected:
Adding members to and removing them from running raids (hotplugging) should
not change the raid device characteristics. If the new member supports only
smaller max_sector values, buffer and split the data stream until the raid
device can be set up from a clean state with a more appropriate
max_sector value. To avoid buffering and splitting in the future, md could
save the smallest max_sector value of the known members in the
superblock and use that when setting up the raid, even if that member is
not present.
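
To illustrate that last point, the policy would amount to nothing more than
remembering the minimum. A sketch of the idea (not existing md code; the
KiB values are made up for the example):

#!/usr/bin/env python3
# Sketch of the proposed policy, not existing md code: the array limit is
# the smallest limit ever seen among members, including a value previously
# recorded in the superblock, so it never has to shrink while the array is busy.
def array_limit_kb(member_limits_kb, recorded_limit_kb=None):
    candidates = list(member_limits_kb)
    if recorded_limit_kb is not None:
        candidates.append(recorded_limit_kb)
    return min(candidates)

# Example (made-up values): SATA members allow 124 KiB bios, a hot-added
# USB member only 120 KiB. Once 120 is recorded, it sticks.
print(array_limit_kb([124, 124]))                               # 124
print(array_limit_kb([124, 124, 120], recorded_limit_kb=124))   # 120
print(array_limit_kb([124, 124], recorded_limit_kb=120))        # 120, member absent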

Note: This is reproducible in much more common scenarios than the one the
original reporter had (e.g. --add a USB drive (3.0 these days) to an
already running SATA raid1 and grow the number of devices).
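
For completeness, that reproduction boils down to two mdadm calls; a sketch
with placeholder device names (/dev/md0, /dev/sdc), which should only ever
be run against a throwaway array:

#!/usr/bin/env python3
# Sketch of the reproduction scenario above. /dev/md0 and /dev/sdc are
# placeholders; this reshapes a live array, so use a disposable test setup.
import subprocess

ARRAY = "/dev/md0"    # already running SATA raid1 (placeholder)
NEW_DEV = "/dev/sdc"  # hot-plugged USB drive with a smaller bio limit (placeholder)

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("mdadm", "--add", ARRAY, NEW_DEV)
run("mdadm", "--grow", ARRAY, "--raid-devices=3")
# Afterwards, watch the kernel log for "bio too big device md0 ..." messages.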

