Bug#653440: linux-image-3.1.0-1-686-pae: Fails to boot since 3.1, apparently in the md subsystem

To: Roland Mas <lolando@debian.org>
Cc: 653440@bugs.debian.org, control@bugs.debian.org
Subject: Bug#653440: linux-image-3.1.0-1-686-pae: Fails to boot since 3.1, apparently in the md subsystem
From: Frédéric Brière <fbriere@fbriere.net>
Date: Mon, 16 Jan 2012 10:27:47 -0500
Message-id: <20120116152747.GA12066@fabul.fbriere.dyndns.org>
Reply-to: Frédéric Brière <fbriere@fbriere.net>, 653440@bugs.debian.org
In-reply-to: <87ty3wvy0i.fsf@mirexpress.internal.placard.fr.eu.org>
References: <20111228103303.24894.76090.reportbug@mirexpress.placard.fr.eu.org> <20120114192110.GA26246@fabul.fbriere.dyndns.org> <87ty3wvy0i.fsf@mirexpress.internal.placard.fr.eu.org>

retitle 653440 Kernel freezes when assembling RAID1 array with components all write-mostly
found 653440
found 653440 3.1.1-1
tags 653440 fixed-upstream
thanks

[ Note to kernel team: any chance 307729c could make it into 3.2.1-1? ]

On Mon, Jan 16, 2012 at 09:42:53AM +0100, Roland Mas wrote:
> > I think you were hit by d2eb35a, same as me; I'll either comment here or
> > file a new bug, depending on the case.
> 
>   I think you're right.

You have probably figured all this out already, but for anybody else who
is affected and finds this bug report:

The 3.1 (and 3.2) kernel will freeze when assembling a RAID1 array where
all members have the writemostly flag set.  (Unfortunately, kernels
prior to 3.1 make it impossible to remove this flag from the metadata.)

This has been fixed upstream, although not in time for 3.2 (or 3.2.1,
apparently).  The fix (307729c) does apply cleanly to 3.1.8 and 3.2.1,
though; I've attached it if you don't want to stick to 3.0 for the time
being.

> I seem to recall that it is not exactly stored on-disk,

Same here; I added the writemostly flag to an SSD partition by mistake
and have had to turn it off at every boot since.

>   The irony is, I was looking forward to a 3.1 kernel precisely because
> it is said to make the writemostly state persist on disk :-)

It does indeed.  After booting with a patched 3.1.8, the flag was
automatically updated and I was able to get rid of the "echo" hack.
(At this point, the unpatched 3.1.8 boots fine, but I would hit the
freeze again if the SSD were to fail, since only write-mostly members
would remain.)

-- 
... but hey, this is Linux, isn't it meant to do infinite loops in 5
seconds?
		-- Jonathan Oxer in the apt-cacher ChangeLog

From 307729c8bc5b5a41361af8af95906eee7552acb1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 9 Jan 2012 01:41:51 +1100
Subject: [PATCH] md/raid1: perform bad-block tests for WriteMostly devices
 too.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We normally try to avoid reading from write-mostly devices, but when
we do we really have to check for bad blocks and be sure not to
try reading them.

With the current code, best_good_sectors might not get set and that
causes zero-length read requests to be send down which is very
confusing.

This bug was introduced in commit d2eb35acfdccbe2 and so the patch
is suitable for 3.1.x and 3.2.x

Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
---
 drivers/md/raid1.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc24f0c..a368db2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -531,8 +531,17 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0)
+			if (best_disk < 0) {
+				if (is_badblock(rdev, this_sector, sectors,
+						&first_bad, &bad_sectors)) {
+					if (first_bad < this_sector)
+						/* Cannot use this */
+						continue;
+					best_good_sectors = first_bad - this_sector;
+				} else
+					best_good_sectors = sectors;
 				best_disk = disk;
+			}
 			continue;
 		}
 		/* This is a reasonable device to use.  It might
-- 
1.7.8.3
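
For readers who want to see the decision the patch adds without the rest of read_balance(), here is a minimal userspace sketch of it. This is not kernel code: struct bad_range, overlaps_bad() and write_mostly_good_sectors() are hypothetical names made up for illustration, and a device is assumed to have at most one bad range. It mirrors the patched branch as written: a write-mostly device with no overlapping bad block can serve the whole request, one whose bad range starts inside the request can serve only the sectors before it, and one whose bad range starts before the request is skipped.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

/* Hypothetical stand-in for a device's bad-block list: at most one range. */
struct bad_range {
	bool present;
	sector_t start;
	sector_t len;
};

/*
 * Simplified analogue of is_badblock(): returns true if the request
 * [s, s + sectors) overlaps the bad range, and reports where that
 * range starts and how long it is.
 */
static bool overlaps_bad(const struct bad_range *bad, sector_t s,
			 sector_t sectors, sector_t *first_bad,
			 sector_t *bad_sectors)
{
	if (!bad->present)
		return false;
	if (bad->start >= s + sectors || bad->start + bad->len <= s)
		return false;
	*first_bad = bad->start;
	*bad_sectors = bad->len;
	return true;
}

/*
 * Models the branch added by the patch for a write-mostly device.
 * Returns the number of sectors readable from this_sector, or -1 if
 * the device cannot be used (the "continue" in the patch).
 */
static long long write_mostly_good_sectors(const struct bad_range *bad,
					   sector_t this_sector,
					   sector_t sectors)
{
	sector_t first_bad, bad_sectors;

	assert(sectors > 0);
	if (overlaps_bad(bad, this_sector, sectors, &first_bad, &bad_sectors)) {
		if (first_bad < this_sector)
			return -1;	/* Cannot use this */
		return (long long)(first_bad - this_sector);
	}
	return (long long)sectors;
}
```

Before the patch, the write-mostly branch only recorded the disk and never assigned a length, which is how best_good_sectors could be left unset when every member was write-mostly.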
