
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error "can't convert block across chunks or bigger than 64k"



On Mon, 2010-11-22 at 11:48 +0100, Wouter D'Haeseleer wrote:
> Package: linux-image-2.6.26-2-xen-686
> Version: 2.6.26-25lenny1
> Severity: critical
> Justification: causes serious data loss
> 
> When accessing an LV configured on a RAID10 array from Xen, data is corrupted, as the following syslog message indicates:
> kernel: raid10_make_request bug: can't convert block across chunks or bigger than 64k 309585274 4
> 
> Continued attempts to use the disk in the domU result in I/O errors
> and the partition being remounted read-only. 
> 
> See also Debian bug 461644 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461644).
> Since that bug is old and was closed without a fix, I am opening a new bug for it.
> 
> Red Hat made a patch for the affected driver, but it is not included
> upstream.  Can someone please make sure this patch gets into the
> sources of the Debian kernel?
> See this patch: https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

We much prefer to use bug fixes that have been accepted upstream.

> See also this related discussion on KernelTrap:
> http://kerneltrap.org/mailarchive/linux-raid/2010/3/8/6837883
> The same thread contains another patch, different from the Red Hat
> one, which is also confirmed as working.

Well, that was also not accepted upstream.  However, I eventually tracked
down the accepted version, which for future reference is:

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71
Author: NeilBrown <neilb@suse.de>
Date:   Mon Mar 8 16:44:38 2010 +1100

    md: deal with merge_bvec_fn in component devices better.

I have attempted to adjust this for Debian's stable kernel version
(2.6.26) and the result is attached.  Please could you test this,
following the instructions at
<http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.
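
For context, the message quoted above comes from the chunk-boundary
check at the top of raid10's make_request().  Roughly paraphrased from
the 2.6.26 source (a sketch, not the exact code):

	/* A bio that straddles a chunk boundary has to be split, but
	 * bio_split() copes only with single-page bios.  chunk_sects
	 * is the chunk size in 512-byte sectors.
	 */
	if ((bio->bi_sector & (chunk_sects - 1)) + (bio->bi_size >> 9)
	    > chunk_sects) {
		if (bio->bi_vcnt != 1 || bio->bi_idx != 0) {
			/* Multi-segment bio, as Xen's blkfront likes to
			 * submit: it cannot be split, so fail it. */
			printk("raid10_make_request bug: can't convert block "
			       "across chunks or bigger than %dk %llu %d\n",
			       chunk_sects / 2,
			       (unsigned long long)bio->bi_sector,
			       bio->bi_size >> 10);
			bio_io_error(bio);
			return 0;
		}
		/* Single-page bio: split at the chunk boundary and
		 * resubmit the two halves. */
	}

The queue limits set by md are supposed to guarantee that such a bio is
never built; the patch below tightens them so that this path cannot be
reached.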

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From ea1cddfe4cad61b43b6551ebc6bef466b25ff128 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
[bwh: Backport to Linux 2.6.26]
---
 drivers/md/linear.c    |   13 ++++++++-----
 drivers/md/multipath.c |   22 ++++++++++++++--------
 drivers/md/raid0.c     |   14 ++++++++------
 drivers/md/raid1.c     |   30 +++++++++++++++++++-----------
 drivers/md/raid10.c    |   30 +++++++++++++++++++-----------
 5 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..627cd38 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,15 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size += rdev->size;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index e968116..0e84b4f 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -293,14 +293,17 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			blk_queue_stack_limits(mddev->queue, q);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-			if (q->merge_bvec_fn &&
-			    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-				blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+			if (q->merge_bvec_fn) {
+				blk_queue_max_phys_segments(mddev->queue, 1);
+				blk_queue_max_hw_segments(mddev->queue, 1);
+				blk_queue_segment_boundary(mddev->queue,
+							   PAGE_CACHE_SIZE - 1);
+			}
 
 			conf->working_disks++;
 			mddev->degraded--;
@@ -453,9 +456,12 @@ static int multipath_run (mddev_t *mddev)
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		if (!test_bit(Faulty, &rdev->flags))
 			conf->working_disks++;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 914c04d..806e20d 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -141,14 +141,16 @@ static int create_strip_zones (mddev_t *mddev)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev1->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to 1, lying within
+		 * a single page.
 		 */
 
-		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
-
+		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 		if (!smallest || (rdev1->size <smallest->size))
 			smallest = rdev1;
 		cnt++;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c610b94..0c00872 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1109,13 +1109,18 @@ static int raid1_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 
 			blk_queue_stack_limits(mddev->queue,
 					       rdev->bdev->bd_disk->queue);
-			/* as we don't honour merge_bvec_fn, we must never risk
-			 * violating it, so limit ->max_sector to one PAGE, as
-			 * a one page request is never in violation.
+			/* as we don't honour merge_bvec_fn, we must
+			 * never risk violating it, so limit
+			 * ->max_segments to one lying within a single
+			 * page, as a one page request is never in
+			 * violation.
 			 */
-			if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-			    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-				blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+			if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+				blk_queue_max_phys_segments(mddev->queue, 1);
+				blk_queue_max_hw_segments(mddev->queue, 1);
+				blk_queue_segment_boundary(mddev->queue,
+							   PAGE_CACHE_SIZE - 1);
+			}
 
 			p->head_position = 0;
 			rdev->raid_disk = mirror;
@@ -1971,12 +1976,15 @@ static int run(mddev_t *mddev)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to 1 lying within
+		 * a single page, as a one page request is never in violation.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->head_position = 0;
 	}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a71277b..010d632 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1135,13 +1135,18 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 
 			blk_queue_stack_limits(mddev->queue,
 					       rdev->bdev->bd_disk->queue);
-			/* as we don't honour merge_bvec_fn, we must never risk
-			 * violating it, so limit ->max_sector to one PAGE, as
-			 * a one page request is never in violation.
+			/* as we don't honour merge_bvec_fn, we must
+			 * never risk violating it, so limit
+			 * ->max_segments to one lying within a single
+			 * page, as a one page request is never in
+			 * violation.
 			 */
-			if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-			    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-				mddev->queue->max_sectors = (PAGE_SIZE>>9);
+			if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+				blk_queue_max_phys_segments(mddev->queue, 1);
+				blk_queue_max_hw_segments(mddev->queue, 1);
+				blk_queue_segment_boundary(mddev->queue,
+							   PAGE_CACHE_SIZE - 1);
+			}
 
 			p->head_position = 0;
 			rdev->raid_disk = mirror;
@@ -2107,12 +2112,15 @@ static int run(mddev_t *mddev)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying
+		 * within a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			mddev->queue->max_sectors = (PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->head_position = 0;
 	}
-- 
1.7.2.3
