Re: scsi partitions (sizes) mixed up during boot (fwd)
On Sun, 5 Jun 2011, Håkan Johansson wrote:
Hi,
(This mail did not make it through originally, perhaps because the attachments
were too large? I am now using pastebin instead.)
lspci-v.txt http://pastebin.com/HMXmWmsD
lspci-nn.txt http://pastebin.com/0k4YUDDC
partitions_slc_2.6.9 http://pastebin.com/9GUXzA09
partitions_2.6.26-2 http://pastebin.com/fTzHLzvm
partitions_2.6.32-5 http://pastebin.com/tD6dhXze
dmesg_slc_2.6.9 http://pastebin.com/DmCBYAW4
dmesg_2.6.26-2 http://pastebin.com/vsLHX5Mn
dmesg_2.6.32-5 http://pastebin.com/YgBNZpyA
I have an MS-9245 1U rackmount server (dual dual-core Opteron) together with a
StorCase SC U320/SATA16R RAID box. The 16-bay SATA RAID box exports one 6.6 TB
RAID5 array to this machine as 4 slices, and a 1 TB part of a disk as another
slice. The main array is exported in 4 slices because the box cannot export
slices larger than 2 TB; the slice sizes differ slightly so that they are easy
to tell apart. The slices appear as four disks (sd[bcde]) and are then glued
back together as a linear md0 array (see partitions_slc_2.6.9 and
dmesg_slc_2.6.9).
Up to now the machine has been running Scientific Linux (which has reached end
of life) with a 2.6.9 kernel. I have now tried to install Debian squeeze on it,
but soon noticed that the SCSI disks appear with the wrong sizes (see
partitions_2.6.32-5 and dmesg_2.6.32-5).
With the kernel from lenny (2.6.26-2), the partitions seem to come out right
(partitions_2.6.26-2, dmesg_2.6.26-2). As this is a production system, I have
not yet ventured into more extensive testing. (I do have a copy of the data.)
In the dmesg output, it looks as if all the disks on the bus are initialised
at the same time, which may confuse the RAID box.
I have tried the option 'scsi_mod.scan=sync', but it did not solve the issue
(it is enabled in the linked dmesg).
Suggestions?
Thanks,
Håkan Johansson
Mostly for the record:
The issue can be circumvented with the patch given at
http://kerneltrap.org/mailarchive/linux-scsi/2010/3/28/6894333
--------------------------
--- sd.c_orig 2011-03-23 13:04:47.000000000 -0700
+++ sd.c 2011-03-25 09:58:54.000000000 -0700
@@ -2491,8 +2491,11 @@
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
+#if 0
 	async_schedule(sd_probe_async, sdkp);
-
+#else
+	sd_probe_async(sdkp, (async_cookie_t) 0);
+#endif
 	return 0;
 
 out_free_index:
---------------------------
The machine has now been running stably with that patch for almost a month.
Would it be possible, and would it make sense, to blacklist asynchronous
probing for this combination of adapter and device?
Cheers,
Håkan