
Bug#422217: linux-image-2.6.20-1-686: SCSI disks initialised too late for mdadm



Stephen Gran wrote:
> This one time, at band camp, Simon A. Boggis said:
>> I've done my experiment with initramfs-tools - putting a 'sleep 10'
>> before mount_root makes my machine boot the kernel, as I suspected in my
>> original email:
>>
>> # diff -u /usr/share/initramfs-tools/init{.orig,}
>> --- /usr/share/initramfs-tools/init.orig        2007-03-07 22:30:42.000000000 +0000
>> +++ /usr/share/initramfs-tools/init     2007-05-11 14:33:55.000000000 +0100
>> @@ -145,6 +145,12 @@
>>  run_scripts /scripts/init-premount
>>  [ "$quiet" != "y" ] && log_end_msg
>>
>> +#SAB>>>>>>
>> +log_begin_msg "SAB: slow SCSI disk discovery workaround: sleeping for 10 seconds"
>> +/bin/sleep 10
>> +log_end_msg
>> +#<<<<<<SAB
>> +
>>  maybe_break mount
>>  log_begin_msg "Mounting root file system..."
>>  . /scripts/${BOOT}
> 
> Not that I'm involved in this in any real way, but things like hardcoded
> sleep timeouts always make me uncomfortable - they introduce delays for
> people who don't need them, and they are racy at best and can still fail
> for the people who do need them.  Is there some way to use udevsettle or
> something instead?  If not, some method of sleep until $disk seems
> better than hardcoding it, to me at least.

I would completely agree with you - it's totally the wrong thing to do -
another SCSI card (or more, or slower devices) could take even longer.
The only reason I did it was to prove (as opposed to guess) that the
problem really is a race between SCSI becoming ready and mount_root.
This has now been shown to be the case, so the next questions are what
is the cause and can it be fixed properly?

Ideally one would like something like (in pseudo-code):

if has_scsi:
  start_scsi_in_blocking_mode
mount_root

or, if it won't block, then:

if has_scsi:
  start_scsi_in_non-blocking_mode
  wait_until_scsi_ready
mount_root
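
The second variant could be sketched in shell roughly as follows. This is only an illustration of the "wait until ready" idea, not a tested patch for initramfs-tools: the helper name wait_for_path and the 30 second default are my own assumptions, and in a real initramfs one would test for a block device with -b rather than -e.

```sh
#!/bin/sh
# Hypothetical helper (not part of initramfs-tools): poll until a path
# appears or a timeout expires, instead of sleeping for a fixed time.
#   $1 = path to wait for, e.g. the root device node
#   $2 = timeout in seconds (defaults to 30, an arbitrary choice)
# Returns 0 as soon as the path exists, 1 if the timeout runs out first.
wait_for_path() {
    path=$1
    remaining=${2:-30}
    while [ ! -e "$path" ]; do
        # Give up once the timeout has been used up.
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

Called just before the root file system is mounted, with the root device as the first argument, this returns immediately on machines whose disks are already present, and only delays the boot on machines that actually need the extra time.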

It is interesting that the behaviour differs between 2.6.18 and
2.6.20 - this implies either that SCSI blocked in 2.6.18 or that we were
just lucky and SCSI initialisation won the race. I haven't had time to
work out what might have changed in 2.6.20 yet.

Best wishes,

Simon


