
RESOLVED: Re: 3ware 9550 SATA RAID controller problems



Hi all,

I finally got around to figuring out a fix for this problem and wanted to post a follow-up in case someone else runs into it.

The fix is to add the disk device driver modules to the yaird config file. In my case I added these lines to /etc/yaird/Default.cfg, right after the existing module definitions for the input devices:

MODULE	ata_piix
MODULE	3w_9xxx

I then rebuilt the initrd.img with:


$ yaird -o /boot/initrd.img-test  # based on the current running kernel
$ cd /boot
$ rm initrd.img   # a symlink, otherwise move it
$ ln -s initrd.img-test initrd.img
$ reboot

If your system has a problem booting, then reboot to your backup kernel. (You DO have a backup kernel, don't you? Otherwise you are on your own.)

Then remove the symlink, link in your original initrd.img, and reboot. You want to change this back so that when you make changes to your config you are rebuilding against the currently running kernel, not your backup. Alternatively, you can read man yaird and try other options, and you're on your own again ;)
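To make switching back less error-prone, the symlink swap can be wrapped in a small helper that prints the old target before repointing the link. This is only a sketch: relink is a name I made up, and it assumes /boot/initrd.img is a symlink as on my system and that your ln supports -n (GNU and BSD ln do).

```shell
#!/bin/sh
# Hypothetical helper, not part of yaird or Debian.
# Repoint the symlink LINK at TARGET and print the previous target so it
# can be restored later. Fails without changing anything if LINK is not
# a symlink.
relink() {
    link=$1; target=$2
    [ -L "$link" ] || { echo "relink: $link is not a symlink" >&2; return 1; }
    old=$(readlink "$link") || return 1
    # -n: replace the link itself instead of following it.
    ln -sfn "$target" "$link" || return 1
    printf '%s\n' "$old"
}

# Example (as root):
#   old=$(relink /boot/initrd.img initrd.img-test)   # switch to the test image
#   relink /boot/initrd.img "$old"                   # switch back
```

Saving the printed value somewhere that survives a reboot is up to you; the helper only reports it.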

I tried to load JUST the 3w_9xxx module above, but that reordered the drives, and the initrd then tried to mount root from the wrong disk. So it is important that you load all the disk drivers in the same order as your running kernel, or you will get unexpected results.
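Since the order matters, it may be worth checking the config before rebuilding. Here is a minimal sketch, assuming the config uses one "MODULE name" entry per line as in the lines I added above. The function names are made up, and the expected order is something you have to supply yourself from what your running kernel does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of yaird.

# List the module names from the MODULE lines of a config file, in file
# order. Commented-out lines are skipped because their first field is
# not "MODULE".
list_cfg_modules() {
    awk '$1 == "MODULE" { print $2 }' "$1"
}

# Succeed only if the modules given as arguments appear in the config
# exactly once each and in exactly that relative order.
modules_in_order() {
    cfg=$1; shift
    want=$(printf '%s\n' "$@")
    # Keep only the config entries we care about, preserving file order.
    have=$(list_cfg_modules "$cfg" | grep -Fx -e "$want" || true)
    [ "$have" = "$want" ]
}

# Example:
#   modules_in_order /etc/yaird/Default.cfg ata_piix 3w_9xxx \
#       || echo "disk modules missing or out of order"
```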

I hope this helps someone else.

-Steve

Stephen Woodbridge wrote:
Interesting: I found a thread on a German list about this same problem, but I can't read much German. The two suggestions I was able to make out were:

1) try adding a sleep in the boot scripts to give the raid array more time to initialize at startup.

2) effectively the same: set the fstab entry to "noauto" and add an appropriate /etc/rc?.d/S99mountraid script to check the array and mount it.
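For anyone wanting to try the second suggestion, here is a rough sketch of the waiting part of such a script. Rather than a fixed sleep, it polls until the device node shows up. The function name, the 30-second timeout, and the commands in the example comment are my own assumptions, not something taken from the German thread.

```shell
#!/bin/sh
# Hypothetical helper for an S99mountraid-style script.
# Wait until PATH exists, polling once per second, for at most TIMEOUT
# seconds. Returns 0 as soon as the path appears, 1 if it never does.
# A real script would probably test for a block device with -b; -e is
# used here so the function can be tried out on ordinary files.
wait_for_path() {
    path=$1; remaining=$2
    while [ ! -e "$path" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example, with the fstab entry for /dev/sdb1 set to "noauto":
#   wait_for_path /dev/sdb1 30 && fsck -p /dev/sdb1 && mount /dev/sdb1
```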

Unfortunately, I could not determine whether either of these was successful. The German thread is at https://mlists.in-berlin.de/pipermail/linux-l/msg51879.html

My other thought was to load 3w-9xxx earlier in /etc/modules so the RAID would be recognized and might be initialized sooner while the other modules are loading. I'll give this a try as soon as I can reboot.

-Steve

Stephen Woodbridge wrote:

Hi all,

I have not been able to find anything useful on the problem described below. I could really use some ideas or suggestions.

The one thing I have not tried is to build the driver based on the 3ware source, but I have not had much luck building a kernel that works :( I'm willing to try that again if I can get some detailed steps to do that, maybe based on modifying the existing linux-image-2.6.15-1-em64t-p4-smp and config.

Thanks,
  -Steve

Stephen Woodbridge wrote:

OK, things are very close to completely working.

Running the sid linux-image-2.6.15-1-em64t-p4-smp

Have the ATAPI CDROM working :)
Recognizing the 3ware 9550 raid card.
Have built an ext3 filesystem; it mounts, and reads and writes seem to be fine.

BUT there is a problem on reboot.
The fsck check of /dev/sdb1 fails to read the superblock and throws the boot process into maintenance mode. The device is fine and fsck reports that it is clean. A ^D, whether I fsck it or not, brings up the system and the disk is fine.

It seems like it might be a timing problem in that the array is not fully online before the system goes to check it. Or I didn't do something right when I partitioned and made the filesystem.


Boot messages are:

...
scsi2  : 3ware 9000 Storage Controller
3w-9xxx: scsi2: Found a 3ware 9000 Storage Controller at 0xda200000, IRQ: 18.
input: ImPs ...
3w-9xxx: scsi2: Firmware FE9X 3/02.00.012, BIOS BE9X 3.01.00.024, Ports 8.
  Vendor: AMCC Model: 9550X-8LP DISK Rev 3.02
  Type: Direct-Access   ANSI SCSI revision: 03
SCSI device sdb: 3417817088 512-byte hdwr sectors (1749922 MB)
SCSI device sdb: drive cache: none
SCSI device sdb: 3417817088 512-byte hdwr sectors (1749922 MB)
SCSI device sdb: drive cache: none
  sdb: sdb1
sd 2:0:0:0: Attached scsi disk sdb
All modules loaded.
Checking all file systems...
fsck 1.37 (21-Mar-2005)
fsck.ext3: No such file or directory while trying to open /dev/sdb1
/dev/sdb1:
The superblock could not be read or does not describe a correct ext2 filesystem.
... [and suggests trying]
   e2fsck -b 8193 <device>

and prompts for the root password to enter maintenance mode.


So trying the above reports:

$ e2fsck -b 8193 /dev/sdb1
e2fsck 1.37 (21-Mar-2005)
e2fsck: Bad magic number in super-block while trying to open /dev/sdb1
The superblock could not be read or does not describe a correct ext2 filesystem.
... [and suggests trying]
   e2fsck -b 8193 <device>

(none):~# fsck -v /dev/sdb1
fsck 1.37 (21-Mar-2005)
e2fsck 1.37 (21-Mar-2005)
/dev/sdb1: clean, 11/213614592 files, 6711696/427226577 blocks

(none):~# fsck -v -f /dev/sdb1
  [checks the disk and reports no problems]

^D

system continues to boot and everything looks fine.

I think the key to this problem is likely the error message:

fsck.ext3: No such file or directory while trying to open /dev/sdb1


above. Is it not finding the executable fsck.ext3 or the device /dev/sdb1??

fsck.ext3 is in the /sbin directory with its ilk.
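One way to tell the two cases apart from the maintenance shell is to test both paths directly. The sketch below is just that, with a made-up function name. Note that the "fsck.ext3:" prefix on the error suggests the checker itself was found and running, which would point at the device node, but I have not confirmed that.

```shell
#!/bin/sh
# Hypothetical diagnostic helper.
# Report which of the two candidates is actually missing.
# DEV is the device node, HELPER the filesystem-specific checker.
which_is_missing() {
    dev=$1; helper=$2
    if [ ! -e "$dev" ] && [ ! -x "$helper" ]; then
        echo "both missing"
    elif [ ! -e "$dev" ]; then
        echo "device node missing: $dev"
    elif [ ! -x "$helper" ]; then
        echo "helper missing or not executable: $helper"
    else
        echo "both present"
    fi
}

# Example:
#   which_is_missing /dev/sdb1 /sbin/fsck.ext3
```

Keep in mind that a check run after the boot has stalled only shows the state at that moment; the node may have been absent a few seconds earlier when fsck ran.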

-Steve








