
aic7xxx corrupts superblock?



-----------------------------------------------------------------
1.	Summary
----------------------------------------------------------------- 

My 2.0.0 kernel, using the aic7xxx driver, corrupts the superblock of my
root filesystem (the next boot panics, thinking it is an MS-DOS fs).  My
1.2.13 kernel, using aha274x, doesn't corrupt it.  I'd like to know
whether this corresponds to a problem fixed in 2.0.n (for n>0), or
what I should do now.

-----------------------------------------------------------------
Contents of this post:

1.	Summary (above)
2.	Hardware/software setup
3.	Fuller description of the problem
4.	Extracts from /var/adm/messages
-----------------------------------------------------------------

-----------------------------------------------------------------
2.	Hardware/software setup
-----------------------------------------------------------------

486DX-2/66, ASUS VLB
Adaptec 2842 scsi controller
Conner CFP1080S scsi disk
Toshiba scsi CDROM model: XM-3501TA
HP scsi tape: Model: HP35470A
Teac 1.44 floppy
Two kernels:
1.2.13, built using aha274x (aha274x.h v1.11, aha274x.c v1.29)
	custom-built from debian 0.93R6
2.0.0, built using aic7xxx (aic7xxx.h v3.1, aic7xxx.c v3.2)
	custom-built from debian 1.1

The kernel is configured for scsi, scsi disk, scsi tape, and scsi
cdrom support, as well as the appropriate driver.  ISO9660 fs support
is loaded as a module.

-----------------------------------------------------------------
3.	Fuller description of the problem
-----------------------------------------------------------------

I upgraded my system from debian 0.93R6 to debian 1.1 (except for the
distribution kernel, which wouldn't boot).  I then built a custom
kernel for my machine, including support for my controller (aic7xxx).
I ran this several times over about four days, and then began to make
installation disks (for another machine), using "dd" to write files
from my cdrom to the floppies.  While "dd" was writing the fourth
disk, I had a kernel panic, which I wrote down (figuring it wouldn't
necessarily get written to /var/adm/messages):

	aic7xxx (aic7xxx_isr) BRKADRINT error (0xff):
	Illegal Host Access
	Illegal Sequencer Address referenced
	Illegal Opcode in sequencer program
	Sequencer RAM parity error
	Kernel panic aic7xxx: (aic7xxx_isr)
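
All four causes are printed together, which would be consistent with the
driver testing each flag bit of the error byte in turn: 0xff has every bit
set, so every message appears.  That may mean the register simply read back
as all ones rather than that four separate faults occurred.  A minimal
sketch of that kind of decoding follows; the bit positions are an
assumption made for illustration and are not taken from the driver source.

```python
# Hypothetical bit layout for the error byte (illustration only; the real
# assignments are defined in the driver source and may differ).
ERROR_BITS = [
    (0x01, "Illegal Host Access"),
    (0x02, "Illegal Sequencer Address referenced"),
    (0x04, "Illegal Opcode in sequencer program"),
    (0x08, "Sequencer RAM parity error"),
]


def decode_error(value):
    """Return the message for every flag bit that is set in value."""
    return [message for bit, message in ERROR_BITS if value & bit]


# With every bit set, all four messages are reported at once.
for line in decode_error(0xFF):
    print(line)
```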

Sure enough, it wasn't written to /var/adm/messages, although there is
a message (also echoed to the terminal) each time the cdrom is
mounted (this didn't happen under the old kernel):

	ISO9660 Extensions: RRIP_1991A

At this point, I had to do a hard reboot and let fsck fix any damage.
I poked around for clues, and then wrote the remaining installation
disks.  I then got another message:

	ISO9660 Extensions: RRIP_1991A
	scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: 0x08 01 37
		3a 08 0
	Current error sr0b:00: sns = f0  3
	ASC=15 ASCQ= 0
	Raw sense data:0xf0 0x00 0x03 0x17 0x46 0x01 0x00 0x0a 0x00
		0x00 0x00 0x00 0x15 0x00 0x00 0x00 
	CD-ROM I/O error: dev 0b:00, sector 318696
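
The numbers in that message can be cross-checked by hand.  The sketch
below decodes the six CDB bytes as a READ(6) command and the raw sense
bytes as fixed-format sense data, both as laid out in the SCSI-2 standard.
The function names are made up for this example.  The factor of 4 assumes
the kernel reports the sector in 512-byte units while the drive uses
2048-byte blocks; that assumption is consistent with the sector number
printed above.

```python
def decode_read6(cdb):
    """Decode a 6-byte READ(6) CDB into (opcode, lba, blocks)."""
    opcode = cdb[0]
    # The LBA is 21 bits: low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    blocks = cdb[4] or 256  # a transfer length of 0 means 256 blocks
    return opcode, lba, blocks


def decode_fixed_sense(sense):
    """Return (sense_key, asc, ascq) from fixed-format sense data."""
    return sense[2] & 0x0F, sense[12], sense[13]


# Bytes copied from the log above.
cdb = bytes([0x08, 0x01, 0x37, 0x3A, 0x08, 0x00])
sense = bytes([0xF0, 0x00, 0x03, 0x17, 0x46, 0x01, 0x00, 0x0A,
               0x00, 0x00, 0x00, 0x00, 0x15, 0x00, 0x00, 0x00])

opcode, lba, blocks = decode_read6(cdb)
key, asc, ascq = decode_fixed_sense(sense)
print("opcode 0x%02x, lba %d, %d blocks" % (opcode, lba, blocks))
print("512-byte sector:", lba * 4)
print("sense key %d, ASC 0x%02x, ASCQ 0x%02x" % (key, asc, ascq))
```

Sense key 3 is MEDIUM ERROR, matching the first line of the message, and
the ASC/ASCQ bytes agree with the "ASC=15 ASCQ= 0" line.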

At this point, I rebooted with "shutdown ..." and found that my root
filesystem could not be mounted.  I used my emergency disks to run
fsck, and found that the superblock was corrupted.  I used an
alternate superblock, and the problem (bad free block count) was
fixed.
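
For anyone wanting to look at the superblock directly rather than through
fsck, here is a rough sketch.  It relies on the ext2 on-disk layout: the
primary superblock starts 1024 bytes into the partition, the total and
free block counts are little-endian 32-bit values at offsets 4 and 12, and
the magic number 0xEF53 is a 16-bit value at offset 56.  The function name
and the sanity check are my own, and the demo runs on a synthetic image
rather than a real disk.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes in
EXT2_MAGIC = 0xEF53        # s_magic, little-endian, at offset 56


def summarize_superblock(data):
    """Pull a few fields out of the first 2048 bytes of an ext2 partition."""
    sb = data[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("need at least the first 1082 bytes of the partition")
    blocks_count, = struct.unpack_from("<I", sb, 4)
    free_blocks, = struct.unpack_from("<I", sb, 12)
    magic, = struct.unpack_from("<H", sb, 56)
    return {
        "magic_ok": magic == EXT2_MAGIC,
        "blocks_count": blocks_count,
        "free_blocks": free_blocks,
        # A free count larger than the total is one sign of damage.
        "free_count_sane": free_blocks <= blocks_count,
    }


# Synthetic image so the sketch runs without touching a real disk.
image = bytearray(2048)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 4, 100000)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 12, 40000)
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)
print(summarize_superblock(bytes(image)))
```

To try it on a real partition, read the first 2048 bytes of the device
(opened read-only, as root) and pass them in place of the synthetic image.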

After that, I found I could boot repeatedly with my old (1.2.13,
aha274x) kernel, with no apparent ill effects.  However, if I booted
with my new (2.0.0, aic7xxx) kernel, I would boot successfully, but
the *next* boot would stop at the partition check, finding the root
filesystem unmountable and printing a message suggesting that it is
an MS-DOS fs (I have no MS-DOS fs!).  The message begins:

	[MS-DOS FS Rel.12, ...

I tried passing "aic7xxx=extended,no_reset" at boot time.  Although
this skipped the scsi bus reset, it didn't seem to solve the problem.

-----------------------------------------------------------------
4.	Extracts from /var/adm/messages
-----------------------------------------------------------------

Here's what the aic7xxx kernel boot sequence looks like.  The "cannot
find map" message seems to be because the custom kernel System.map is
not in /boot (I just noticed that it *is* in the
/usr/src/kernel... directory).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Aug  3 13:54:06 riel syslogd 1.3-0#6: restart.
Aug  3 13:54:07 riel kernel: klogd 1.3-0, log source = /proc/kmsg started.
Aug  3 13:54:07 riel syslogd 1.3-0#6: restart.
Aug  3 13:54:07 riel kernel: Cannot find map file.
Aug  3 13:54:07 riel kernel: Console: 16 point font, 400 scans
Aug  3 13:54:07 riel kernel: Console: colour VGA+ 80x25, 1 virtual console (max 63)
Aug  3 13:54:07 riel kernel: Calibrating delay loop.. ok - 33.18 BogoMIPS
Aug  3 13:54:07 riel kernel: Memory: 14876k/16384k available (656k kernel code, 384k reserved, 468k data)
Aug  3 13:54:07 riel kernel: This processor honours the WP bit even when in supervisor mode. Good.
Aug  3 13:54:07 riel kernel: Swansea University Computer Society NET3.035 for Linux 2.0
Aug  3 13:54:07 riel kernel: NET3: Unix domain sockets 0.12 for Linux NET3.035.
Aug  3 13:54:07 riel kernel: Swansea University Computer Society TCP/IP for NET3.034
Aug  3 13:54:07 riel kernel: IP Protocols: ICMP, UDP, TCP
Aug  3 13:54:07 riel kernel: Checking 386/387 coupling... Ok, fpu using exception 16 error reporting.
Aug  3 13:54:07 riel kernel: Checking 'hlt' instruction... Ok.
Aug  3 13:54:07 riel kernel: Linux version 2.0.0 (root@riel) (gcc version 2.7.2) #1 Tue Jul 30 19:35:51 PDT 1996
Aug  3 13:54:07 riel kernel: Serial driver version 4.13 with no serial options enabled
Aug  3 13:54:07 riel kernel: tty00 at 0x03f8 (irq = 4) is a 16550A
Aug  3 13:54:07 riel kernel: tty01 at 0x02f8 (irq = 3) is a 16550A
Aug  3 13:54:07 riel kernel: Ramdisk driver initialized : 16 ramdisks of 4096K size
Aug  3 13:54:07 riel kernel: Floppy drive(s): fd0 is 1.44M
Aug  3 13:54:07 riel kernel: Started kswapd v 1.4.2.2 
Aug  3 13:54:07 riel kernel: FDC 0 is a post-1991 82077
Aug  3 13:54:07 riel kernel: aic7xxx: Reading SEEPROM...done.
Aug  3 13:54:07 riel kernel: aic7xxx: Extended translation disabled.
Aug  3 13:54:07 riel kernel: aic7xxx: AHA-2840 Rev E and subsequent.
Aug  3 13:54:07 riel kernel: aic7xxx: Using 4 SCB's after checking for SCB memory.
Aug  3 13:54:07 riel kernel: aic7xxx: Using level sensitive interrupts.
Aug  3 13:54:07 riel kernel: AHA-2840 AT VLB SLOT 1:
Aug  3 13:54:07 riel kernel:     irq 11
Aug  3 13:54:07 riel kernel:     bus release time 40 bclks
Aug  3 13:54:07 riel kernel:     data fifo threshold 100%
Aug  3 13:54:07 riel kernel:     SCSI CHANNEL A:
Aug  3 13:54:07 riel kernel:         scsi id 7
Aug  3 13:54:07 riel kernel:         scsi selection timeout 256 ms
Aug  3 13:54:07 riel kernel:         scsi bus reset at power-on enabled
Aug  3 13:54:07 riel kernel:         scsi bus parity enabled
Aug  3 13:54:07 riel kernel:         scsi bus termination (low byte) enabled
Aug  3 13:54:07 riel kernel: aic7xxx: Downloading sequencer code...done.
Aug  3 13:54:07 riel kernel: aic7xxx: Resetting the SCSI bus...done.
Aug  3 13:54:07 riel kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 3.2/3.1/3.0
Aug  3 13:54:07 riel kernel: scsi : 1 host.
Aug  3 13:54:07 riel kernel: aic7xxx: Scanning channel A for devices.
Aug  3 13:54:07 riel kernel: aic7xxx: Target 0, channel A, now synchronous at 4.0MHz, offset(0xf).
Aug  3 13:54:07 riel kernel:   Vendor: TOSHIBA   Model: CD-ROM XM-3501TA  Rev: 3384
Aug  3 13:54:07 riel kernel:   Type:   CD-ROM                             ANSI SCSI revision: 02
Aug  3 13:54:07 riel kernel: Detected scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
Aug  3 13:54:07 riel kernel: aic7xxx: Target 3, channel A, now synchronous at 5.0MHz, offset(0xf).
Aug  3 13:54:07 riel kernel:   Vendor: HP        Model: HP35470A          Rev: T503
Aug  3 13:54:07 riel kernel:   Type:   Sequential-Access                  ANSI SCSI revision: 02
Aug  3 13:54:07 riel kernel: Detected scsi tape st0 at scsi0, channel 0, id 3, lun 0
Aug  3 13:54:07 riel kernel: aic7xxx: Target 6, channel A, now synchronous at 10.0MHz, offset(0xf).
Aug  3 13:54:07 riel kernel:   Vendor: CONNER    Model: CFP1080S          Rev: 3939
Aug  3 13:54:07 riel kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Aug  3 13:54:07 riel kernel: Detected scsi disk sda at scsi0, channel 0, id 6, lun 0
Aug  3 13:54:07 riel kernel: scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
Aug  3 13:54:07 riel kernel: SCSI device sda: hdwr sector= 512 bytes. Sectors= 2110812 [1030 MB] [1.0 GB]
Aug  3 13:54:07 riel kernel: Partition check:
Aug  3 13:54:07 riel kernel:  sda: sda1 sda2 sda3 < sda5 sda6 sda7 > sda4
Aug  3 13:54:07 riel kernel: VFS: Mounted root (ext2 filesystem) readonly.
Aug  3 13:54:07 riel kernel: Adding Swap: 32764k swap-space
Aug  3 13:54:07 riel kernel: CSLIP: code copyright 1989 Regents of the University of California
Aug  3 13:54:07 riel kernel: PPP: version 2.2.0 (dynamic channel allocation)
Aug  3 13:54:07 riel kernel: PPP Dynamic channel allocation code copyright 1995 Caldera, Inc.
Aug  3 13:54:07 riel kernel: PPP line discipline registered.
Aug  3 13:54:07 riel kernel: lp1 at 0x0378, (polling)
Aug  3 14:03:04 riel syslogd: exiting on signal 15
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here's what the boot for the aha274x looks like.  There's also a
message about "cannot find map", but I can't fix this one as easily
because I attempted to rebuild this kernel with my new elf-ized
setup.  The build failed (maybe 1.2.13 can't be elf-ish), and by then
I had already cleaned up the area (make mrproper).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Aug  3 14:56:05 riel syslogd 1.3-0#6: restart.
Aug  3 14:56:05 riel kernel: klogd 1.3-0, log source = /proc/kmsg started.
Aug  3 14:56:06 riel syslogd 1.3-0#6: restart.
Aug  3 14:56:06 riel kernel: Cannot find map file.
Aug  3 14:56:06 riel kernel: Console: colour EGA+ 80x25, 1 virtual console (max 63)
Aug  3 14:56:06 riel kernel: Calibrating delay loop.. ok - 33.55 BogoMips
Aug  3 14:56:06 riel kernel: Serial driver version 4.11 with no serial options enabled
Aug  3 14:56:06 riel kernel: tty00 at 0x03f8 (irq = 4) is a 16550A
Aug  3 14:56:06 riel kernel: tty01 at 0x02f8 (irq = 3) is a 16550A
Aug  3 14:56:06 riel kernel: Floppy drive(s): fd0 is 1.44M
Aug  3 14:56:06 riel kernel: FDC 0 is a post-1991 82077
Aug  3 14:56:06 riel kernel: aha274x: extended translation disabled
Aug  3 14:56:06 riel kernel: AHA284X AT SLOT 1:
Aug  3 14:56:06 riel kernel:     irq 11
Aug  3 14:56:06 riel kernel:     bus release time 40 bclks
Aug  3 14:56:06 riel kernel:     data fifo threshold 100%
Aug  3 14:56:06 riel kernel:     SCSI CHANNEL A:
Aug  3 14:56:06 riel kernel:         scsi id 7
Aug  3 14:56:06 riel kernel:         scsi bus parity check enabled
Aug  3 14:56:06 riel kernel:         scsi selection timeout 256 ms
Aug  3 14:56:06 riel kernel:         scsi bus reset at power-on enabled
Aug  3 14:56:06 riel kernel: scsi0 : Adaptec AHA274x/284x (EISA/VL-bus -> Fast SCSI) 1.28/1.11/1.29
Aug  3 14:56:06 riel kernel: scsi : 1 host.
Aug  3 14:56:06 riel kernel: aha274x: target 0 now synchronous at 4.0Mb/s
Aug  3 14:56:06 riel kernel:   Vendor: TOSHIBA   Model: CD-ROM XM-3501TA  Rev: 3384
Aug  3 14:56:06 riel kernel:   Type:   CD-ROM                             ANSI SCSI revision: 02
Aug  3 14:56:06 riel kernel: Detected scsi CD-ROM sr0 at scsi0, id 0, lun 0
Aug  3 14:56:06 riel kernel: aha274x: target 3 now synchronous at 5.0Mb/s
Aug  3 14:56:06 riel kernel:   Vendor: HP        Model: HP35470A          Rev: T503
Aug  3 14:56:06 riel kernel:   Type:   Sequential-Access                  ANSI SCSI revision: 02
Aug  3 14:56:06 riel kernel: Detected scsi tape st0 at scsi0, id 3, lun 0
Aug  3 14:56:06 riel kernel: aha274x: target 6 now synchronous at 10.0Mb/s
Aug  3 14:56:06 riel kernel:   Vendor: CONNER    Model: CFP1080S          Rev: 3939
Aug  3 14:56:06 riel kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Aug  3 14:56:06 riel kernel: Detected scsi disk sda at scsi0, id 6, lun 0
Aug  3 14:56:06 riel kernel: scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
Aug  3 14:56:06 riel kernel: SCSI Hardware sector size is 512 bytes on device sda
Aug  3 14:56:06 riel kernel: Memory: 15072k/16384k available (604k kernel code, 384k reserved, 324k data)
Aug  3 14:56:06 riel kernel: This processor honours the WP bit even when in supervisor mode. Good.
Aug  3 14:56:06 riel kernel: Swansea University Computer Society NET3.019
Aug  3 14:56:06 riel kernel: Swansea University Computer Society TCP/IP for NET3.019
Aug  3 14:56:06 riel kernel: IP Protocols: ICMP, UDP, TCP
Aug  3 14:56:06 riel kernel: Checking 386/387 coupling... Ok, fpu using exception 16 error reporting.
Aug  3 14:56:06 riel kernel: Checking 'hlt' instruction... Ok.
Aug  3 14:56:06 riel kernel: Linux version 1.2.13 (root@riel) (gcc version 2.6.3) #2 Thu Jul 11 20:26:36 PDT 1996
Aug  3 14:56:06 riel kernel: Partition check:
Aug  3 14:56:06 riel kernel:   sda: sda1 sda2 sda3 < sda5 sda6 sda7 > sda4
Aug  3 14:56:06 riel kernel: VFS: Mounted root (ext2 filesystem) readonly.
Aug  3 14:56:06 riel kernel: Adding Swap: 32764k swap-space
Aug  3 14:56:06 riel kernel: CSLIP: code copyright 1989 Regents of the University of California
Aug  3 14:56:06 riel kernel: PPP: version 2.2.0 (dynamic channel allocation)
Aug  3 14:56:06 riel kernel: PPP Dynamic channel allocation code copyright 1995 Caldera, Inc.
Aug  3 14:56:06 riel kernel: PPP line discipline registered.
Aug  3 14:56:06 riel kernel: lp1 at 0x0378, using polling driver
Aug  3 15:00:15 riel syslogd: exiting on signal 15
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Any ideas on how to fix this?  Thanks in advance.


--------------------------------------------------------------------
Danny Heap, UCSF, 3333 California St., Room 102, SF CA, 94122
danny@maxwell.ucsf.edu, voice:	(415) 476-8910, fax: (415) 476-1508
--------------------------------------------------------------------


