
Re: Strange SCSI errors.



So we're sure it's not the hardware.
Something odd: almost the same thing happened a few days ago on my workstation. I had two SCSI disks attached to it. One disk suddenly seemed to be very busy, and I couldn't get any information from it. After a reboot I found that the partition table was damaged. My workstation is an x86-based system with an Adaptec 2940 SCSI controller. The failing disk was a 20 GB Seagate with SCSI ID 12 (the other disk has SCSI ID 10). I haven't checked the drive yet, but I suspect it is OK (smartctl also said so).
Could this be the same problem?
Are all disks attached to the same SCSI channel?
Which kernels are those systems running? Which filesystems?
(I'm using 2.4.19 with ext3)
I've only got a ss4-110. Can I simulate the problem with that one?

The U2 and the E3000 have more in common: they're both 64-bit, aren't they?
Maybe the kernel, libc, or some other software is at a similar version on both?

Daniel van Eeden <daniel_e@dds.nl>

Andreas Loong wrote:
[note, this is for the ultra2, which displays the same problem]

try these:
badblocks (read the manpage)

I read the manpage and tried the program; it didn't find anything.
This doesn't seem to be the problem, from what I've experienced.

cat /proc/scsi/<controller>/0

Sparc ESP Host Adapter:
        PROM node               f006347c
        PROM name               SUNW,fas
        ESP Model               Happy Meal FAS
        DMA Revision            Rev HME/FAS


cat /proc/scsi/scsi

nothing unusual here.
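
One way to answer the question of whether all disks sit on the same channel is to group the entries of /proc/scsi/scsi by host and channel. Below is a minimal sketch in Python, assuming the usual "Host: scsiN Channel: NN Id: NN Lun: NN" line layout of that file. The function names and the sample text are made up for illustration; the sample is not output from any machine discussed in this thread.

```python
import re
from collections import defaultdict

# Illustrative sample only; NOT output from the machines in this thread.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: DISK-A           Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: EXAMPLE  Model: DISK-B           Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 06 Lun: 00
  Vendor: EXAMPLE  Model: CDROM-C          Rev: 0001
  Type:   CD-ROM                           ANSI SCSI revision: 02
"""

# Matches the per-device header line (assumed layout, see lead-in).
HOST_LINE = re.compile(
    r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)


def group_by_channel(text):
    """Map (host, channel) -> sorted list of SCSI target IDs found in text."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = HOST_LINE.match(line.strip())
        if m:
            host, channel, scsi_id, _lun = m.groups()
            groups[(host, int(channel))].append(int(scsi_id))
    return {key: sorted(ids) for key, ids in groups.items()}


def read_proc_scsi(path="/proc/scsi/scsi"):
    """Return the raw contents of the kernel's SCSI device listing."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        text = read_proc_scsi()
    except OSError:
        # Fall back to the illustrative sample when the file is unavailable.
        text = SAMPLE
    for (host, channel), ids in sorted(group_by_channel(text).items()):
        print(f"{host} channel {channel}: targets {ids}")
```

If every disk shows up under the same (host, channel) pair, they share one bus, which would fit with a single hung target stalling the others.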

scsi-config <dev> (X frontend for scsiinfo)

well, it finds the disks etc, and I can't find any strange values.


and if everything fails this could be a (dirty) solution:
scsiadd -r <scsi_id>
scsiadd -a <scsi_id>

This is not really what I want to do. I'll try to explain the problem better :

Sometimes, after the system has been up and running for a while with a couple of disks attached, I get "Live target 0 not responding" plastered over the console. The disks become totally non-responsive and the LED is lit constantly. Nothing gets written to the logs at all. This happened with one disk, and it managed to corrupt the partition table on that disk. I assumed the disk was faulty and reinstalled on another disk, hoping not to encounter this kind of problem again. Now I have a different disk with woody installed on it. I got the latest SMP kernel from the stable tree and started to construct a mirror of two different disks. Then I got hit with the same error message again. A bit odd; I had to reboot. I got the mirrors up and running, and today I was about to copy the contents of the root over to the mirror so that I could quietly work on the files that needed editing to reflect the changes. While copying, it hung again.

I do not think this is a hardware issue, as it always affects target 0, no matter what drive is there. The feeling I get after encountering this problem on two different machines is that it is either kernel-related or Debian-related. The Ultra2 and the Enterprise 3000 have a few things in common, although one is high-end and the other is rather low-end.
1) Both are SBUS based.
2) Same SCSI chip? I'll check this.
3) Anything else?

Hope this clears up any misunderstandings.

Wbr
Andreas Loong




--
+---------------------------------------------+
|     Daniel van Eeden <daniel_e@dds.nl>      |
| icq: 36952189                               |
| aim: Compukid128                            |
| jabber: compukid@compukid.no-ip.org         |
| msn: daniel_e@dds.nl                        |
| phone: +31 343 522622                       |
| http://compukid.no-ip.org/about_me.html     |
+---------------------------------------------+


