Bug#462229: marked as done (sata - sata_nv - sata link fails on heavy load)

To: Moritz Muehlenhoff <jmm@inutil.org>
Subject: Bug#462229: marked as done (sata - sata_nv - sata link fails on heavy load)
From: owner@bugs.debian.org (Debian Bug Tracking System)
Date: Wed, 07 Oct 2009 20:27:17 +0000
Message-id: <handler.462229.D462229.12549463014352.ackdone@bugs.debian.org>
References: <20091007201138.GC5445@inutil.org> <20080123105540.198350@gmx.net>

Your message dated Wed, 7 Oct 2009 22:11:38 +0200
with message-id <20091007201138.GC5445@inutil.org>
and subject line Re: linux-image-2.6.26-1-686: Additional info and still occuring
has caused the Debian Bug report #462229,
regarding sata - sata_nv - sata link fails on heavy load
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
462229: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462229
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: submit@bugs.debian.org
Subject: sata - sata_nv - sata link fails on heavy load
From: hoover@gmx.at
Date: Wed, 23 Jan 2008 11:55:40 +0100
Message-id: <20080123105540.198350@gmx.net>

Package: base
Severity: critical
Justification: causes serious data loss



-- System Information:
Debian Release: etch
  APT prefers: stable
  APT policy: (1001, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18 customized
Locale: LANG=de_AT.UTF-8, LC_CTYPE=de_AT.UTF-8 (charmap=UTF-8)

Motherboard: ASUS M2NPV-MX
Chipset: NFORCE-MCP51, chipset revision 161

libata version 2.00
sata_nv 0000:00:0e:0: version 2.0


I encountered two strange problems concerning my SATA-drives.

Chapter I)

One SAMSUNG SP084N PATA drive (/) [hda]
One SAMSUNG SP2004C Rev: VM10 / 05 SATA drive (payload) [sda] sata1
 -> Using LVM2 (2.02.06-4) on non / partitions

On every boot I get this message, but I think it only
indicates that no further drive is attached. If so, it is a
little confusing ...

-----------
ata2: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x977
	Vendor: ATA	Model: SP2004C
	Type: Direct-Access	ANSI SCSI revision: 05
-----------
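
The SStatus values quoted in this report can be decoded by hand. The
sketch below is an illustration only: it assumes the bit layout of the
SATA SStatus register (DET in bits 0-3, SPD in bits 4-7, IPM in bits
8-11) and that the kernel prints the value in hexadecimal. The function
name decode_sstatus and the description strings are hypothetical, not
taken from libata.

```python
# Field meanings assumed from the SATA SStatus register definition.
DET = {
    0x0: "no device detected, PHY communication not established",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(text):
    """Decode an SStatus value given as the hex string printed in the log."""
    value = int(text, 16)
    det = value & 0xF          # device detection, bits 0-3
    spd = (value >> 4) & 0xF   # negotiated speed, bits 4-7
    ipm = (value >> 8) & 0xF   # power management state, bits 8-11
    return {
        "det": DET.get(det, "reserved"),
        "spd": SPD.get(spd, "reserved"),
        "ipm": IPM.get(ipm, "reserved"),
    }


if __name__ == "__main__":
    for raw in ("0", "123"):
        print(raw, decode_sstatus(raw))
```

Under these assumptions, "SStatus 0" decodes to "no device detected",
which is consistent with a port that simply has nothing plugged in.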


sda1 (LVM) is used by samba. When I tried to restore the data
(about 90 GB) over the network (GBit) from a Windows backup client
to the new Debian server, the first 33% was copied without problems.
Then problems occurred:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0x35 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
-----------
ata1.00: soft resetting port
ata1.00: limiting speed to UDMA/66
ata1.00: configured for UDMA/66
ata1.00: sd 0:0:0:0: SCSI error: return code = 0x08000002
------------
ata1.00: end_request: I/O error, dev sda, sector 31464335
ata1.00: printk: 127 messages suppressed
ata1.00: Buffer I/O error on device dm-6, logical block 3932986
ata1.00: lost page write due to I/O error on dm-6
sata1: EH complete
------------
sata1.00 speed down requested but no transfer mode left
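
The "stat 0x51 err 0x4" pair in the log above can be split into named
bits. The sketch below is an illustration only: it assumes the classic
ATA status and error register bit assignments, and the helper name
decode_bits is hypothetical.

```python
# Classic ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Classic ATA error register bits, highest bit first.
ERROR_BITS = [
    (0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
    (0x08, "MCR"), (0x04, "ABRT"), (0x02, "TK0NF"), (0x01, "AMNF"),
]


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("stat 0x51:", decode_bits(0x51, STATUS_BITS))
    print("err  0x04:", decode_bits(0x04, ERROR_BITS))
```

Read this way, the drive reported itself ready (DRDY) with the error
bit set (ERR) and aborted the command (ABRT). Command 0x35 is
WRITE DMA EXT, so the failure happened on a write.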

The transfer rate went down to 0.01 kB/s, and the filesystem was
damaged beyond repair.

I tried this three times (new fs etc.). After the third attempt
I was able to repair the fs, and I left the existing data on the drive
because I thought the drive was damaged in those particular places.
After this I was able to copy all the data and no more problems occurred
that day, but a few days later the same situation came up again.


Because of these problems I moved on to


Chapter II)

One SAMSUNG SP084N PATA drive (/) [hda]
One SAMSUNG SP2004C Rev: VM10 / 05 SATA drive (payload) [sda] sata1
 -> Using LVM2 (2.02.06-4) on non / partitions
One SEAGATE ST3250410AS Rev: 3.AA /05 [sdb] sata2
One SEAGATE ST3250410AS Rev: 3.AA /05 [sdc] sata3
 -> sdb1 and sdc2 in a RAID1 array (without LVM)


On every boot I get this message, but I think it only
indicates that no further drive is attached. If so, it is a
little confusing ... compare with Chapter I)

-----------
ata4: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x967
	Vendor: ATA	Model: ST3250410AS Rev: 3.AA
	Type: Direct-Access	ANSI SCSI revision: 05
-----------

So I copied all the previously backed-up data from hda and from a
Windows backup client to the newly created RAID1 array (90 GB).

Everything went fine, at least with sdb and sdc.
But I got these messages while I was copying the data from
hda and the network to /dev/md0 (sdb and sdc).

-----------
ata1: port is slow to respond, please be patient
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
ata1: port is slow to respond, please be patient
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
-----------

ATTENTION!!! sda was not involved in this copy; it was only
mounted (there were no open files). Therefore there was also
no data loss.

So it seems that sata_nv (or maybe the mainboard) has a problem
with one (?) of the SATA ports under heavy load. Again: all of these
situations occurred under heavy load (copying from multiple
sources to 'one' destination).
Otherwise I cannot explain why I get these errors
on sata1/sda without doing anything on it.



Regards,
Anton Huber


--- End Message ---

--- Begin Message ---

To: Ben Whyte <ben@whyte-systems.co.uk>
Cc: Debian Bug Tracking System <462229-done@bugs.debian.org>
Subject: Re: linux-image-2.6.26-1-686: Additional info and still occuring
From: Moritz Muehlenhoff <jmm@inutil.org>
Date: Wed, 7 Oct 2009 22:11:38 +0200
Message-id: <20091007201138.GC5445@inutil.org>
In-reply-to: <20090908204403.GA1466@galadriel.inutil.org>
References: <20090207212512.3121.62512.reportbug@thor.whyte-systems.co.uk> <20090908204403.GA1466@galadriel.inutil.org>

On Tue, Sep 08, 2009 at 10:44:03PM +0200, Moritz Muehlenhoff wrote:
> On Sat, Feb 07, 2009 at 09:25:12PM +0000, Ben Whyte wrote:
> > Package: linux-image-2.6.26-1-686
> > Version: 2.6.26-13
> > Followup-For: Bug #462229
> > 
> > 
> > While doing disk writes I can achieve the following
> > 
> > It is consistent across all disks, all ports, all cables.  I have been seeing this issue since June/July and it has cost me significant data
> > loss and forced 3 reinstalls as the OS has been terminally damaged.
> > 
> > I have tried turning the write cache off, as it has been mentioned as a potential fix; this has not worked.
> > 
> > Currently affecting 2 brand new WD 1 TB green drives.
> 
> Did you try a more recent kernel than the standard Lenny kernel, e.g. a
> 2.6.30 kernel from backports.org?

No further feedback, closing the bug. If anyone encounters the problem
again with a more recent kernel, please reopen.

Cheers,
       Moritz
--- End Message ---
