From: David Purton <dcpurton@marshwiggle.net>
To: debian-user@lists.debian.org
Subject: Re: Disk performance deteriorated to unbearable levels
In-Reply-To: <20111108181428.GB13833@hysteria.proulx.com>
X-GPG-Fingerprint: 2D6A A66E F9DC E86A 876F 062D 16D7 EA32 EE08 09EC
X-GPG-Public-Key: http://marshwiggle.net/~dcpurton/pubkey.asc
X-URL: http://marshwiggle.net/~dcpurton/
Hi Bob,
Thanks for your detailed answer!
On Tue, Nov 08, 2011 at 11:14:28AM -0700, Bob Proulx wrote:
> David Purton wrote:
> > Everything takes forever to load (including booting), but then runs ok
> > once loaded.
>
> Could DMA be disabled now? Taking a long time to read initially but
> running okay afterward would match that symptom. Because after the
> initial read it should be in filesystem buffer cache.
hdparm lets me neither get nor set the DMA mode (HDIO_SET_DMA failed:
Inappropriate ioctl for device).
But I think DMA is enabled on the disk. From dmesg:
[ 1.808090] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.809113] ata1.00: unexpected _GTF length (8)
[ 1.809432] ata1.00: ATA-8: Hitachi HTS545025B9A300, PB2OC60N, max UDMA/133
[ 1.809439] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.810564] ata1.00: unexpected _GTF length (8)
[ 1.810883] ata1.00: configured for UDMA/133
[ 1.811163] scsi 0:0:0:0: Direct-Access ATA Hitachi HTS54502 PB2O PQ: 0 ANSI: 5
[ 1.823891] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[ 1.824139] sd 0:0:0:0: [sda] Write Protect is off
[ 1.824149] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.824251] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > My only guess is that it is filesystem related, but I am not sure how to
> > confirm this, nor why things would have got to the present situation.
> >
> > Currently, the root/system partition is 20GB, with 50% used. /home has
> > only 44% used.
>
> That seems like a good amount of free space available for the
> filesystem to deal with disk fragmentation.
>
> > Any suggestions?
Ha! I just found some disk-related errors in syslog:
Nov 2 12:10:58 swires kernel: [33736.415350] sd 0:0:0:0: [sda] Unhandled error code
Nov 2 12:10:58 swires kernel: [33736.415367] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Nov 2 12:10:58 swires kernel: [33736.415376] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 16 48 77 76 00 01 d0 00
Nov 2 12:10:58 swires kernel: [33736.415395] end_request: I/O error, dev sda, sector 373847926
Nov 2 12:10:58 swires kernel: [33736.415404] Buffer I/O error on device sda7, logical block 15133136
Nov 2 12:10:58 swires kernel: [33736.415409] lost page write due to I/O error on sda7
Nov 2 12:10:58 swires kernel: [33736.415415] Buffer I/O error on device sda7, logical block 15133137
Nov 2 12:10:58 swires kernel: [33736.415420] lost page write due to I/O error on sda7
Nov 2 12:10:58 swires kernel: [33736.415427] Buffer I/O error on device sda7, logical block 15133138
I'm guessing this is bad! :( However, I can find only limited details on
Google. Both partitions seem to be affected, so I guess a disk problem
is more likely than a file system problem :(.
*sigh*
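For reference, the CDB bytes in the syslog excerpt can be decoded by hand. This is a minimal sketch, assuming the standard 10-byte SCSI WRITE(10) layout (opcode 0x2a, a big-endian 32-bit LBA in bytes 2-5, a big-endian 16-bit transfer length in bytes 7-8) and the 512-byte logical blocks reported by dmesg. The function name decode_write10 is only illustrative.

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks): the starting logical block address and the
    number of logical blocks the command was asked to write.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: starting LBA
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


lba, blocks = decode_write10("2a 00 16 48 77 76 00 01 d0 00")
size_bytes = blocks * 512                      # 512-byte logical blocks, per dmesg
print(lba, blocks, size_bytes)                 # 373847926 464 237568
```

The decoded LBA matches the sector in the "end_request: I/O error" line, so the timed-out command was a single write of 464 blocks (232 KiB) starting at that sector.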
> Since others suggested possible hard drive problems... What does
> smartctl say about the health of your drive?
>
> smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
> Any selftest failures? Are you running smartctl selftests? If not
> then please do. I always run selftests regularly to get feedback
> about the drives. Let me suggest something similar to this in
> /etc/smartd.conf so as to have these run automatically.
>
> # Monitor all attributes, enable automatic online data collection,
> # automatic Attribute autosave, and start a short self-test Monday to
> # Friday between 3-4am, and a long self test Saturdays between 3-4am.
> # On failure run all installed scripts (to send notification email).
> # Ignore attribute 194 temperature change.
> # Ignore attribute 190 airflow temperature change.
> /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner
>
> This will dump the selftests. Any failures?
>
> smartctl -l selftest /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3686 -
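To see which days and hours the -s pattern in the suggested smartd.conf line selects, it can be tried against a few timestamps. This is a rough sketch, assuming smartd compares the pattern with a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) as described in the smartd.conf man page. Python's re.fullmatch stands in for smartd's own matcher here, and scheduled_tests is an illustrative name.

```python
import re
from datetime import datetime

# The -s argument from the suggested smartd.conf line.
SCHEDULE = r"(S/../../[1-5]/03|L/../../6/03)"


def scheduled_tests(when, schedule=SCHEDULE):
    """Return the test types (S = short, L = long) due in the hour of `when`."""
    due = []
    for test_type in "SL":
        stamp = "%s/%02d/%02d/%d/%02d" % (
            test_type, when.month, when.day, when.isoweekday(), when.hour)
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due


print(scheduled_tests(datetime(2011, 11, 8, 3)))    # Tuesday, 03:xx
print(scheduled_tests(datetime(2011, 11, 12, 3)))   # Saturday, 03:xx
print(scheduled_tests(datetime(2011, 11, 13, 3)))   # Sunday, 03:xx
```

Under these assumptions the pattern schedules a short test in the 03:00 hour on Monday to Friday, a long test in the 03:00 hour on Saturday, and nothing on Sunday.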
>
> If you need to manually run selftests:
>
> smartctl -t short /dev/sda
>
> If short passes pick a time and run:
>
> smartctl -t long /dev/sda
Haven't done this yet.
> You might try using 'hdparm' to produce some data for your disk. Read
> the hdparm documentation first (lots of docs on the web such as this)
>
> http://www.gentoo-wiki.info/Hdparm#Benchmarking_devices
>
> and then you might try this on an otherwise idle system.
As far as I can tell, its defaults are reasonably optimal.
# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: Hitachi HTS545025B9A300
Serial Number: 091204PB42061SDBAXTL
Firmware Revision: PB2OC60N
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
Used: unknown (minor revision code 0x0028)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 488397168
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 238475 MBytes
device size with M = 1000*1000: 250059 MBytes (250 GB)
cache/buffer size = 7208 KBytes (type=DualPortCache)
Form Factor: 2.5 inch
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Vendor, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 128
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* NCQ priority information
Non-Zero buffer offsets in DMA Setup FIS
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
In-order data delivery
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT LBA Segment Access (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
82min for SECURITY ERASE UNIT. 84min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000cca5e8d3409c
NAA : 5
IEEE OUI : 000cca
Unique ID : 5e8d3409c
Checksum: correct
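As a cross-check, the size figures above follow directly from the LBA48 sector count. The sketch below only redoes that arithmetic and, as an extra, works out how far into the disk the failing sector from the syslog excerpt lies. The constant names are illustrative.

```python
SECTORS = 488397168        # LBA48 user addressable sectors (hdparm -I)
SECTOR_BYTES = 512         # logical sector size
FAILED_SECTOR = 373847926  # from the "end_request: I/O error" syslog line

total_bytes = SECTORS * SECTOR_BYTES

print(total_bytes // 1024**2)   # MBytes with M = 1024*1024
print(total_bytes // 1000**2)   # MBytes with M = 1000*1000
print(total_bytes // 1024**3)   # whole GiB, as in the dmesg line

# Position of the failing sector as a fraction of the whole LBA range.
position = FAILED_SECTOR / SECTORS
print(round(position, 3))
```

The first three values agree with the 238475 MBytes, 250059 MBytes and 232 GiB reported by hdparm and dmesg, and the failing sector sits roughly three quarters of the way into the LBA range.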
> # hdparm -tT /dev/sda
> Timing cached reads: 4634 MB in 2.00 seconds = 2320.31 MB/sec
> Timing buffered disk reads: 378 MB in 3.01 seconds = 125.71 MB/sec
/dev/sda:
Timing cached reads: 1462 MB in 2.00 seconds = 731.55 MB/sec
Timing buffered disk reads: 250 MB in 3.02 seconds = 82.71 MB/sec
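To put the two sets of hdparm -tT numbers side by side, the rates can be pulled out of the output and compared. This is a small sketch that parses only the two "Timing ... reads" lines shown here. The regular expression and the parse_hdparm name are illustrative, and since the figures come from different machines and drives, the ratios are a rough sanity check at best.

```python
import re

RATE = re.compile(
    r"Timing (cached|buffered disk) reads:\s+\d+ MB in\s+[\d.]+ seconds"
    r"\s+=\s+([\d.]+) MB/sec")


def parse_hdparm(text):
    """Map each read type ('cached', 'buffered disk') to its MB/sec rate."""
    return {kind: float(rate) for kind, rate in RATE.findall(text)}


bob = parse_hdparm("""
Timing cached reads: 4634 MB in 2.00 seconds = 2320.31 MB/sec
Timing buffered disk reads: 378 MB in 3.01 seconds = 125.71 MB/sec
""")
mine = parse_hdparm("""
Timing cached reads: 1462 MB in 2.00 seconds = 731.55 MB/sec
Timing buffered disk reads: 250 MB in 3.02 seconds = 82.71 MB/sec
""")

ratios = {kind: mine[kind] / bob[kind] for kind in bob}
for kind, ratio in ratios.items():
    print("%s: %.0f%% of Bob's rate" % (kind, ratio * 100))
```

The buffered disk read rate comes out at about two thirds of Bob's figure and the cached rate at about a third.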
> Lastly you could benchmark the filesystem (a layer on top of the disk
> system) using bonnie/bonnie++.
>
> > Both are ext3
>
> Directly on the disk partition (e.g. /dev/sda5)? Or on top of LVM?
> Or on top of RAID (e.g. /dev/md1)? Or LVM on RAID?
Directly on the disk partition.
> > I do not want to reinstall if at all possible.
>
> I am always an advocate of upgrades not re-installs. :-)
I have a bad feeling about this one :(
David
--
David Purton
dcpurton@marshwiggle.net
For the eyes of the LORD range throughout the earth to
strengthen those whose hearts are fully committed to him.
2 Chronicles 16:9a