
Re: Disk performance deteriorated to unbearable levels



From: David Purton <dcpurton@marshwiggle.net>
To: debian-user@lists.debian.org
Subject: Re: Disk performance deteriorated to unbearable levels
In-Reply-To: <20111108181428.GB13833@hysteria.proulx.com>
X-GPG-Fingerprint: 2D6A A66E F9DC E86A 876F  062D 16D7 EA32 EE08 09EC
X-GPG-Public-Key: http://marshwiggle.net/~dcpurton/pubkey.asc
X-URL: http://marshwiggle.net/~dcpurton/

Hi Bob,

Thanks for your detailed answer!

On Tue, Nov 08, 2011 at 11:14:28AM -0700, Bob Proulx wrote:
> David Purton wrote:
> > Everything takes forever to load (including booting), but then runs ok
> > once loaded.
> 
> Could DMA be disabled now?  Taking a long time to read initially but
> running okay afterward would match that symptom.  Because after the
> initial read it should be in filesystem buffer cache.

hdparm lets me neither get nor set the DMA mode (HDIO_SET_DMA failed:
Inappropriate ioctl for device).

But I think DMA is enabled on the disk. From dmesg:

[    1.808090] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    1.809113] ata1.00: unexpected _GTF length (8)
[    1.809432] ata1.00: ATA-8: Hitachi HTS545025B9A300, PB2OC60N, max UDMA/133
[    1.809439] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    1.810564] ata1.00: unexpected _GTF length (8)
[    1.810883] ata1.00: configured for UDMA/133
[    1.811163] scsi 0:0:0:0: Direct-Access     ATA      Hitachi HTS54502 PB2O PQ: 0 ANSI: 5
[    1.823891] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[    1.824139] sd 0:0:0:0: [sda] Write Protect is off
[    1.824149] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.824251] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
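For reference, the negotiated mode can be pulled out of a dmesg dump with a few lines of Python. This is a minimal sketch: the helper names `configured_modes` and `udma_number` are hypothetical, and it assumes the libata message format shown above, where `UDMA/133` is libata's name for UDMA mode 6.

```python
import re
from typing import Dict, Optional

# libata's names for UDMA modes 0-6 (nominal MB/s); list index == UDMA mode number.
UDMA_NAMES = ["UDMA/16", "UDMA/25", "UDMA/33", "UDMA/44",
              "UDMA/66", "UDMA/100", "UDMA/133"]

# Matches e.g. "ata1.00: configured for UDMA/133"
CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")


def configured_modes(dmesg: str) -> Dict[str, str]:
    """Map each ATA device (e.g. 'ata1.00') to the transfer mode libata chose.

    If a device is configured more than once (e.g. after a link reset),
    the later message overwrites the earlier one.
    """
    return dict(CONFIGURED.findall(dmesg))


def udma_number(mode: str) -> Optional[int]:
    """Return the UDMA mode number for a libata mode string, or None otherwise."""
    return UDMA_NAMES.index(mode) if mode in UDMA_NAMES else None


if __name__ == "__main__":
    sample = (
        "[    1.809432] ata1.00: ATA-8: Hitachi HTS545025B9A300, PB2OC60N, max UDMA/133\n"
        "[    1.810883] ata1.00: configured for UDMA/133\n"
    )
    for dev, mode in configured_modes(sample).items():
        print(dev, mode, "-> UDMA mode", udma_number(mode))
    # prints: ata1.00 UDMA/133 -> UDMA mode 6
```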

> > My only guess is that it is filesystem related, but I am not sure how to
> > confirm this, nor why things would have got to the present situation.
> > 
> > Currently, the root/system partition is 20GB, with 50% used. /home has
> > only 44% used.
> 
> That seems like a good amount of free space available for the
> filesystem to deal with disk fragmentation.
> 
> > Any suggestions?

Ha! I just found some disk related errors in syslog:

Nov  2 12:10:58 swires kernel: [33736.415350] sd 0:0:0:0: [sda] Unhandled error code
Nov  2 12:10:58 swires kernel: [33736.415367] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Nov  2 12:10:58 swires kernel: [33736.415376] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 16 48 77 76 00 01 d0 00
Nov  2 12:10:58 swires kernel: [33736.415395] end_request: I/O error, dev sda, sector 373847926
Nov  2 12:10:58 swires kernel: [33736.415404] Buffer I/O error on device sda7, logical block 15133136
Nov  2 12:10:58 swires kernel: [33736.415409] lost page write due to I/O error on sda7
Nov  2 12:10:58 swires kernel: [33736.415415] Buffer I/O error on device sda7, logical block 15133137
Nov  2 12:10:58 swires kernel: [33736.415420] lost page write due to I/O error on sda7
Nov  2 12:10:58 swires kernel: [33736.415427] Buffer I/O error on device sda7, logical block 15133138


I'm guessing this is bad! :( However, I can find only limited details on
Google. Both partitions seem to be affected, so I guess a disk problem
is more likely than a filesystem one :(.

*sigh*
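The failing request can be decoded by hand. In a SCSI WRITE(10) CDB (opcode 0x2a), bytes 2-5 hold the starting LBA and bytes 7-8 the transfer length, both big-endian, so the CDB in the log is a 464-sector write starting at sector 373847926, the same sector that end_request reports. The sketch below also back-computes where sda7 would start, under the assumption (not stated in the post) that its ext3 block size is 4 KiB; that inferred value could be compared against /sys/block/sda/sda7/start. The function names are hypothetical.

```python
from typing import Tuple

READ_10, WRITE_10 = 0x28, 0x2A  # SCSI opcodes for READ(10) / WRITE(10)


def decode_rw10(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, transfer length in blocks) from a READ(10)/WRITE(10) CDB.

    Byte 0 is the opcode, bytes 2-5 the big-endian LBA, bytes 7-8 the
    big-endian transfer length (512-byte blocks on this disk).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (READ_10, WRITE_10):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")


def partition_start(abs_sector: int, fs_block: int,
                    block_size: int = 4096, sector_size: int = 512) -> int:
    """Infer a partition's first sector from an absolute sector number and the
    partition-relative block number logged for the same I/O."""
    return abs_sector - fs_block * (block_size // sector_size)


if __name__ == "__main__":
    lba, length = decode_rw10("2a 00 16 48 77 76 00 01 d0 00")
    print(lba, length)  # 373847926 464
    # Assumes a 4 KiB ext3 block size on sda7 (not stated in the post).
    print(partition_start(lba, 15133136))  # 252782838
```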

> Since others suggested possible hard drive problems...  What does
> smartctl say about the health of your drive?
> 
>   smartctl -H /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

> Any selftest failures?  Are you running smartctl selftests?  If not
> then please do.  I always run selftests regularly to get feedback
> about the drives.  Let me suggest something similar to this in
> /etc/smartd.conf so as to have these run automatically.
> 
>   # Monitor all attributes, enable automatic online data collection,
>   # automatic Attribute autosave, and start a short self-test Monday to Friday
>   # between 3-4am, and a long self-test Saturdays between 3-4am.
>   # On failure run all installed scripts (to send notification email).
>   # Ignore attribute 194 temperature change.
>   # Ignore attribute 190 airflow temperature change.
>   /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner
> 
> This will dump the selftests.  Any failures?
> 
>   smartctl -l selftest /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining      LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3686             -
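To see when the smartd schedule quoted above actually fires, the matching can be replayed in Python. smartd compares the -s regex against a string of the form T/MM/DD/d/HH (test type, month, day of month, weekday with 1 = Monday, hour). This sketch uses re.fullmatch; for this fixed-width pattern a partial match would give the same answer. The helper name `due_tests` is hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import List

# The -s regex from the smartd.conf line quoted above.
SCHEDULE = re.compile(r"(S/../../[1-5]/03|L/../../6/03)")


def due_tests(when: datetime) -> List[str]:
    """Return the self-test types ('S' = short, 'L' = long) scheduled at `when`.

    smartd builds a string T/MM/DD/d/HH: test type, month, day of month,
    weekday (1 = Monday ... 7 = Sunday), hour.
    """
    due = []
    for test_type in ("S", "L"):
        stamp = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
        if SCHEDULE.fullmatch(stamp):
            due.append(test_type)
    return due


if __name__ == "__main__":
    # Step through one week hour by hour, starting Monday 7 Nov 2011.
    start = datetime(2011, 11, 7)
    for hour in range(7 * 24):
        t = start + timedelta(hours=hour)
        for test_type in due_tests(t):
            print(t.strftime("%a %d %b %H:00"), test_type)
```

With this regex the short test lands in the 03:00 hour Monday to Friday and the long test at 03:00 on Saturday; nothing runs on Sunday.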

> 
> If you need to manually run selftests:
> 
>   smartctl -t short /dev/sda
> 
> If short passes pick a time and run:
> 
>   smartctl -t long /dev/sda

Haven't done this yet.

> You might try using 'hdparm' to produce some data for your disk.  Read
> the hdparm documentation first (lots of docs on the web such as this)
> 
>   http://www.gentoo-wiki.info/Hdparm#Benchmarking_devices
> 
> and then you might try this on an otherwise idle system.

As far as I can tell, its defaults are reasonably optimal.

# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       Hitachi HTS545025B9A300                 
        Serial Number:      091204PB42061SDBAXTL
        Firmware Revision:  PB2OC60N
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
        Used: unknown (minor revision code 0x0028) 
        Supported: 8 7 6 5 
        Likely used: 8
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  488397168
        Logical/Physical Sector size:           512 bytes
        device size with M = 1024*1024:      238475 MBytes
        device size with M = 1000*1000:      250059 MBytes (250 GB)
        cache/buffer size  = 7208 KBytes (type=DualPortCache)
        Form Factor: 2.5 inch
        Nominal Media Rotation Rate: 5400
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Vendor, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Advanced power management level: 128
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
           *    Advanced Power Management feature set
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
           *    IDLE_IMMEDIATE with UNLOAD
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
           *    NCQ priority information
                Non-Zero buffer offsets in DMA Setup FIS
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
                In-order data delivery
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
Security: 
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
                supported: enhanced erase
        82min for SECURITY ERASE UNIT. 84min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000cca5e8d3409c
        NAA             : 5
        IEEE OUI        : 000cca
        Unique ID       : 5e8d3409c
Checksum: correct
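In `hdparm -I` output the transfer mode currently in use is the one marked with `*`; above that is udma6, i.e. UDMA mode 6, which matches the "configured for UDMA/133" line in dmesg. A small hypothetical helper that picks the starred entries out of such a line:

```python
from typing import List


def selected_modes(hdparm_line: str) -> List[str]:
    """Return the transfer modes marked as active ('*') on an hdparm -I mode line."""
    _, _, modes = hdparm_line.partition(":")
    return [m.lstrip("*") for m in modes.split() if m.startswith("*")]


if __name__ == "__main__":
    dma = "DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6"
    pio = "PIO: pio0 pio1 pio2 pio3 pio4"
    print(selected_modes(dma))  # ['udma6']
    print(selected_modes(pio))  # [] (no PIO mode is selected)
```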


>   # hdparm -tT /dev/sda
>   Timing cached reads:   4634 MB in  2.00 seconds = 2320.31 MB/sec
>   Timing buffered disk reads: 378 MB in  3.01 seconds = 125.71 MB/sec

/dev/sda:
 Timing cached reads:   1462 MB in  2.00 seconds = 731.55 MB/sec
 Timing buffered disk reads: 250 MB in  3.02 seconds =  82.71 MB/sec
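If hdparm is not at hand, a crude sequential-read timing in the spirit of `hdparm -t` can be sketched in Python. This is an approximation, not what hdparm does internally: it reads 2 MiB chunks from the start of the file or device for a few seconds and makes no attempt to bypass or flush the page cache, so for a disk figure it would need to run as root against the device on an otherwise idle system, right after dropping caches (e.g. `echo 3 > /proc/sys/vm/drop_caches`). `sequential_read_rate` is a hypothetical name.

```python
import os
import time


def sequential_read_rate(path: str, seconds: float = 3.0,
                         chunk: int = 2 * 1024 * 1024) -> float:
    """Read `path` from the start for about `seconds`; return MiB per second.

    No attempt is made to bypass the page cache, so data that is already
    cached will be reported at memory speed rather than disk speed.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        start = time.monotonic()
        while time.monotonic() - start < seconds:
            data = f.read(chunk)
            if not data:  # end of file or device
                break
            total += len(data)
        elapsed = time.monotonic() - start
    return total / (1024 * 1024) / elapsed if elapsed > 0 else 0.0
```

For example, `sequential_read_rate("/dev/sda")` as root gives a number roughly comparable to the "Timing buffered disk reads" line, provided the cache was cold.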



> Lastly you could benchmark the filesystem (a layer on top of the disk
> system) using bonnie/bonnie++.
> 
> > Both are ext3
> 
> Directly on the disk partition (e.g. /dev/sda5)?  Or on top of LVM?
> Or on top of RAID (e.g. /dev/md1)?  Or LVM on RAID?

Directly on the disk partition.

> > I do not want to reinstall if at all possible.
> 
> I am always an advocate of upgrades not re-installs.  :-)

I have a bad feeling about this one :(

David


-- 
David Purton
dcpurton@marshwiggle.net
 
For the eyes of the LORD range throughout the earth to
strengthen those whose hearts are fully committed to him.
                                 2 Chronicles 16:9a

Attachment: signature.asc
Description: Digital signature

