Re: High I/O wait times in kernels since 2.6.18

To: debian@musmo.com
Cc: debian-kernel@lists.debian.org
Subject: Re: High I/O wait times in kernels since 2.6.18
From: Svante Signell <svante.signell@telia.com>
Date: Thu, 22 Jan 2009 00:19:37 +0100
Message-id: <1232579977.29435.64.camel@em2.my.own.domain>
In-reply-to: <49771A34.4080008@musmo.com>
References: <1232531265.29435.22.camel@em2.my.own.domain> <49771A34.4080008@musmo.com>

On Wed, 2009-01-21 at 13:51 +0100, KwangErn Liew wrote:
> Svante Signell wrote:
> > Hello,
> > 
> > I have problems with extremely high I/O wait times for some operations
> > like apt-get update and apt-get upgrade. Even checking the disks with
> > hdparm shows this problem. I did not see this I/O wait problem
> > until recent kernels.
> > 
> > The system disk is rather slow:
> > hdparm -t /dev/hda
> > /dev/hda:
> > Timing buffered disk reads:   26 MB in  3.23 seconds =   8.06 MB/sec
> > 
> > Other disks are faster:
> > hdparm -t /dev/hdb
> > /dev/hdb:
> > Timing buffered disk reads:   70 MB in  3.04 seconds =  23.05 MB/sec
> > 
> > but they also show the extremely long wait times, up to 97% wait and
> > only a few % CPU. The disks have DMA enabled, and a long memory test
> > found no problems.
> 
> <snip>
> 
> > I have tried three I/O schedulers: cfq (default), anticipatory and
> > deadline, with only small differences. Kernels tried recently are
> > 2.6.18-6-686, 2.6.25-2-686 and 2.6.26-1-686.
> 
> Could be a hard disk failure. Have you run any HDD diagnostics? SMART
> might tell you something too.

Thank you for the tip. Something is probably wrong with hda, even though
hdparm does not show any speed problems. (hda is an old 10GB disk while
hdb is a newer 120GB disk so 8MB/sec and 23MB/sec are reasonable).

[    2.804171] hda: IBM-DTTA-351010, ATA DISK drive
[    2.808019] Marking TSC unstable due to: TSC halts in idle.
[    3.087000] hdb: WDC WD1200JB-00CRA1, ATA DISK drive

What does this mean?
[    2.808019] Marking TSC unstable due to: TSC halts in idle.

Running smartctl on hda gives:
smartctl -s on -t short /dev/hda
smartctl -l selftest /dev/hda
...
Warning: device does not support Self-Test functions.
...

/var/log/kern.log shows:
Jan 21 23:51:12 em2 kernel: [90492.947558] hdb: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 21 23:51:12 em2 kernel: [90492.947580] hdb: task_no_data_intr: error=0x04 { DriveStatusError }
Jan 21 23:51:12 em2 kernel: [90492.947586] ide: failed opcode was: 0xb0
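
As a reference for reading the log lines above, the status and error
bytes can be decoded bit by bit. This is a minimal sketch, assuming the
standard ATA register layout. The helper names decode_bits,
decode_status and decode_error are made up for illustration, and the
labels are ATA mnemonics rather than the strings the IDE driver prints
(DRDY, DSC, ERR and ABRT correspond to DriveReady, SeekComplete, Error
and DriveStatusError in the log). In the ATA command set, opcode 0xb0 is
the SMART command.

```sh
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# decode_bits VALUE NAME0 NAME1 ... : print the names of the bits set in
# VALUE, highest bit first. NAME0 labels bit 0, NAME1 labels bit 1, etc.
decode_bits() {
    value=$(( $1 ))          # accepts decimal or 0x-prefixed hex
    shift
    bit=0
    out=""
    for name in "$@"; do
        if [ $(( $value & (1 << $bit) )) -ne 0 ]; then
            # Prepend so that higher bits end up on the left.
            out="$name${out:+ $out}"
        fi
        bit=$(( $bit + 1 ))
    done
    printf '%s\n' "$out"
}

# ATA status register, bit 0 to bit 7.
decode_status() {
    decode_bits "$1" ERR IDX CORR DRQ DSC DF DRDY BSY
}

# ATA error register, bit 0 to bit 7 (bit 7 is BBK on very old drives).
decode_error() {
    decode_bits "$1" AMNF TK0NF ABRT MCR IDNF MC UNC ICRC
}
```

With the values from the log, decode_status 0x51 prints "DRDY DSC ERR"
and decode_error 0x04 prints "ABRT", which means the drive aborted the
command.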

The same commands on /dev/hdb give:
...
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Wed Jan 21 23:49:11 2009

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error        00%              621

Which HDD diagnostic programs are available? I did not find anything by
searching with apt-cache.
How can I find out which error the failed opcode 0xb0 refers to?
The disk still runs OK, but can it crash at any time without warning? In
case it has to be replaced, are there any hints on how to move the
installed system to a new disk? The tar file method would work, right?
But what about the /var partition, which is written to frequently?

Any ideas about clock timer sources?
2.6.14:
...
Detected 1567.904 MHz processor.
Using tsc for high-res timesource
...
Calibrating delay using timer specific routine.. 3137.48 BogoMIPS (lpj=1568743)
...

2.6.26:
...
[    0.000000] Detected 1567.779 MHz processor.
...
[    0.000000] ACPI: PM-Timer IO Port: 0x4008
...
[    0.084205] Calibrating delay using timer specific routine.. 3137.89 BogoMIPS (lpj=6275799)

[    0.154492] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[    0.154495] * this clock source is slow. Consider trying other clock sources
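
On kernels that expose clock sources through sysfs, the active and
available clock sources can be read from two files. This is a minimal
sketch, assuming the usual sysfs location. The function name
show_clocksources is hypothetical, and the optional directory argument
exists only so the function can be tried against a copy of the files.

```sh
#!/bin/sh
# Hypothetical helper: print the current and available clock sources.
# $1 (optional) overrides the sysfs directory that is read.
show_clocksources() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    if [ -r "$dir/current_clocksource" ] && [ -r "$dir/available_clocksource" ]; then
        printf 'current: %s\n' "$(cat "$dir/current_clocksource")"
        printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
    else
        # The files are missing, e.g. on a kernel without this interface.
        printf 'no clocksource information under %s\n' "$dir"
        return 1
    fi
}
```

Run show_clocksources without arguments to read the live system. A
different source can be tried at boot with the clocksource= kernel
parameter, or at run time by writing one of the available names to
current_clocksource as root.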

Thanks,
Svante
> 
> KwangErn
