
Bug#498228: linux-image-2.6.24-etchnhalf: Horrible RAID iowait problem



On Mon, Sep 08, 2008 at 11:19:54AM +0100, Doug Winter wrote:
> Package: linux-image-2.6.24-etchnhalf
> Severity: normal
> 
> 
> We upgraded to this kernel on a Dell PowerEdge server with a Dell PERC 6
> RAID controller.  This uses the megaraid_sas module.
> 
> Under even small amounts of load the RAID then generated so much iowait
> the system was unusable.  This was easily repeatable and very seriously
> nasty.  We experienced no problems under the previous 2.6.18-6 kernel.
> 
> You can see an example from top here:
> 
> Tasks: 127 total,   1 running, 126 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 74.3%id, 25.7%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   8307748k total,  3476056k used,  4831692k free,    10332k buffers
> Swap:  2000084k total,        0k used,  2000084k free,   793932k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>     1 root      20   0  1940  632  540 S    0  0.0   0:01.48 init
>     2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
>     3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
>     4 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
>     5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
>     6 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
>     7 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
>     8 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1
> 
> iowait got as high as 70% on a basically idle system.
> 
> Here's some output from iostat:
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00    69.65  0.00  0.00     0.00     0.00     0.00    78.34    0.00   0.00  99.50
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            6.94    0.00    0.25   34.58    0.00   58.23
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00     0.00  0.00  0.00     0.00     0.00     0.00   159.00    0.00   0.00 100.00
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.64    0.00    0.19   48.96    0.00   49.21
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00     0.00  0.00  0.00     0.00     0.00     0.00   158.56    0.00   0.00  99.50
> 
> Again, whilst basically idle the utilisation is showing as 99.5%.  Utilisation at times went well over 100%, to perhaps 120%.
> 
> We've gone back to the previous kernel and now experience no problems.

Did you upgrade to Lenny in the meantime? If so, does the problem
persist?
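
To compare the two kernels on equal terms, iowait can be measured over a
fixed interval straight from /proc/stat. The following is only a minimal
sketch, assuming the usual layout of the aggregate "cpu" line (user, nice,
system, idle, iowait, irq, softirq, steal); the function names are
hypothetical and it is not a replacement for top or iostat.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters (user..steal) of the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Later fields (guest, guest_nice) are already counted in user/nice,
    # so they are left out to avoid counting them twice.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # iowait is the fifth counter (index 4) under the assumed layout.
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    sample = read_cpu_line()
    time.sleep(5)  # sampling interval in seconds; an arbitrary choice
    print("iowait: %.1f%%" % iowait_percent(sample, read_cpu_line()))
```

Running it under the same light load on 2.6.18 and on 2.6.24 should give
figures that can be compared directly with the %wa value from top.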

For Etch, you should stick with 2.6.18; we won't make further
changes to the 2.6.24 kernel except for security fixes and critical
bugfixes.

Cheers,
        Moritz


