
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?



Package: kernel-image-2.6.8-2-386
Version: 2.6.8-16
Severity: important

Hello,

I've been having the same problem on 3 firewall boxes: after a certain
amount of time (days, weeks) the hard drives will either go into
read-only mode or lock up for good (until a reboot) with I/O error
messages.

I will report on this machine only, as the others have not been
rebooted yet and so still don't work properly (although they still
forward/filter packets).

All the firewalls use Seagate ST92011A (20GB 2.5") drives and are based
on VIA chipsets. I can't confirm whether the chipsets are identical, as
one of the firewalls uses a different motherboard and is currently dead
(input/output error on any command) until the next reboot.

This was a problem when I tried the latest 2.4 kernel in Sarge. It
seemed to go away when I switched to 2.6.8, but it is still there; it
just takes much longer for the fault to occur.

I have a feeling it is a Power Management problem, with the drive not
waking up from deep sleep (this was proven experimentally with the 2.4
kernel.) At the moment I am testing the hdparm -B 255 'solution'.
Otherwise it's the 15 min ls -l / > /dev/null cron job :-S
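
Spelled out, the two workarounds look roughly like the sketch below.
/dev/hdc is the drive named in the dmesg output; the /etc/cron.d line
format (with a user field) is my assumption about where the job would
live. The script only prints the commands instead of running them,
since hdparm needs root and a drive that actually supports APM.

```shell
#!/bin/sh
# Sketch only. DRIVE comes from the dmesg output in this report; the
# /etc/cron.d placement of the cron line is an assumption.
DRIVE=/dev/hdc

# Workaround 1: -B 255 asks the drive to switch Advanced Power
# Management off altogether (only on drives that support it).
apm_off_cmd="hdparm -B 255 $DRIVE"

# Workaround 2: touch the disk every 15 minutes so it should not stay
# idle long enough to drop into deep sleep.
keepalive_cron='*/15 * * * * root ls -l / > /dev/null 2>&1'

# Print rather than execute, so this is safe to run on any box.
echo "$apm_off_cmd"
echo "$keepalive_cron"
```

The hdparm setting may not survive a power cycle, so it would probably
have to be reapplied at boot.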

The kernel is from APT and the modules are loaded by hotplug; there is
no custom stuff. powermgmt-base is installed, but that's about it.

Here is some info:

lspci:
---
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
0000:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
0000:00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
0000:00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
0000:00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
0000:00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
0000:00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)
---

Dmesg error from this machine (with a futile attempt to force a remote
reboot - is there a better way?):
---
eth1: no IPv6 routers present
eth0: no IPv6 routers present
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
HTB init, kernel part version 3.17
u32 classifier
    OLD policer on
hdc: dma_timer_expiry: dma status == 0x20
hdc: DMA timeout retry
hdc: timeout waiting for DMA
hdc: status timeout: status=0xd0 { Busy }
 
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
hdc: status timeout: status=0x80 { Busy }
 
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
end_request: I/O error, dev hdc, sector 33719
end_request: I/O error, dev hdc, sector 33727
end_request: I/O error, dev hdc, sector 33735
end_request: I/O error, dev hdc, sector 33743
end_request: I/O error, dev hdc, sector 33751
end_request: I/O error, dev hdc, sector 33759
end_request: I/O error, dev hdc, sector 33767
end_request: I/O error, dev hdc, sector 33775
end_request: I/O error, dev hdc, sector 33783
end_request: I/O error, dev hdc, sector 33791
end_request: I/O error, dev hdc, sector 33799
end_request: I/O error, dev hdc, sector 33807
end_request: I/O error, dev hdc, sector 33815
end_request: I/O error, dev hdc, sector 33823
end_request: I/O error, dev hdc, sector 33831
end_request: I/O error, dev hdc, sector 33839
end_request: I/O error, dev hdc, sector 33847
end_request: I/O error, dev hdc, sector 33855
end_request: I/O error, dev hdc, sector 33863
end_request: I/O error, dev hdc, sector 33871
end_request: I/O error, dev hdc, sector 33879
end_request: I/O error, dev hdc, sector 33887
end_request: I/O error, dev hdc, sector 33895
end_request: I/O error, dev hdc, sector 18982239
Buffer I/O error on device hdc5, logical block 782337
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250295
Buffer I/O error on device hdc5, logical block 565844
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250303
Buffer I/O error on device hdc5, logical block 565845
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250311
Buffer I/O error on device hdc5, logical block 565846
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17249919
Buffer I/O error on device hdc5, logical block 565797
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17245527
Buffer I/O error on device hdc5, logical block 565248
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17966455
Buffer I/O error on device hdc5, logical block 655364
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 21374327
Buffer I/O error on device hdc5, logical block 655364
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 21374327
Buffer I/O error on device hdc5, logical block 1081348
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 7340167
Buffer I/O error on device hdc1, logical block 917513
lost page write due to I/O error on hdc1
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 7340175
Buffer I/O error on device hdc1, logical block 917514
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 7340183
end_request: I/O error, dev hdc, sector 7602311
end_request: I/O error, dev hdc, sector 10765550
end_request: I/O error, dev hdc, sector 10765552
end_request: I/O error, dev hdc, sector 15082871
end_request: I/O error, dev hdc, sector 33903
Aborting journal on device hdc1.
end_request: I/O error, dev hdc, sector 12786023
end_request: I/O error, dev hdc, sector 12786031
end_request: I/O error, dev hdc, sector 12786039
end_request: I/O error, dev hdc, sector 12786047
Aborting journal on device hdc5.
end_request: I/O error, dev hdc, sector 17245527
end_request: I/O error, dev hdc, sector 17250295
end_request: I/O error, dev hdc, sector 17250311
__journal_remove_journal_head: freeing b_committed_data
end_request: I/O error, dev hdc, sector 10765554
Aborting journal on device hdc3.
ext3_abort called.
EXT3-fs abort (device hdc5): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
journal commit I/O error
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device hdc1) in ext3_reserve_inode_write: Journal has aborted
Remounting filesystem read-only
end_request: I/O error, dev hdc, sector 63
EXT3-fs error (device hdc1) in ext3_dirty_inode: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
ext3_abort called.
EXT3-fs abort (device hdc3): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
end_request: I/O error, dev hdc, sector 1572959
printk: 10 messages suppressed.
Buffer I/O error on device hdc1, logical block 196612
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572967
Buffer I/O error on device hdc1, logical block 196613
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572975
Buffer I/O error on device hdc1, logical block 196614
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572983
Buffer I/O error on device hdc1, logical block 196615
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1572999
Buffer I/O error on device hdc1, logical block 196617
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1573007
Buffer I/O error on device hdc1, logical block 196618
lost page write due to I/O error on hdc1
end_request: I/O error, dev hdc, sector 1573015
end_request: I/O error, dev hdc, sector 1573023
end_request: I/O error, dev hdc, sector 2359591
end_request: I/O error, dev hdc, sector 3932367
end_request: I/O error, dev hdc, sector 4980871
end_request: I/O error, dev hdc, sector 4980935
end_request: I/O error, dev hdc, sector 7340159
end_request: I/O error, dev hdc, sector 7602279
end_request: I/O error, dev hdc, sector 7602319
end_request: I/O error, dev hdc, sector 8126559
end_request: I/O error, dev hdc, sector 8126567
end_request: I/O error, dev hdc, sector 8651263
end_request: I/O error, dev hdc, sector 8651271
end_request: I/O error, dev hdc, sector 9437279
end_request: I/O error, dev hdc, sector 9437295
end_request: I/O error, dev hdc, sector 9437303
end_request: I/O error, dev hdc, sector 12723551
end_request: I/O error, dev hdc, sector 17179991
end_request: I/O error, dev hdc, sector 17180023
end_request: I/O error, dev hdc, sector 17180031
end_request: I/O error, dev hdc, sector 18752895
end_request: I/O error, dev hdc, sector 10796336
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
<-snip-> More of the same
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
EXT3-fs error (device hdc5) in start_transaction: Journal has aborted
fwbox01:~# dmesh > /dmesg.txt
-bash: /dmesg.txt: Read-only file system
fwbox01:~# mount
/dev/hdc1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hdc3 on /home type ext3 (rw)
/dev/hdc5 on /var type ext3 (rw)
/dev/hdc6 on /home/adm type ext3 (rw)
usbfs on /proc/bus/usb type usbfs (rw)
fwbox01:~# dmesg > /home/dmesg.txt
-bash: /home/dmesg.txt: Read-only file system
fwbox01:~# mount
/dev/hdc1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hdc3 on /home type ext3 (rw)
/dev/hdc5 on /var type ext3 (rw)
/dev/hdc6 on /home/adm type ext3 (rw)
fwbox01:~# hd
hd      hdparm 
fwbox01:~# hdparm
-bash: /sbin/hdparm: Input/output error
 
fwbox01:/proc/sys/kernel# cat panic
20
fwbox01:/proc/sys/kernel# cat panic_on_oops
0
fwbox01:/proc/sys/kernel# echo 1 > panic_on_oops
 
-r--r--r--   1 root        root                0 2005-10-21 17:24 version
-r--r--r--   1 root        root                0 2005-10-21 17:24 vmstat
fwbox01:/proc# echo diemotherfucker > kcore
fwbox01:/proc#
fwbox01:/proc#
fwbox01:/proc#
fwbox01:/proc# cat /dev/hdc > kcore
cat: /dev/hdc: Input/output error
fwbox01:/proc# cat /dev/random > kcore
fwbox01:/dev# ls *mem
kmem  mem
fwbox01:/dev# cat /dev/zero > kmem
---

The initial DMA error messages appear now and again and seem to be
harmless (but also PM related). Then it all goes wrong after a few
weeks of running. :-(

I will be happy to provide more specific information if you tell me
how to get it. I don't really know much about hardware/kernel debugging.


HTH,

George B.

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-386
Locale: LANG=en_GB, LC_CTYPE=en_GB (charmap=ISO-8859-1)

Versions of packages kernel-image-2.6.8-2-386 depends on:
ii  coreutils [fileutils]         5.2.1-2    The GNU core utilities
ii  initrd-tools                  0.1.81.1   tools to create initrd image for p
ii  module-init-tools             3.2-pre1-2 tools for managing Linux kernel mo

-- no debconf information


