Your message dated Thu, 4 Feb 2010 02:09:12 +0100
with message-id <20100204010912.GD2665@stro.at>
and subject line Re: disk failures during access on SATA drives, Xen only
has caused the Debian Bug report #406581,
regarding disk failures during access on SATA drives, Xen only
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)

-- 
406581: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=406581
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: disk failures during access on SATA drives, Xen only
- From: martin f krafft <madduck@debian.org>
- Date: Fri, 12 Jan 2007 00:45:18 +0100
- Message-id: <20070111234518.GA20066@lapse.madduck.net>
Package: linux-image-2.6.18-3-xen-amd64
Version: 2.6.18-7
Severity: important

Our Xen test machine has two SATA controllers

  01:05.0 Mass storage controller: Promise Technology, Inc. PDC20375 (SATA150 TX2plus) (rev 02)
  01:08.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)

and a total of three SATA drives (all SAMSUNG SP2004C) connected to them. Two are connected to the SATA378 controller (the second one listed, which is onboard), and the third is connected to the SATA150 controller (a PCI card). The system is an AMD Opteron running etch, native amd64.

The three drives each hold 8 partitions, which are combined into 8 RAID arrays: two RAID1 and six RAID5. dmesg output right after boot is attached, as are lspci, cpuinfo and mdstat. Please contact me for more information. I will be away from the system for the next couple of weeks, but it will be running the non-Xen kernel and remain accessible, and if needed, I can get a colleague to work on it for you.

The problem occurs sporadically, but only when booting the Xen kernel. I have not once managed to reproduce it with the 2.6.18-3-amd64 kernel, whereas with the 2.6.18-3-xen-amd64 kernel I can reproduce it more or less at will. Disk activity seems to trigger it: for instance, booting and letting a RAID5 spanning all three disks resynchronise almost always makes the problem appear. This is what the log says in such a case:

  kernel: ata3: command timeout
  kernel: ata3: no sense translation for status: 0x40
  kernel: ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
  kernel: ata3: status=0x40 { DriveReady }
  kernel: sd 2:0:0:0: SCSI error: return code = 0x08000002
  kernel: sdb: Current: sense key: Aborted Command
  kernel: Additional sense: No additional sense information
  kernel: end_request: I/O error, dev sdb, sector 48044091
  kernel: raid5:md4: read error not correctable (sector 41425248 on sdb7).
  kernel: raid5: Disk failure on sdb7, disabling device. Operation continuing on 1 devices
  kernel: raid5:md4: read error not correctable (sector 41425256 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425264 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425272 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425280 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425288 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425296 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425304 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425312 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425320 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425328 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425336 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425344 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425352 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425360 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425368 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425376 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425384 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425392 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425400 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425408 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425416 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425424 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425432 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425440 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425448 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425456 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425464 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425472 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425480 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425488 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425496 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425504 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425512 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425520 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425528 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425536 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425544 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425552 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425560 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425568 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425576 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425584 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425592 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425600 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425608 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425616 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425624 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425632 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425640 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425648 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425656 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425664 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425672 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425680 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425688 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425696 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425704 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425712 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425720 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425728 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425736 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425744 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425752 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425760 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425768 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425776 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425784 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425792 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425800 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425808 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425816 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425824 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425832 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425840 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425848 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425856 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425864 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425872 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425880 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425888 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425896 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425904 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425912 on sdb7).
  kernel: raid5:md4: read error not correctable (sector 41425920 on sdb7).
  kernel: ata4: command timeout
  kernel: ata4: no sense translation for status: 0x40
  kernel: ata4: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
  kernel: ata4: status=0x40 { DriveReady }
  kernel: sd 3:0:0:0: SCSI error: return code = 0x08000002
  kernel: sdc: Current: sense key: Aborted Command
  kernel: Additional sense: No additional sense information
  kernel: end_request: I/O error, dev sdc, sector 48043483
  kernel: raid5: Disk failure on sdc7, disabling device. Operation continuing on 1 devices

Note that the affected disks and controller ports vary between occurrences: sometimes it is ata3/4 and sdb/sdc, at other times ata1/3 and sda/sdb. The disks themselves report no SMART errors. Here is another occurrence:

  kernel: ata4: command timeout
  kernel: ata4: no sense translation for status: 0x40
  kernel: ata4: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
  kernel: ata4: status=0x40 { DriveReady }
  kernel: sd 3:0:0:0: SCSI error: return code = 0x08000002
  kernel: sdc: Current: sense key: Aborted Command
  kernel: Additional sense: No additional sense information
  kernel: end_request: I/O error, dev sdc, sector 56772315
  kernel: raid5: Disk failure on sdc7, disabling device. Operation continuing on 2 devices
  kernel: ata1: command timeout
  kernel: ata1: no sense translation for status: 0x40
  kernel: ata1: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
  kernel: ata1: status=0x40 { DriveReady }
  kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
  kernel: sda: Current: sense key: Aborted Command
  kernel: Additional sense: No additional sense information
  kernel: end_request: I/O error, dev sda, sector 56772907
  kernel: raid5:md4: read error not correctable (sector 50154064 on sda7).
  kernel: raid5: Disk failure on sda7, disabling device. Operation continuing on 1 devices
  kernel: raid5:md4: read error not correctable (sector 50154072 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154080 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154088 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154096 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154104 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154112 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154120 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154128 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154136 on sda7).
  kernel: raid5:md4: read error not correctable (sector 50154144 on sda7).

After this, other partitions report failures and the system hard-locks. After a reboot, everything is normal again (the RAID recovery restarts), and no data seems to have been lost.

See below for the list of modules loaded at the time of the crash. Note that sata_nv is loaded (by udev) even though there are no SATA ports besides the two on-board Promise ports and the two ports on the PCI card. The sata_nv module can be freely removed.
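Since the failing port and disk change from one occurrence to the next, it may help to tally the relevant messages per device when comparing several crashes. The following is a minimal Python sketch, assuming the kernel log has been saved as plain text with one message per line; the helper name `summarize` and the regular expressions are hypothetical and modelled only on the 2.6.18 messages quoted above, so other kernel versions may word them differently.

```python
import re
from collections import Counter

# Hypothetical patterns, modelled on the 2.6.18 libata/md messages quoted in
# this report; other kernel versions may word these messages differently.
TIMEOUT_RE = re.compile(r"\b(ata\d+): command timeout")
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (sd[a-z]+), sector (\d+)")
DISK_FAIL_RE = re.compile(r"raid5: Disk failure on (sd[a-z]+\d+)")
READ_ERR_RE = re.compile(
    r"raid5:(md\d+): read error not correctable \(sector \d+ on (sd[a-z]+\d+)\)"
)


def summarize(lines):
    """Tally failure-related kernel messages per ATA port, disk and md member."""
    timeouts = Counter()     # ataN -> number of command timeouts
    io_errors = Counter()    # sdX -> number of end_request I/O errors
    read_errors = Counter()  # (mdN, sdXN) -> uncorrectable read errors
    failed_members = []      # md members kicked out of an array, in log order
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
        m = IO_ERROR_RE.search(line)
        if m:
            io_errors[m.group(1)] += 1
        m = DISK_FAIL_RE.search(line)
        if m:
            failed_members.append(m.group(1))
        m = READ_ERR_RE.search(line)
        if m:
            read_errors[(m.group(1), m.group(2))] += 1
    return {
        "timeouts": timeouts,
        "io_errors": io_errors,
        "read_errors": read_errors,
        "failed_members": failed_members,
    }


# A few lines taken from the second occurrence quoted above.
SAMPLE = [
    "kernel: ata4: command timeout",
    "kernel: end_request: I/O error, dev sdc, sector 56772315",
    "kernel: raid5: Disk failure on sdc7, disabling device. Operation continuing on 2 devices",
    "kernel: ata1: command timeout",
    "kernel: end_request: I/O error, dev sda, sector 56772907",
    "kernel: raid5:md4: read error not correctable (sector 50154064 on sda7).",
    "kernel: raid5: Disk failure on sda7, disabling device. Operation continuing on 1 devices",
]

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

To scan a saved log instead of the sample, pass an open file object, since `summarize` accepts any iterable of lines.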
Modules loaded:

Module                  Size  Used by
bridge                 63408  0
netloop                11392  0
tun                    16256  0
ipv6                  285920  18
ipt_MASQUERADE          8320  1
iptable_nat            12292  1
ipt_REJECT             10112  1
xt_tcpudp               7936  22
ipt_addrtype            6528  1
ipt_LOG                11264  1
xt_limit                7424  1
xt_conntrack            7168  6
ip_nat_ftp              8064  0
ip_nat                 24492  3 ipt_MASQUERADE,iptable_nat,ip_nat_ftp
ip_conntrack_ftp       13136  1 ip_nat_ftp
ip_conntrack           63140  6 ipt_MASQUERADE,iptable_nat,xt_conntrack,ip_nat_ftp,ip_nat,ip_conntrack_ftp
nfnetlink              11976  2 ip_nat,ip_conntrack
iptable_filter          7808  1
ip_tables              25192  2 iptable_nat,iptable_filter
x_tables               21896  9 ipt_MASQUERADE,iptable_nat,ipt_REJECT,xt_tcpudp,ipt_addrtype,ipt_LOG,xt_limit,xt_conntrack,ip_tables
dm_crypt               16400  0
psmouse                44560  0
serio_raw              12036  0
i2c_nforce2            12544  0
pcspkr                  7808  0
shpchp                 42028  0
pci_hotplug            20872  1 shpchp
i2c_core               27776  1 i2c_nforce2
evdev                  15360  0
ext3                  138256  6
jbd                    65392  1 ext3
mbcache                14216  1 ext3
dm_mirror              25344  0
dm_snapshot            20536  0
dm_mod                 62928  5 dm_crypt,dm_mirror,dm_snapshot
raid456               123680  7
xor                    11024  1 raid456
raid1                  27136  2
md_mod                 83484  11 raid456,raid1
ide_generic             5760  0 [permanent]
sd_mod                 25856  27
ide_disk               20736  6
generic                10756  0 [permanent]
amd74xx                19504  0 [permanent]
ide_core              148224  4 ide_generic,ide_disk,generic,amd74xx
sata_promise           18052  24
tulip                  57760  0
libata                107040  2 sata_promise
scsi_mod              153008  2 sd_mod,libata
ehci_hcd               36232  0
ohci_hcd               24964  0
fan                     9864  0

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.18-3-xen-amd64
Locale: LANG=en_GB, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

-- 
 .''`.   martin f. krafft <madduck@debian.org>
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems

Attachment: dmesg.bz2
Description: Binary data

Attachment: lspci.bz2
Description: Binary data

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 242
stepping        : 10
cpu MHz         : 1600.035
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 4001.50
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp

Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid5 sda10[0] sdc10[2] sdb10[1]
      1991808 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md6 : active raid5 sda9[0] sdc9[2] sdb9[1]
      995712 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md5 : active raid5 sda8[0] sdc8[2] sdb8[1]
      16000512 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
        resync=DELAYED

md4 : active raid5 sda7[0] sdc7[3] sdb7[1]
      365108992 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.2% (440960/182554496) finish=192.7min speed=15748K/sec

md3 : active raid5 sda6[0] sdc6[2] sdb6[1]
      1991808 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sda1[0] sdc1[2] sdb1[1]
      64128 blocks [3/3] [UUU]

md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      497792 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sda2[0] sdc2[2] sdb2[1]
      2000000 blocks [3/3] [UUU]

unused devices: <none>

Attachment: signature.asc
Description: Digital signature (GPG/PGP)
--- End Message ---
--- Begin Message ---
- To: 406581-done@bugs.debian.org
- Subject: Re: disk failures during access on SATA drives, Xen only
- From: maximilian attems <max@stro.at>
- Date: Thu, 4 Feb 2010 02:09:12 +0100
- Message-id: <20100204010912.GD2665@stro.at>
Closing as a somewhat aged Xen bug without any activity. As we all know, Xen is not yet merged upstream, so there is not much point in leaving this bug report hanging. Happy KVM hacking, and thanks for the report anyway.

-- 
maks
--- End Message ---