
Bug#391929: more info



retitle 391929 hdparm/sata_promise kernel freeze on 2.6.18.2/amd64 when setting write_cache off
thanks

The problem seems to be related to S07hdparm rather than
S10checkroot, and more specifically to the setting write_cache=off
in /etc/hdparm.conf for the two drives attached to the
Promise/FastTrak controller (which is used not as a HW-RAID
controller, but as a provider of two separate SATA channels):

lspci -v (full lspci -vv attached):
  00:08.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
    Subsystem: ASUSTeK Computer Inc. K8V Deluxe/PC-DL Deluxe motherboard
    Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 177
    I/O ports at 8800 [size=64]
    I/O ports at 8400 [size=16]
    I/O ports at 8000 [size=128]
    Memory at fb300000 (32-bit, non-prefetchable) [size=4K]
    Memory at fb200000 (32-bit, non-prefetchable) [size=128K]
    Capabilities: [60] Power Management version 2

If I comment out write_cache=off for sd[gh], the system boots fine.
I can also set -W0 on sd[ef], which are connected to a sata_via
controller (see lspci attachment for details).
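For anyone wanting to check which drives still get the setting at
boot, here is a minimal sketch that lists the devices with an active
write_cache = off line. The sample text is a hypothetical excerpt
in the Debian hdparm.conf block style (the real file is not shown in
this report), and the function name is made up for illustration.

```python
import re

# Hypothetical excerpt mirroring the workaround described above;
# this is NOT the reporter's actual /etc/hdparm.conf.
SAMPLE_CONF = """\
/dev/sde {
    write_cache = off
}
/dev/sdf {
    write_cache = off
}
/dev/sdg {
    # write_cache = off
}
/dev/sdh {
    # write_cache = off
}
"""

def devices_with_write_cache_off(conf_text):
    """Return device paths whose block has an uncommented write_cache = off."""
    devices = []
    current = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        opened = re.fullmatch(r"(/dev/\S+)\s*\{", line)
        if opened:
            current = opened.group(1)        # entering a device block
        elif line == "}":
            current = None                   # leaving the block
        elif current and re.fullmatch(r"write_cache\s*=\s*off", line):
            devices.append(current)
    return devices

print(devices_with_write_cache_off(SAMPLE_CONF))
```

With the sample above, only the two drives on the sata_via controller
remain in the list.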

If I run hdparm -W0 on sdg or sdh, I get a panic, which this time
actually mentions sata_promise:

  Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: 
   [<ffffffff8818c642>] :sata_promise:pdc_eng_timeout+0x62/0x18d
  PGD 35fb2067 PUD 3585d067 PMD 0 
  Oops: 0000 [1] SMP 
  CPU 0 
  Modules linked in: rfcomm l2cap button ac battery ipv6 ipt_MASQUERADE iptable_nat ipt_REJECT ipt_addrtype ipt_LOG xt_limit xt_tcpudp xt_conntrack ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack nfnetlink iptable_filter ip_tables x_tables netconsole snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq snd_via82xx tsdev serio_raw snd_bt87x snd_via82xx_modem snd_ac97_codec snd_pcm_oss snd_mixer_oss evdev snd_mpu401_uart snd_pcm psmouse snd_rawmidi snd_seq_device snd_timer snd soundcore eth1394 pcspkr floppy ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid10 raid1 md_mod ide_generic ide_cd cdrom skge sd_mod hci_usb bluetooth usbhid usb_storage bt878 via82cxxx ohci1394 shpchp pci_hotplug ieee1394 sata_promise sk98lin sata_via aic7xxx scsi_transport_spi bttv video_buf firmware_class ir_common compat_ioctl32 i2c_algo_bit btcx_risc tveeprom videodev v4l1_compat v4l2_common libata scsi_mod generic ide_core uhci_hcd ehci_hcd i2c_viapro i2c_core gameport snd_ac97_bus snd_page_alloc thermal processor fan
  Pid: 1129, comm: scsi_eh_4 Not tainted 2.6.18-2-amd64 #1
  RIP: 0010:[<ffffffff8818c642>]  [<ffffffff8818c642>] :sata_promise:pdc_eng_timeout+0x62/0x18d
  RSP: 0018:ffff81003d86fe40  EFLAGS: 00010096
  RAX: 00000000fafbfcfd RBX: ffff81003e080000 RCX: 000000000000acd4
  RDX: 00000000ffffff01 RSI: 0000000000000046 RDI: ffff81003e3461c0
  RBP: ffff81003e0804e8 R08: ffffffff804dc140 R09: 0000000000000012
  R10: ffff81003d86fe08 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff81003e3461c0 R14: 0000000000000246 R15: 0000000000000005
  FS:  00002ad83985c8c0(0000) GS:ffffffff80520000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000035c4f000 CR4: 00000000000006e0
  Process scsi_eh_4 (pid: 1129, threadinfo ffff81003d86e000, task ffff810037f55770)
  Stack:  ffff81003e080000 ffff81003e0804e8 ffff81003df05ac8 ffff81003e080000
   ffff81003e080000 ffffffff880a4f11 0000000000000282 ffff81003e080000
   ffffffff880767ed ffff81003df05ac8 ffff81003e080000 ffff81003df05ab8
  Call Trace:
   [<ffffffff880a4f11>] :libata:ata_scsi_error+0x418/0x50b
   [<ffffffff880767ed>] :scsi_mod:scsi_error_handler+0x0/0xa81
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff880768ac>] :scsi_mod:scsi_error_handler+0xbf/0xa81
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff880767ed>] :scsi_mod:scsi_error_handler+0x0/0xa81
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff8023055a>] kthread+0xd4/0x107
   [<ffffffff80259318>] child_rip+0xa/0x12
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff80230486>] kthread+0x0/0x107
   [<ffffffff8025930e>] child_rip+0x0/0x12
  
  
  Code: 41 8a 44 24 28 3c 01 74 0d 3c 03 bb e8 03 00 00 0f 85 93 00 
  RIP  [<ffffffff8818c642>] :sata_promise:pdc_eng_timeout+0x62/0x18d
   RSP <ffff81003d86fe40>
  CR2: 0000000000000028
   NMI Watchdog detected LOCKUP on CPU 0
  CPU 0 
  Modules linked in: rfcomm l2cap button ac battery ipv6 ipt_MASQUERADE iptable_nat ipt_REJECT ipt_addrtype ipt_LOG xt_limit xt_tcpudp xt_conntrack ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack nfnetlink iptable_filter ip_tables x_tables netconsole snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq snd_via82xx tsdev serio_raw snd_bt87x snd_via82xx_modem snd_ac97_codec snd_pcm_oss snd_mixer_oss evdev snd_mpu401_uart snd_pcm psmouse snd_rawmidi snd_seq_device snd_timer snd soundcore eth1394 pcspkr floppy ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid10 raid1 md_mod ide_generic ide_cd cdrom skge sd_mod hci_usb bluetooth usbhid usb_storage bt878 via82cxxx ohci1394 shpchp pci_hotplug ieee1394 sata_promise sk98lin sata_via aic7xxx scsi_transport_spi bttv video_buf firmware_class ir_common compat_ioctl32 i2c_algo_bit btcx_risc tveeprom videodev v4l1_compat v4l2_common libata scsi_mod generic ide_core uhci_hcd ehci_hcd i2c_viapro i2c_core gameport snd_ac97_bus snd_page_alloc thermal processor fan
  Pid: 2817, comm: md5_raid10 Not tainted 2.6.18-2-amd64 #1
  RIP: 0010:[<ffffffff8025e8c6>]  [<ffffffff8025e8c6>] .text.lock.spinlock+0x2/0x8a
  RSP: 0018:ffffffff804bfde0  EFLAGS: 00000086
  RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
  RDX: ffffffff804bfe98 RSI: ffff81003e3461c0 RDI: ffff81003e3461c0
  RBP: ffffc20000036000 R08: ffff81003eedc000 R09: 0000000000000246
  R10: 0000000000000000 R11: ffff810037ada770 R12: 0000000000000000
  R13: 00000000000000b1 R14: ffff81003e3461c0 R15: ffffffff804bfe98
  FS:  00002ad83985c8c0(0000) GS:ffffffff80520000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000035c4f000 CR4: 00000000000006e0
  Process md5_raid10 (pid: 2817, threadinfo ffff81003eedc000, task ffff8100011fd870)
  Stack:  ffffffff8818c172 ffffffff80257af1 ffff81003dff2340 0000000000000000
   0000000000000000 00000000000000b1 ffffffff804bfe98 ffffffff804bfe98
   ffffffff8020f0f4 ffffffff80528d00 0000000000005880 00000000000000b1
  Call Trace:
   <IRQ> [<ffffffff8818c172>] :sata_promise:pdc_interrupt+0x3b/0x1d9
   [<ffffffff80257af1>] blk_run_queue+0x28/0x72
   [<ffffffff8020f0f4>] handle_IRQ_event+0x29/0x58
   [<ffffffff802a4302>] __do_IRQ+0xa4/0x105
   [<ffffffff88077dd7>] :scsi_mod:scsi_io_completion+0x156/0x334
   [<ffffffff80263fdf>] do_IRQ+0x65/0x73
   [<ffffffff80258989>] ret_from_intr+0x0/0xa
   [<ffffffff80210376>] __do_softirq+0x53/0xd5
   [<ffffffff8026e567>] end_level_ioapic_vector+0x9/0x16
   [<ffffffff80259664>] call_softirq+0x1c/0x28
   [<ffffffff80264019>] do_softirq+0x2c/0x7d
   [<ffffffff80263fe4>] do_IRQ+0x6a/0x73
   [<ffffffff80258989>] ret_from_intr+0x0/0xa
   <EOI> [<ffffffff8020b1c4>] memcmp+0xb/0x22
   [<ffffffff882a4522>] :raid10:raid10d+0x233/0x9da
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff8025d504>] schedule_timeout+0x1e/0xad
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff8828ac2a>] :md_mod:md_thread+0xf8/0x10e
   [<ffffffff80290358>] autoremove_wake_function+0x0/0x2e
   [<ffffffff8828ab32>] :md_mod:md_thread+0x0/0x10e
   [<ffffffff8023055a>] kthread+0xd4/0x107
   [<ffffffff80259318>] child_rip+0xa/0x12
   [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
   [<ffffffff80230486>] kthread+0x0/0x107
   [<ffffffff8025930e>] child_rip+0x0/0x12
  
  
  Code: 83 3f 00 7e f9 e9 6d fe ff ff e8 ff d7 ff ff e9 7d fe ff ff 
  console shuts up ...
   <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
   <0>Rebooting in 60 seconds..

Curiously, the last two lines do not always appear; sometimes the
system simply stays frozen.
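The first oops looks consistent with a NULL structure pointer being
dereferenced at offset 0x28. As a rough cross-check (a sketch, not an
authoritative disassembly), the first five bytes of the Code: line
can be decoded by hand as a byte load from [r12+0x28]; with R12 = 0
that gives the faulting address reported in CR2. This assumes the
Code: dump starts at RIP, which the trace does not mark explicitly.
The helper names are made up for illustration, and the decoder
handles only this one instruction form.

```python
import re

# Lines copied from the first oops above (only the ones needed here).
OOPS = """\
R10: ffff81003d86fe08 R11: 0000000000000000 R12: 0000000000000000
CR2: 0000000000000028 CR3: 0000000035c4f000 CR4: 00000000000006e0
Code: 41 8a 44 24 28 3c 01 74 0d 3c 03 bb e8 03 00 00 0f 85 93 00
"""

def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

def parse_code_bytes(text):
    """Return the bytes listed on the 'Code:' line."""
    line = next(l for l in text.splitlines() if l.startswith("Code:"))
    return bytes(int(b, 16) for b in line.split()[1:])

def decode_byte_load(code):
    """Decode only 'REX 8A ModRM SIB disp8' (8-bit load from base + disp8).

    Returns (base_register_number, displacement).
    """
    rex, opcode, modrm, sib, disp = code[:5]
    if rex & 0xF0 != 0x40 or opcode != 0x8A:
        raise ValueError("not a REX-prefixed 8-bit MOV load")
    if modrm >> 6 != 0b01 or modrm & 0b111 != 0b100:
        raise ValueError("expected disp8 addressing with a SIB byte")
    if (sib >> 3) & 0b111 != 0b100 or rex & 0b010:
        raise ValueError("expected no index register")
    base = (sib & 0b111) | ((rex & 0b001) << 3)  # REX.B extends the base field
    if disp >= 0x80:                             # disp8 is sign-extended
        disp -= 0x100
    return base, disp

regs = parse_registers(OOPS)
base, disp = decode_byte_load(parse_code_bytes(OOPS))
fault = regs[f"R{base}"] + disp
print(f"load from [r{base}+{disp:#x}] -> address {fault:#x}, CR2 = {regs['CR2']:#x}")
```

If the computed address and CR2 agree, the pointer held in that
register was NULL when pdc_eng_timeout tried to read a field from it.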

sdg is a Maxtor 250 GB SATA drive at UDMA6
sdh is a Samsung 250 GB SATA drive at UDMA7

One difference between them is that the RAID10 array holding the swap
partition spans only sd[efg] and does not touch sdh.

Both drives are healthy according to smartctl. This is what dmesg
knows about them:

  sata_promise 0000:00:08.0: version 1.04
  ACPI: PCI Interrupt 0000:00:08.0[A] -> GSI 18 (level, low) -> IRQ 177
  ata3: SATA max UDMA/133 cmd 0xFFFFC20000036200 ctl 0xFFFFC20000036238 bmdma 0x0 irq 177
  ata4: SATA max UDMA/133 cmd 0xFFFFC20000036280 ctl 0xFFFFC200000362B8 bmdma 0x0 irq 177
  scsi4 : sata_promise
  ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata3.00: ATA-7, max UDMA/133, 490234752 sectors: LBA48 
  ata3.00: ata3: dev 0 multi count 0
  ata3.00: configured for UDMA/133
  scsi5 : sata_promise
  ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata4.00: ATA-7, max UDMA7, 488397168 sectors: LBA48 NCQ (depth 0/32)
  ata4.00: configured for UDMA/133
    Vendor: ATA       Model: Maxtor 7Y250M0    Rev: YAR5
    Type:   Direct-Access                      ANSI SCSI revision: 05
  SCSI device sdg: 490234752 512-byte hdwr sectors (251000 MB)
  sdg: Write Protect is off
  sdg: Mode Sense: 00 3a 00 00
  SCSI device sdg: drive cache: write back
  SCSI device sdg: 490234752 512-byte hdwr sectors (251000 MB)
  sdg: Write Protect is off
  sdg: Mode Sense: 00 3a 00 00
  SCSI device sdg: drive cache: write back
   sdg: sdg1 sdg2 sdg3 < sdg5 sdg6 sdg7 sdg8 sdg9 sdg10 >
  sd 4:0:0:0: Attached scsi disk sdg
    Vendor: ATA       Model: SAMSUNG SP2504C   Rev: VT10
    Type:   Direct-Access                      ANSI SCSI revision: 05
  SCSI device sdh: 488397168 512-byte hdwr sectors (250059 MB)
  sdh: Write Protect is off
  sdh: Mode Sense: 00 3a 00 00
  SCSI device sdh: drive cache: write through
  SCSI device sdh: 488397168 512-byte hdwr sectors (250059 MB)
  sdh: Write Protect is off
  sdh: Mode Sense: 00 3a 00 00
  SCSI device sdh: drive cache: write through
   sdh: sdh1 sdh2 < sdh5 sdh6 sdh7 sdh8 sdh9 sdh10 >
  sd 5:0:0:0: Attached scsi disk sdh

This bug also occurs with the vanilla 2.6.18.2 kernel.

Hope this helps.

-- 
 .''`.   martin f. krafft <madduck@debian.org>
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems

Attachment: lspci-vv.bz2
Description: Binary data

Attachment: signature.asc
Description: Digital signature (GPG/PGP)

