
Bug#647185: linux-2.6: kernel null pointer dereference while adding SAN path



Package: linux-2.6
Version: 2.6.32-38

Hi,

Removing paths to our SAN and adding them back results in:

[  951.569561] device-mapper: table: 253:2: sde too small for target: start=0, len=140465493850188, dev_size=627107840
[  951.571750] BUG: unable to handle kernel NULL pointer dereference at (null)
[  951.571876] IP: [<(null)>] (null)
[  951.571961] PGD 6500c1067 PUD 650135067 PMD 0 
[  951.578673] Oops: 0010 [#1] SMP 
[  951.578788] last sysfs file: /sys/devices/virtual/block/dm-3/uevent
[  951.578846] CPU 16 
[  951.578928] Modules linked in: 8021q garp stp ext4 jbd2 crc16 dm_round_robin dm_multipath scsi_dh bonding ipmi_devintf ipmi_si ipmi_msghandler ohci_hcd radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc hpilo hpwdt joydev pcspkr psmouse evdev serio_raw power_meter container processor button ext3 jbd mbcache dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod sd_mod crc_t10dif sg usbhid sr_mod hid cdrom ata_generic hpsa ata_piix thermal uhci_hcd cciss ehci_hcd qla2xxx scsi_transport_fc libata scsi_tgt bnx2 usbcore qlcnic nls_base scsi_mod thermal_sys [last unloaded: scsi_wait_scan]
[  951.581772] Pid: 5801, comm: blkid Not tainted 2.6.32-5-amd64 #1 ProLiant DL380 G7
[  951.581845] RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
[  951.581934] RSP: 0018:ffff88071b9c5b80  EFLAGS: 00010006
[  951.581989] RAX: ffff880e1ad3e880 RBX: ffff880e1a4888d0 RCX: 0000000000000000
[  951.582054] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880e1a4888d0
[  951.582116] RBP: ffff880e1a4888d0 R08: ffff880719cb33e8 R09: ffff880719f12840
[  951.582175] R10: 0000000100027c26 R11: ffff88065b555500 R12: ffff880e1a4888d0
[  951.582234] R13: 0000000000000002 R14: ffff88071bcc1d60 R15: ffff88071bcc1c44
[  951.582297] FS:  00007f5c1037d740(0000) GS:ffff88001a500000(0000) knlGS:0000000000000000
[  951.582372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  951.582429] CR2: 0000000000000000 CR3: 000000071b6d2000 CR4: 00000000000006e0
[  951.582488] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  951.582546] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  951.582606] Process blkid (pid: 5801, threadinfo ffff88071b9c4000, task ffff88071a31bf90)
[  951.582680] Stack:
[  951.582729]  ffffffff8117629e ffff88071bbd7dc8 ffffffff81176c40 ffff88071bbd7dc8
[  951.582885] <0> ffff88071bbd7dc8 ffff880e1a4888d0 0000000000000096 ffff88071bcc1c00
[  951.583118] <0> ffffffff8117dec9 ffff88071bbd7dc8 ffffc9000c8da040 ffff88071a2fac10
[  951.583397] Call Trace:
[  951.583452]  [<ffffffff8117629e>] ? elv_drain_elevator+0x16/0x5a
[  951.583510]  [<ffffffff81176c40>] ? elv_insert+0x91/0x260
[  951.583568]  [<ffffffff8117dec9>] ? blk_insert_cloned_request+0x4f/0x67
[  951.583630]  [<ffffffffa022d90f>] ? dm_dispatch_request+0x33/0x59 [dm_mod]
[  951.583691]  [<ffffffffa022eedb>] ? dm_request_fn+0x121/0x1a2 [dm_mod]
[  951.583752]  [<ffffffff810b43e3>] ? sync_page_killable+0x0/0x2f
[  951.583810]  [<ffffffff8117f07a>] ? generic_unplug_device+0x21/0x34
[  951.583870]  [<ffffffffa022dac8>] ? dm_unplug_all+0x33/0x4c [dm_mod]
[  951.583928]  [<ffffffff810b43d9>] ? sync_page+0x3c/0x46
[  951.583984]  [<ffffffff810b43ec>] ? sync_page_killable+0x9/0x2f
[  951.584043]  [<ffffffff812fb80a>] ? __wait_on_bit_lock+0x3f/0x84
[  951.584101]  [<ffffffff810b42e8>] ? __lock_page_killable+0x5d/0x63
[  951.584160]  [<ffffffff81064fc0>] ? wake_bit_function+0x0/0x23
[  951.584217]  [<ffffffff810b42f7>] ? lock_page_killable+0x9/0x1f
[  951.584274]  [<ffffffff810b5917>] ? generic_file_aio_read+0x363/0x536
[  951.584334]  [<ffffffff810eed05>] ? do_sync_read+0xce/0x113
[  951.584391]  [<ffffffff81064f92>] ? autoremove_wake_function+0x0/0x2e
[  951.584451]  [<ffffffff810ccd36>] ? handle_mm_fault+0x3b8/0x80f
[  951.584508]  [<ffffffff810ef728>] ? vfs_read+0xa6/0xff
[  951.584564]  [<ffffffff810ef83d>] ? sys_read+0x45/0x6e
[  951.584621]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[  951.584677] Code:  Bad RIP value.
[  951.584795] RIP  [<(null)>] (null)
[  951.584879]  RSP <ffff88071b9c5b80>
[  951.584932] CR2: 0000000000000000
[  951.584985] ---[ end trace 71dd7f009a29d813 ]---
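Device-mapper counts start, len and dev_size in 512-byte sectors, so the numbers in the "sde too small for target" line can be checked by hand. The Python sketch below does that check. The helper names are hypothetical, and the comparison is an assumption about what the message tests, not a copy of the kernel code. It shows that dev_size corresponds to about 299 GiB, while the requested len would be roughly 64 PiB:

```python
SECTOR_BYTES = 512  # device-mapper expresses start/len/dev_size in 512-byte sectors


def target_exceeds_device(start: int, length: int, dev_size: int) -> bool:
    """Hypothetical helper: True if sectors [start, start + length) do not fit.

    Assumed to mirror the condition behind "too small for target";
    this is not taken from the kernel source.
    """
    return start >= dev_size or start + length > dev_size


def sectors_to_bytes(sectors: int) -> int:
    """Convert a 512-byte sector count to bytes."""
    return sectors * SECTOR_BYTES


# Values copied from the device-mapper line in the log above.
start, length, dev_size = 0, 140465493850188, 627107840

print(target_exceeds_device(start, length, dev_size))            # True
print(f"dev_size: {sectors_to_bytes(dev_size) / 2**30:.1f} GiB")  # dev_size: 299.0 GiB
print(f"len:      {sectors_to_bytes(length) / 2**50:.1f} PiB")    # len:      63.9 PiB
```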


Since I'm adding the old paths back at almost the same time, it seems to
me that blkid is trying to access one of the devices I've just removed.
But that should not result in a NULL pointer dereference. It also should
not leave the LUN marked faulty, with multipath apparently forgetting what
kind of hardware is behind it:

lun_alias (00009800064700000684a656930380000) dm-1 ,
size=4.9T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- #:#:#:# -   #:#   active faulty running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- #:#:#:# -   #:#   active faulty running


The expected output of multipath -ll would look more like this:

lun_alias (00009800064700000684a656930380000) dm-1 NETAPP,LUN
size=299G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=8 status=active
| |- 1:0:2:0 sdk 8:160 active ready running
| `- 0:0:2:0 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
  |- 0:0:3:0 sdn 8:208 active ready running
  `- 1:0:3:0 sdl 8:176 active ready running
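To catch the degraded state from a monitoring script, you could check the path lines of `multipath -ll` for the `faulty` checker state. The following is a minimal Python sketch. The regex, the helper names and the column interpretation (H:C:T:L, device, major:minor, dm state, checker state, online state) are assumptions based only on the two outputs above, not on a documented format:

```python
import re
from typing import List, Tuple

# Hypothetical pattern for `multipath -ll` path lines, e.g.
#   "| |- 1:0:2:0 sdk 8:160 active ready running"
# Assumed fields: H:C:T:L, device, major:minor, dm path state,
# path-checker state and online state.
PATH_LINE = re.compile(r"[|`]-\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def path_states(output: str) -> List[Tuple[str, str, str, str]]:
    """Return (hctl, dm_state, checker_state, online_state) for every path line."""
    states = []
    for line in output.splitlines():
        match = PATH_LINE.search(line)
        if match:  # group lines such as "|-+- policy=..." do not match
            hctl, _dev, _devt, dm_state, checker, online = match.groups()
            states.append((hctl, dm_state, checker, online))
    return states


def faulty_paths(output: str) -> List[str]:
    """Return the H:C:T:L of paths whose checker state is 'faulty'."""
    return [hctl for hctl, _dm, checker, _online in path_states(output)
            if checker == "faulty"]


# The degraded output from this report, used as sample input.
DEGRADED = """\
lun_alias (00009800064700000684a656930380000) dm-1 ,
size=4.9T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- #:#:#:# -   #:#   active faulty running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- #:#:#:# -   #:#   active faulty running
"""

print(faulty_paths(DEGRADED))  # ['#:#:#:#', '#:#:#:#', '#:#:#:#', '#:#:#:#']
```

Run against the expected output instead, `faulty_paths` returns an empty list, because every checker state there is `ready`.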


Cheers,

Bernd


-------------------------------------------------
Bernd Zeimetz
Systems Engineer

conova communications GmbH

web   |  www.conova.com
mail  |  b.zeimetz@conova.com

ZENTRALE SALZBURG
Karolingerstraße 36A
A - 5020 Salzburg

tel   |  +43/(0)662 2200-313
fax   |  +43/(0)662 2200-209
------------------------------------------------