Bug#675969: [squeeze] kernel BUG at ... build/source_i386_none/drivers/md/raid5.c:2764!

To: Jose Manuel dos Santos Calhariz <jose.spam@netvisao.pt>
Cc: 675969@bugs.debian.org, Jose Calhariz <jose.calhariz@tagus.ist.utl.pt>
Subject: Bug#675969: [squeeze] kernel BUG at ... build/source_i386_none/drivers/md/raid5.c:2764!
From: Jonathan Nieder <jrnieder@gmail.com>
Date: Wed, 13 Jun 2012 23:26:18 -0500
Message-id: <20120614042618.GB27586@burratino>
Reply-to: Jonathan Nieder <jrnieder@gmail.com>, 675969@bugs.debian.org
In-reply-to: <20120613201638.GB30563@calhariz.com>
References: <20120604160111.27574.49662.reportbug@afs04.ist.utl.pt> <20120605035529.GA3118@burratino> <20120605162717.GD19074@calhariz.com> <20120605172911.GB3088@burratino> <20120613201638.GB30563@calhariz.com>

Hi,

Jose Manuel dos Santos Calhariz wrote:

> Hi, did you have time to look into this?

Thanks for the reminder.  Let's see.

>  - just before the BUG there is a "read error NOT corrected", "Disk
>  failure on cciss/c1d3p1, disabling device." and "Operation continuing
>  on 5 devices." 

It might be possible to simulate this by hot-unplugging a disk (for
example in a VM).
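One way to do that from inside a Linux guest, assuming the test disk is a SCSI disk (say sdb) that belongs to a scratch array, is to write to its sysfs delete attribute, /sys/block/<dev>/device/delete, which detaches the device roughly the way an unplug would.  A minimal C sketch; the helper names are hypothetical, and it is destructive, so only point it at a throwaway VM disk:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/sys/block/<dev>/device/delete" into buf.
 * Returns 0 on success, -1 if dev looks like a path or buf is too small. */
int sysfs_delete_path(const char *dev, char *buf, size_t len)
{
	int n;

	assert(dev != NULL && buf != NULL);
	if (dev[0] == '\0' || strchr(dev, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "/sys/block/%s/device/delete", dev);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: detach a SCSI disk by writing "1" to its delete
 * attribute.  DESTRUCTIVE: run as root, on a scratch VM disk only. */
int simulate_unplug(const char *dev)
{
	char path[256];
	FILE *f;

	if (sysfs_delete_path(dev, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* a sysfs write error may only surface when the buffer is flushed */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, simulate_unplug("sdb") while something reads from the array.  Whether that reproduces this particular BUG is untested; it only makes md see a member disappear.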

[...]
> end_request: I/O error, dev cciss/c1d3, sector 73343280
> raid5:md2: read error NOT corrected!! (sector 73343248 on cciss/c1d3p1).
> raid5: Disk failure on cciss/c1d3p1, disabling device.
> raid5: Operation continuing on 5 devices.
> raid5:md2: read error NOT corrected!! (sector 73343256 on cciss/c1d3p1).
> raid5:md2: read error NOT corrected!! (sector 73343264 on cciss/c1d3p1).
> raid5:md2: read error NOT corrected!! (sector 73343272 on cciss/c1d3p1).
> raid5:md2: read error NOT corrected!! (sector 73343280 on cciss/c1d3p1).
> raid5:md2: read error NOT corrected!! (sector 73343288 on cciss/c1d3p1).
> ------------[ cut here ]------------
> kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_none/drivers/md/raid5.c:2764!
[...]
> Code: e9 9b 01 00 00 83 7c 24 7c 02 74 04 0f 0b eb fe f6 46 28 10 c7 46 3c 00 00 00 00 0f 85 7f 01 00 00 8b 44 24 38 39 44 24 70 7d 04 <0f> 0b eb fe 83 7c 24 7c 02 75 20 6b 84 24 a8 00 00 00 78 ff 44 

		/* now write out any block on a failed drive,
		 * or P or Q if they were recomputed
		 */
		BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */

  21: 8b 44 24 38             mov    0x38(%esp),%eax
  25: 39 44 24 70             cmp    %eax,0x70(%esp)
  29: 7d 04                   jge    0x2f
  2b:*      0f 0b                   ud2         <-- trapping instruction
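The "2b:" offset comes from the position of the bracketed <0f> in the Code: line: the kernel brackets the byte at EIP, and scripts/decodecode disassembles the bytes around it.  A small C sketch (trap_offset is a hypothetical name, not a kernel function) that only reproduces the offset computation by counting the bytes before the marker:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return the offset of the "<xx>" byte in an oops "Code:" line, i.e. the
 * number of byte tokens before it, or -1 if there is no marker. */
int trap_offset(const char *line)
{
	const char *p;
	int count = 0;

	assert(line != NULL);
	p = strstr(line, "Code:");
	p = (p != NULL) ? p + strlen("Code:") : line;

	while (*p != '\0') {
		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;
		if (*p == '<')
			return count;	/* the bracketed byte is the one at EIP */
		count++;
		while (*p != '\0' && !isspace((unsigned char)*p))
			p++;
	}
	return -1;
}
```

On the Code: line above it counts 43 bytes (0x2b) before <0f>, matching the "2b:*" in the listing, so the dump starts at EIP - 0x2b = f818c7e6.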

[...]
> EIP: 0060:[<f818c811>] EFLAGS: 00010297 CPU: 3
> EIP is at handle_stripe+0x89d/0x173e [raid456]
> EAX: 00000005 EBX: 00000002 ECX: 00000003 EDX: 00000001

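s->uptodate is 0x70(%esp), so presumably disks - 1 is %eax (= 5),
loaded from 0x38(%esp); the stack dump shows 00000005 there too, and
5 fits a six-device array ("Operation continuing on 5 devices" after
one failure).  The assertion tripped, meaning that s->uptodate is
lower than 5.  The stack dump only covers offsets 0x00 through 0x5c
from ESP, so it doesn't reach 0x70(%esp) and we can't examine
s->uptodate itself.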
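To make the direction of the comparison explicit, here is a user-space C model of the check (the function names are made up; only the condition comes from raid5.c).  With disks = 6, as the "5 devices" message suggests, the BUG fires exactly when s->uptodate is 4 or less, which is the same as the jge not being taken:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of BUG_ON(s->uptodate < disks - 1): true means the BUG fires. */
bool bug_fires(int uptodate, int disks)
{
	assert(disks >= 2);
	return uptodate < disks - 1;
}

/* Model of "cmp %eax,0x70(%esp); jge 0x2f": the signed jump over ud2
 * is taken when the value at 0x70(%esp) is >= %eax. */
bool jge_taken(int val_at_0x70, int eax)
{
	return val_at_0x70 >= eax;
}
```

For every uptodate from 0 to 6, bug_fires(uptodate, 6) equals !jge_taken(uptodate, 5).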

[...]
> ESI: f6394000 EDI: 00000003 EBP: f6394028 ESP: f58d5e6c
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process md2_raid6 (pid: 743, ti=f58d4000 task=f6569980 task.ti=f58d4000)
> Stack:
>  e6fde3e6 c2988138 00000006 f61c8e00 00000006 0002d995 00020003 00000000
> <0> c2988138 f4cbc86c f65699ac 000f0e67 00000000 f639431c 00000005 fffffffc
> <0> f4cbc86c c1025461 00000000 00000000 00000002 00000005 00988100 c127a45c
> Call Trace:
>  [<c1025461>] ? check_preempt_wakeup+0x196/0x202
>  [<f818d9fb>] ? raid5d+0x349/0x389 [raid456]
>  [<c103b623>] ? del_timer_sync+0xa/0x14
>  [<c103b6cb>] ? process_timeout+0x0/0x5
>  [<f816206e>] ? md_thread+0xe1/0xf8 [md_mod]
>  [<c104433a>] ? autoremove_wake_function+0x0/0x2d
>  [<f8161f8d>] ? md_thread+0x0/0xf8 [md_mod]
>  [<c1044108>] ? kthread+0x61/0x66
>  [<c10440a7>] ? kthread+0x0/0x66
>  [<c1003d47>] ? kernel_thread_helper+0x7/0x10
> Code: e9 9b 01 00 00 83 7c 24 7c 02 74 04 0f 0b eb fe f6 46 28 10 c7 46 3c 00 00 00 00 0f 85 7f 01 00 00 8b 44 24 38 39 44 24 70 7d 04 <0f> 0b eb fe 83 7c 24 7c 02 75 20 6b 84 24 a8 00 00 00 78 ff 44 

I'd suggest contacting NeilBrown <neilb@suse.de> and
linux-raid@vger.kernel.org to let them know what happened and ask if
it rings a bell.  If doing so, please cc either me or this bug log so
we can track it.

Hope that helps,
Jonathan
