
Re: Bug#671895: [sparc] Kernel NULL pointer dereference in sungem/gem_poll() (Re: updates)



On Fri, 2012-05-11 at 12:25 -0300, gustavo panizzo wrote:
> adding debian-boot
> 
> 
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 successfully, networking works
> 
> obp diags shows no errors
> 
> but when i boot from network using 
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
> 
> i get the following error
> 
>   Detecting link on eth0; please wait... 100%
> 
> [  246.994391] Unable to handle kernel NULL pointer dereference
> [  247.074490] tsk->{mm,active_mm}->context = 000000000000019f
> [  247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000
> [  247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [  247.328648] Call Trace:
> [  247.360793]  [000000000045dcd4] do_exit+0x94/0x708
> [  247.423821]  [0000000000427550] die_if_kernel+0x2a0/0x2c8
> [  247.494864]  [0000000000768c84] unhandled_fault+0x8c/0x98
> [  247.565915]  [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [  247.640377]  [0000000000407880] sparc64_realfault_common+0x10/0x20
> [  247.721722]  [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
[...]

This means we crashed in gem_tx(), which is inlined into gem_poll():

> static __inline__ void gem_tx(struct net_device *dev, struct gem *gp, u32 gem_status)
> {
> 	int entry, limit;
> 
> 	entry = gp->tx_old;
> 	limit = ((gem_status & GREG_STAT_TXNR) >> GREG_STAT_TXNR_SHIFT);
> 	while (entry != limit) {
> 		struct sk_buff *skb;
> 		struct gem_txd *txd;
> 		dma_addr_t dma_addr;
> 		u32 dma_len;
> 		int frag;
> 
> 		if (netif_msg_tx_done(gp))
> 			printk(KERN_DEBUG "%s: tx done, slot %d\n",
> 				gp->dev->name, entry);
> 		skb = gp->tx_skbs[entry];
> 		if (skb_shinfo(skb)->nr_frags) {

right here, while evaluating skb_shinfo(skb), which probably means skb
was NULL.  This *could* be due to broken hardware telling us that more
packets were sent than we actually queued, but probably not, since
'networking works' when not using netboot.
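
To illustrate the failure mode, here is a minimal userspace sketch of the
same ring walk with a NULL check added.  It is not the driver code and not
a proposed patch: struct fake_gem, struct fake_skb, fake_gem_tx() and
TX_RING_SIZE are all made-up stand-ins, and fragment handling is left out.

```c
#include <stddef.h>

#define TX_RING_SIZE 8	/* hypothetical size, must be a power of two */

/* Stand-in for struct sk_buff; only what this sketch needs. */
struct fake_skb {
	int nr_frags;
};

/* Stand-in for the parts of struct gem used by the TX completion walk. */
struct fake_gem {
	int tx_old;				/* first slot not yet reclaimed */
	struct fake_skb *tx_skbs[TX_RING_SIZE];	/* NULL means nothing queued */
};

/*
 * Walk the ring from tx_old up to 'limit', the completion index reported
 * by the hardware (assumed to be in the range 0..TX_RING_SIZE-1).
 * Returns the number of slots reclaimed, or -1 if the hardware claims to
 * have completed a slot in which no skb was ever queued.
 */
int fake_gem_tx(struct fake_gem *gp, int limit)
{
	int entry = gp->tx_old;
	int reclaimed = 0;

	while (entry != limit) {
		struct fake_skb *skb = gp->tx_skbs[entry];

		if (skb == NULL) {
			/* Ring state and hardware disagree; stop here
			 * instead of dereferencing a NULL pointer. */
			gp->tx_old = entry;
			return -1;
		}

		gp->tx_skbs[entry] = NULL;
		entry = (entry + 1) & (TX_RING_SIZE - 1);
		reclaimed++;
	}

	gp->tx_old = entry;
	return reclaimed;
}
```

With two slots queued and a reported limit of 2, this returns 2; if the
reported limit runs past the last queued slot, it returns -1 at the first
empty slot, which is the point where the real code would oops.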

Is the driver successfully resetting the network controller while
net-booting?  The reset can time out, in which case the driver logs
"SW reset is ghetto" but does *not* abort initialisation.
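
For comparison, a reset wait that reports the timeout to its caller could
look roughly like the sketch below.  This is a userspace model under my
own assumptions, not what sungem does: fake_wait_reset(), the read_reg_fn
callback, FAKE_SWRST_BITS and the test double are all hypothetical names.

```c
#include <stdint.h>

#define FAKE_SWRST_BITS 0x3u	/* hypothetical "reset in progress" bits */

/* Hypothetical register reader; a real driver would read MMIO instead. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll the reset register until the reset bits clear or max_polls reads
 * have been made.  Returns 0 if the reset completed, -1 on timeout, so
 * that the caller can abort initialisation rather than carry on.
 */
int fake_wait_reset(read_reg_fn read_reg, void *ctx, int max_polls)
{
	while (max_polls-- > 0) {
		if ((read_reg(ctx) & FAKE_SWRST_BITS) == 0)
			return 0;
	}
	return -1;
}

/*
 * Test double: ctx points to an int holding how many more reads should
 * still report the reset as in progress.
 */
uint32_t fake_reg_clears_after(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return FAKE_SWRST_BITS;
	}
	return 0;
}
```

The only point of the sketch is the return value: a caller that checks it
can refuse to bring the interface up on a controller that never came out
of reset.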

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner
