Re: Bug#671895: [sparc] Kernel NULL pointer dereference in sungem/gem_poll() (Re: updates)

To: "gustavo panizzo <gfa>" <gfa@zumbi.com.ar>, 671895@bugs.debian.org
Cc: Jonathan Nieder <jrnieder@gmail.com>, debian-boot@lists.debian.org, Jurij Smakov <jurij@wooyd.org>
Subject: Re: Bug#671895: [sparc] Kernel NULL pointer dereference in sungem/gem_poll() (Re: updates)
From: Ben Hutchings <ben@decadent.org.uk>
Date: Sat, 12 May 2012 16:43:29 +0100
Message-id: <1336837409.8274.531.camel@deadeye>
In-reply-to: <20120511152501.GB659@io.zumbi.com.ar>
References: <20120508061621.GA2163@io.zumbi.com.ar> <20120509232056.GA7921@burratino> <20120511152501.GB659@io.zumbi.com.ar>

On Fri, 2012-05-11 at 12:25 -0300, gustavo panizzo wrote:
> adding debian-boot
> 
> 
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 sucessfully, networking works
> 
> obp diags shows no errors
> 
> but when i boot from network using 
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
> 
> i get the following error
> 
> [installer dialog: "Detecting link on eth0; please wait...", 100%]
> [  246.994391] Unable to handle kernel NULL pointer dereference
> [  247.074490] tsk->{mm,active_mm}->context = 000000000000019f
> [  247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000
> [  247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [  247.328648] Call Trace:
> [  247.360793]  [000000000045dcd4] do_exit+0x94/0x708
> [  247.423821]  [0000000000427550] die_if_kernel+0x2a0/0x2c8
> [  247.494864]  [0000000000768c84] unhandled_fault+0x8c/0x98
> [  247.565915]  [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [  247.640377]  [0000000000407880] sparc64_realfault_common+0x10/0x20
> [  247.721722]  [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
[...]

This means we crashed in gem_tx(), which is inlined into gem_poll():

> static __inline__ void gem_tx(struct net_device *dev, struct gem *gp, u32 gem_status)
> {
> 	int entry, limit;
> 
> 	entry = gp->tx_old;
> 	limit = ((gem_status & GREG_STAT_TXNR) >> GREG_STAT_TXNR_SHIFT);
> 	while (entry != limit) {
> 		struct sk_buff *skb;
> 		struct gem_txd *txd;
> 		dma_addr_t dma_addr;
> 		u32 dma_len;
> 		int frag;
> 
> 		if (netif_msg_tx_done(gp))
> 			printk(KERN_DEBUG "%s: tx done, slot %d\n",
> 				gp->dev->name, entry);
> 		skb = gp->tx_skbs[entry];
> 		if (skb_shinfo(skb)->nr_frags) {

right here, while evaluating skb_shinfo(skb), which probably means skb
was NULL.  This *could* be due to broken hardware telling us that more
packets were sent than we actually queued, but probably not, since
'networking works' when not using netboot.
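
To illustrate the failure mode, here is a minimal user-space sketch of a
TX ring being reclaimed up to an index reported by the hardware.  It is
not the sungem code and not a proposed patch: fake_skb, fake_ring,
reclaim_tx() and TX_RING_SIZE are all made up for the example, and
fragments are ignored.  It only shows that if the reported completion
index runs past what was queued, the loop reaches a slot whose skb
pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 8	/* hypothetical size; must be a power of two */

/* Stand-in for struct sk_buff; only the field touched at the crash site. */
struct fake_skb {
	int nr_frags;
};

/* Stand-in for the relevant part of struct gem. */
struct fake_ring {
	struct fake_skb *tx_skbs[TX_RING_SIZE];
	int tx_old;	/* first slot not yet reclaimed */
};

/*
 * Reclaim slots from tx_old up to (but not including) limit, where limit
 * is the completion index the hardware reported.  Returns the number of
 * slots reclaimed, or -1 if a slot reported as complete holds no packet.
 * The real loop has no such check and dereferences the pointer directly.
 */
static int reclaim_tx(struct fake_ring *r, int limit)
{
	int entry = r->tx_old;
	int reclaimed = 0;

	assert(limit >= 0 && limit < TX_RING_SIZE);

	while (entry != limit) {
		struct fake_skb *skb = r->tx_skbs[entry];

		if (skb == NULL) {
			/* Hardware claims completion of a slot we never filled. */
			r->tx_old = entry;
			return -1;
		}
		r->tx_skbs[entry] = NULL;
		entry = (entry + 1) & (TX_RING_SIZE - 1);
		reclaimed++;
	}
	r->tx_old = entry;
	return reclaimed;
}
```

With one packet queued in slot 0 and a reported index of 3, this model
reclaims slot 0 and then stops at the empty slot 1, which is where an
unchecked skb_shinfo(skb) would fault.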

Is the driver successfully resetting the network controller while
net-booting?  The reset can time out, in which case the driver logs
"SW reset is ghetto" but does *not* abort initialisation.
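
For reference, the pattern I mean is a bounded poll for the reset bits to
clear.  The following is a user-space sketch under my own assumptions,
not the driver's actual loop: fake_dev, reset_bits_set(),
wait_for_reset() and RESET_POLL_LIMIT are hypothetical names.  The point
is only that the caller learns whether the reset completed and can decide
whether to carry on.

```c
#include <assert.h>
#include <stdbool.h>

#define RESET_POLL_LIMIT 10	/* hypothetical number of polls before giving up */

/* Hypothetical device model: reset bits clear after polls_needed reads. */
struct fake_dev {
	int polls_needed;
	int polls_done;
};

/* Stand-in for reading the reset register; true while reset is pending. */
static bool reset_bits_set(struct fake_dev *d)
{
	return d->polls_done++ < d->polls_needed;
}

/*
 * Poll until the reset bits clear or the poll budget runs out.
 * Returns true if the reset completed, false on timeout.  A caller that
 * ignores a false return continues with a controller in an unknown state.
 */
static bool wait_for_reset(struct fake_dev *d)
{
	int limit = RESET_POLL_LIMIT;

	assert(d);

	while (limit-- > 0) {
		if (!reset_bits_set(d))
			return true;
	}
	return false;
}
```

If the netboot log shows the timeout message, the ring state the driver
sets up afterwards may not match what the controller believes.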

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner
