
Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation



found 628444 linux-2.6/3.2.9-1
tags 628444 + upstream patch moreinfo
quit

Hi Dafydd,

Dafydd Harries wrote:

> I've been seeing similar problems with my "Intel Corporation Centrino
> Ultimate-N 6300".
>
> Like others, the problems seemed to start around 2.6.39.

Odd. What kernel did you use before then?  (/var/log/dpkg.log might
tell.)
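
A rough sketch of how to pull that out of the log, assuming the usual
"DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION" line layout and that
the kernel packages are named linux-image-*. The helper name
kernel_history is made up for this example:

```shell
# Print dpkg.log entries that record a kernel image being installed or upgraded.
# Assumption: log lines look like "DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION".
# kernel_history is a hypothetical helper name, not an existing tool.
kernel_history() {
	# -h: omit file names when several logs are given
	# the trailing space after install/upgrade skips "status installed" lines
	grep -hE ' (install|upgrade) linux-image' "$@"
}
```

It could be run as "kernel_history /var/log/dpkg.log /var/log/dpkg.log.1".
Older rotated logs are probably compressed, so they would need zgrep
instead.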

> Like others, the card flakes out a day or two after booting, and a reboot
> always fixes the problem. Occasionally it stays working for longer.
>
> Like others, I've added RAM. But as far as I can recall the upgrade
> happened well before any problems started appearing.

Interesting and useful.

> Any ASPM settings are at their default.
>
> I'll try wd_disable=1 as a workaround for now.
>
> Meenakshi, will the patch you mentioned be applied in 3.3?

Cc-ing her.  The patch currently seems to be part of the wireless-next
tree but not davem's net tree.

> Below is a syslog excerpt from around the time of failure. It seems to
> support Meenakshi's suggestion that it's related to the queue getting
> stuck.

Well, that can be tested.  Could you try the patch against current
"master"?  It works like this:

0. Prerequisites:
	apt-get install git build-essential

1. Get the kernel history, if you don't already have it:
	git clone \
	  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. Configure and build:
	cd linux
	git checkout origin/master
	cp /boot/config-$(uname -r) .config; # current configuration
	make localmodconfig; # optional: minimize configuration
	make deb-pkg; # optionally with -j<num> for parallel build
	dpkg -i ../<name of package>; # as root
	reboot

	... test test test ...

3. Hopefully that reproduces the problem.  If so, try the attached patch:
	git am -3sc <the patch>
	make deb-pkg; # maybe with -j4
	dpkg -i ../<name of package>; # as root
	reboot

If it works, we can pass this to Dave with information about what
happened and your test result, to get the patch fast-tracked.

Thanks,
Jonathan

> Below is a syslog excerpt from around the time of failure. It seems to
> support Meenakshi's suggestion that it's related to the queue getting
> stuck.
[...]
> iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.
> iwlwifi 0000:02:00.0: Current read_ptr 112 write_ptr 115
> iwlwifi 0000:02:00.0: On demand firmware reload
> iwlwifi 0000:02:00.0: Command REPLY_QOS_PARAM failed: FW Error
> iwlwifi 0000:02:00.0: Failed to update QoS
> iwlwifi 0000:02:00.0: fw recovery, no hcmd send
> iwlwifi 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -5
> iwlwifi 0000:02:00.0: Error clearing ASSOC_MSK on BSS (-5)
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[...]
> ieee80211 phy0: Hardware restart was requested
> wpa_supplicant[1472]: CTRL-EVENT-DISCONNECTED bssid=00:50:7f:cb:4b:58 reason=4
> ieee80211 phy0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-2)
[...]
> iwlwifi 0000:02:00.0: Could not load the INST uCode section
> iwlwifi 0000:02:00.0: Failed to start RT ucode: -110
[...]
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[...]
> I get some kind of OOPS but I'm guessing this is just because the driver can't
> communicate with the card when the module is being unloaded:
[...]
> WARNING: at /build/buildd-linux-2.6_3.2.9-1-amd64-KTPapN/linux-2.6-3.2.9/debian/build/source_amd64_none/drivers/net/wireless/iwlwifi/iwl-core.c:1330 iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]()
> Hardware name: 3249CTO
> Modules linked in: uvcvideo videodev v4l2_compat_ioctl32 media snd_usb_audio snd_usbmidi_lib pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) acpi_cpufreq mperf cpufreq_stats cpufreq_userspace cpu
> Mar 12 13:15:04 localhost kernel: sync_memcpy async_tx raid1 raid0 multipath linear md_mod sd_mod crc_t10dif usbhid hid ahci libahci ehci_hcd libata scsi_mod usbcore thermal thermal_sys usb_common e1000e [last unloaded: scsi_wait_scan]
> Mar 12 13:15:04 localhost kernel: [48290.674508] Pid: 1405, comm: NetworkManager Tainted: G           O 3.2.0-2-amd64 #1
> Mar 12 13:15:04 localhost kernel: [48290.674511] Call Trace:
> Mar 12 13:15:04 localhost kernel: [48290.674520]  [<ffffffff81046879>] ? warn_slowpath_common+0x78/0x8c
> Mar 12 13:15:04 localhost kernel: [48290.674531]  [<ffffffffa03ea9af>] ? iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]
[...]
> Mar 12 13:15:04 localhost kernel: [48290.674647]  [<ffffffff812a35a5>] ? netlink_rcv_skb+0x36/0x7a
[...]
> iwlwifi 0000:02:00.0: ctx->vif =           (null), vif = ffff8801b1c72df0
> iwlwifi 0000:02:00.0:  ID = 0: ctx = ffff8801b1a834b0  ctx->vif =           (null)
From: Johannes Berg <johannes.berg@intel.com>
Date: Sun, 4 Mar 2012 08:50:46 -0800
Subject: iwlwifi: always monitor for stuck queues

commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912 upstream.

If we only monitor while associated, the following
can happen:
 - we're associated, and the queue stuck check
   runs, setting the queue "touch" time to X
 - we disassociate, stopping the monitoring,
   which leaves the time set to X
 - almost 2s later, we associate, and enqueue
   a frame
 - before the frame is transmitted, we monitor
   for stuck queues, and find the time set to
   X, although it is now later than X + 2000ms,
   so we decide that the queue is stuck and
   erroneously restart the device

It happens more with P2P because there we can
go between associated/unassociated frequently.

Cc: stable@vger.kernel.org
Reported-by: Ben Cahill <ben.m.cahill@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 drivers/net/wireless/iwlwifi/iwl-core.c |   18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-core.c b/drivers/net/wireless/iwlwifi/iwl-core.c
index 7bcfa781e0b9..3abe9ede6990 100644
--- a/drivers/net/wireless/iwlwifi/iwl-core.c
+++ b/drivers/net/wireless/iwlwifi/iwl-core.c
@@ -1465,20 +1465,10 @@ void iwl_bg_watchdog(unsigned long data)
 	if (timeout == 0)
 		return;
 
-	/* monitor and check for stuck cmd queue */
-	if (iwl_check_stuck_queue(priv, priv->shrd->cmd_queue))
-		return;
-
-	/* monitor and check for other stuck queues */
-	if (iwl_is_any_associated(priv)) {
-		for (cnt = 0; cnt < hw_params(priv).max_txq_num; cnt++) {
-			/* skip as we already checked the command queue */
-			if (cnt == priv->shrd->cmd_queue)
-				continue;
-			if (iwl_check_stuck_queue(priv, cnt))
-				return;
-		}
-	}
+	/* monitor and check for stuck queues */
+	for (cnt = 0; cnt < hw_params(priv).max_txq_num; cnt++)
+		if (iwl_check_stuck_queue(priv, cnt))
+			return;
 
 	mod_timer(&priv->watchdog, jiffies +
 		  msecs_to_jiffies(IWL_WD_TICK(timeout)));
-- 
1.7.9.2
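
The sequence in the commit message above can be replayed with a toy
model.  This is only a sketch built from the commit message, not driver
code: the function names, the 500 ms tick interval and the exact event
times are assumptions, and it assumes the stuck-queue check refreshes
the "touch" time whenever it finds the queue empty.

```shell
# Toy model of the stuck-queue watchdog race; not driver code.
# All names and the tick interval are hypothetical. Times are in ms.
TIMEOUT=2000

# queue_stuck NOW STAMP PENDING
# Succeeds if frames are pending and the stamp is older than TIMEOUT.
queue_stuck() {
	[ "$3" -gt 0 ] && [ $(($1 - $2)) -gt "$TIMEOUT" ]
}

# simulate MODE, where MODE is "assoc-only" (old) or "always" (patched)
simulate() {
	mode=$1
	stamp=0		# check ran at X = 0 while associated, queue empty
	# Disassociated, queue empty: only the patched watchdog keeps
	# checking, and each check of an empty queue refreshes the stamp.
	for tick in 500 1000 1500; do
		if [ "$mode" = always ]; then
			stamp=$tick
		fi
	done
	# t = 1900: associate again and enqueue one frame.
	# t = 2100: the watchdog checks before the frame is transmitted.
	if queue_stuck 2100 "$stamp" 1; then
		echo "$mode: false restart"
	else
		echo "$mode: ok"
	fi
}

simulate assoc-only
simulate always
```

In this model the assoc-only case compares against the stale stamp from
X = 0 and reports a false restart, while the always case compares
against the stamp from t = 1500 and does not.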

