Bug#659519: [3.1 -> 3.2.4 regression] Sound problems (Re: Other card works the same way!)

To: Jonathan Nieder <jrnieder@gmail.com>
Cc: 659519@bugs.debian.org, Ben Hutchings <ben@decadent.org.uk>
Subject: Bug#659519: [3.1 -> 3.2.4 regression] Sound problems (Re: Other card works the same way!)
From: David Baron <d_baron@012.net.il>
Date: Tue, 15 May 2012 16:52:55 +0300
Message-id: <201205151652.55486.d_baron@012.net.il>
Reply-to: David Baron <d_baron@012.net.il>, 659519@bugs.debian.org
In-reply-to: <20120514214408.GC7439@burratino>
References: <201204201114.55555.d_baron@012.net.il> <201205141736.35935.d_baron@012.net.il> <20120514214408.GC7439@burratino>

On Tuesday 15 May 2012 00:44:08 Jonathan Nieder wrote:
> David Baron wrote:
> > This is kind of large but here it is
> 
> Thanks!
> 
> [...]
> 
> > [    6.700526] [drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0
> > on minor 0 [    7.208063] snd_ens1371 0000:04:0b.0: BAR 0: set to [io 
> > 0xdf00-0xdf3f] (PCI address [0xdf00-0xdf3f]) [    7.208826] snd_ens1371
> > 0000:04:0b.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23 [    7.366048]
> > nvidia: module license 'NVIDIA' taints kernel.
> > [    7.366800] Disabling lock debugging due to kernel taint
> > [    8.025924] NVRM: The NVIDIA probe routine was not called for 1
> > device(s). [    8.026809] NVRM: This can occur when a driver such as
> > nouveau, rivafb, [    8.026811] NVRM: nvidiafb, or rivatv was loaded and
> > obtained ownership of [    8.026813] NVRM: the NVIDIA device(s).
> > [    8.029549] NVRM: Try unloading the conflicting kernel module (and/or
> 
> Probably blacklisting the nvidia driver through /etc/modprobe.d would
> prevent these messages about the same.  But I don't think it's the
> cause of the problem.
> 
These messages appear because nouveau grabs the interface unless it is
blacklisted or the kernel is booted with nomodeset.

> [...]
> 
> > [ 2656.878112] ABORTED IN=eth2 OUT=
> > MAC=00:e0:4c:68:00:c5:00:90:8f:2c:50:c9:08:00 SRC=208.83.137.114
> > DST=10.100.101.101 LEN=40 TOS=0x00 PREC=0x00 TTL=47 ID=61218 DF
> > PROTO=TCP SPT=2703 DPT=9081 SEQ=1749643406 ACK=716693663 WINDOW=46
> > RES=0x00 ACK RST URGP=0 [ 2956.105668] CPU0: Core temperature above
> > threshold, cpu clock throttled (total events = 1) [ 2956.105692] CPU1:
> > Core temperature above threshold, cpu clock throttled (total events = 1)
> > [ 2956.106875] CPU0: Core temperature/speed normal
> > [ 2956.106884] CPU1: Core temperature/speed normal
> > [ 2999.988022] [Hardware Error]: Machine check events logged
> 
> (once)
> 
> Was the machine especially active, or is there a cooling problem?

These happen a lot and do not seem to depend on room temperature. The CPU fan
works, I have de-dusted it, and the case is open, yet they still occur.

There is a "temp1" at 55 C listed on the PCI adapter. It seems rock steady, too
steady! I can get it to 59-60 with a heavy 3D hardware-accelerated game.
Maybe this sensor is on the nvidia PCI Express card, not the CPU!

The others, listed on the ISA adapter (m/b, cpu, temp3), are all "disabled,"
possibly because of BIOS settings or problems.
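
To see which chip each temperature reading belongs to, the hwmon sysfs
entries can be read directly. This is a minimal sketch, assuming the kernel
exposes sensors under /sys/class/hwmon with "name" and "temp*_input" files
(values in millidegrees C). The function name read_hwmon_temps is only
illustrative, and the layout may differ on this machine.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip_name, sensor_file, degrees_C) tuples."""
    readings = []
    for hwmon in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Assumption: attributes live either directly under hwmonN/ or
        # under hwmonN/device/ (seen on older kernels), so try both.
        for base in (hwmon, os.path.join(hwmon, "device")):
            name_path = os.path.join(base, "name")
            if not os.path.isfile(name_path):
                continue
            with open(name_path) as f:
                chip = f.read().strip()
            for path in sorted(glob.glob(os.path.join(base, "temp*_input"))):
                try:
                    with open(path) as f:
                        millideg = int(f.read().strip())
                except (OSError, ValueError):
                    continue  # sensor disabled or unreadable
                readings.append(
                    (chip, os.path.basename(path), millideg / 1000.0)
                )
            break  # use the first location that has a name file
    return readings


if __name__ == "__main__":
    for chip, sensor, temp in read_hwmon_temps():
        print(f"{chip:12s} {sensor:14s} {temp:5.1f} C")
```

If the steady 55 C reading shows up under the graphics driver's chip name
rather than a CPU or motherboard chip, that would support the guess that it
is the card's sensor.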

The fan itself, whose speed is not monitored either, has its own thermostat,
which is evident on cold days.

These messages are annoying. The "throttling," if real, is a problem, but the
messages seem spurious.
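
One way to judge whether the counts are plausible is to tally the throttle
messages and see how fast the "total events" counter grows. This is a minimal
sketch, assuming the message format quoted in this thread and raw, unwrapped
dmesg output (not the re-wrapped text in this mail). The names
throttle_events and events_per_second are illustrative.

```python
import re
from collections import defaultdict

# Matches the kernel throttle message as it appears in the quoted log.
THROTTLE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+CPU(\d+):\s+Core\s+temperature\s+above\s+"
    r"threshold,\s+cpu\s+clock\s+throttled\s+\(total\s+events\s+=\s+(\d+)\)"
)


def throttle_events(log_text):
    """Return {cpu: [(timestamp, total_events), ...]} from dmesg text."""
    events = defaultdict(list)
    for ts, cpu, total in THROTTLE_RE.findall(log_text):
        events[int(cpu)].append((float(ts), int(total)))
    return dict(events)


def events_per_second(samples):
    """Average counter growth between the first and last sample.

    Returns None when there are fewer than two samples or no time elapsed.
    """
    if len(samples) < 2:
        return None
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    return (n1 - n0) / (t1 - t0)


if __name__ == "__main__":
    import sys

    # Usage (assumed): dmesg | python this_script.py
    for cpu, samples in sorted(throttle_events(sys.stdin.read()).items()):
        print(cpu, len(samples), events_per_second(samples))
```

Only some events are printed to the log, so the counter difference between
two logged lines, divided by the elapsed time, gives the average event rate
over that interval.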



> [...]
> 
> > [22172.918562] [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on vga
> > encoder (output 0) [22296.194815] CPU0: Core temperature above
> > threshold, cpu clock throttled (total events = 1694) [22296.194841]
> > CPU1: Core temperature above threshold, cpu clock throttled (total
> > events = 1694) [22296.195873] CPU1: Core temperature/speed normal
> > [22296.195878] CPU0: Core temperature/speed normal
> > [22349.988028] [Hardware Error]: Machine check events logged
> > [22636.814452] CPU1: Core temperature above threshold, cpu clock
> > throttled (total events = 5487) [22636.814478] CPU0: Core temperature
> > above threshold, cpu clock throttled (total events = 5486)
> > [22636.815515] CPU1: Core temperature/speed normal
> > [22636.815521] CPU0: Core temperature/speed normal
> > [22799.988028] [Hardware Error]: Machine check events logged
> > [23002.995857] CPU1: Core temperature above threshold, cpu clock
> > throttled (total events = 8649) [23002.995886] CPU0: Core temperature
> > above threshold, cpu clock throttled (total events = 8648)
> > [23002.997076] CPU1: Core temperature/speed normal
> > [23002.997083] CPU0: Core temperature/speed normal
> > [23100.000210] [Hardware Error]: Machine check events logged
> > [23504.371700] CPU1: Core temperature above threshold, cpu clock
> > throttled (total events = 9153) [23504.371723] CPU0: Core temperature
> > above threshold, cpu clock throttled (total events = 9152)
> > [23504.372762] CPU1: Core temperature/speed normal
> > [23504.372768] CPU0: Core temperature/speed normal
> > [23549.988022] [Hardware Error]: Machine check events logged
> 
> [...]
> 
> This time it keeps happening, until
> 
> > [24835.817176] CPU0: Core temperature above threshold, cpu clock
> > throttled (total events = 9596) [24835.817200] CPU1: Core temperature
> > above threshold, cpu clock throttled (total events = 9597)
> > [24835.818353] CPU1: Core temperature/speed normal
> > [24835.818356] CPU0: Core temperature/speed normal
> 
> [...]
> 
> > [25049.988081] [Hardware Error]: Machine check events logged
> 
> [...]
> 
> > [25280.229943] usb 1-3: USB disconnect, device number 6
> > [25289.864027] usb 1-3: new high-speed USB device number 7 using ehci_hcd
> > [25289.998233] usb 1-3: New USB device found, idVendor=1004,
> > idProduct=618e [25289.998239] usb 1-3: New USB device strings: Mfr=1,
> > Product=2, SerialNumber=3 [25289.998243] usb 1-3: Product: LG Android
> > USB Device
> > [25289.998246] usb 1-3: Manufacturer: LG Electronics Inc.
> 
> Perhaps the machine was in a hot place at the same time as the phone
> was plugged in.  In any case, it would be useful to rule out cooling
> problems, the nvidia driver, and the virtualbox driver as causes.
> 
> If 3.1.y (e.g., from snapshot.debian.org) still works fine, the above
> look like red herrings, so still no idea what's actually wrong.  Would
> you be able to bisect to find the change that introduced this
> regression (I can list what commands do so)?

Since the problem occurs with or without nvidia, that driver is not the cause.
If a non-sound driver is causing this problem, I would guess VirtualBox,
because it needs ALSA for sound in its VMs. Further testing is needed.
