
Bug#542250: repeatable crashes while copying 500G from NFS mount to local logical volume



X-Loop: owner@bugs.debian.org
Resent-Date: Wed, 19 Aug 2009 03:24:04 +0000
Resent-Message-ID: <handler.542250.B542250.12506521281831@bugs.debian.org>
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: followup 542250
X-Debian-PR-Package: linux-image-2.6.26-2-xen-amd64
X-Debian-PR-Keywords: 
X-Debian-PR-Source: linux-2.6
From: Ben Hutchings <ben@decadent.org.uk>
To: 542250@bugs.debian.org
In-Reply-To: <1250646426.16001.81.camel@localhost>
References: <20090818163840.6472.70854.reportbug@desktopvm.lvknet>
	 <1250646426.16001.81.camel@localhost>
Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-gpJGQvNhH5HmU60cBDtr"
Date: Wed, 19 Aug 2009 04:22:04 +0100
Message-Id: <1250652124.16001.136.camel@localhost>
Mime-Version: 1.0
X-Mailer: Evolution 2.26.3 


--=-gpJGQvNhH5HmU60cBDtr
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Wed, 2009-08-19 at 02:47 +0100, Ben Hutchings wrote:
[...]
> The kernel was spinning in process context, was interrupted by the SATA
> device, and its interrupt handler also started spinning.
>
> The BUG line is here:
>
> 	/* announce we're spinning */
> 	spinning = &__get_cpu_var(spinning);
> 	if (spinning->lock) {
> 		BUG_ON(spinning->lock == lock);
> 		if(raw_irqs_disabled()) {
> 			BUG_ON(__get_cpu_var(spinning_bh).lock == lock);
> 			spinning = &__get_cpu_var(spinning_irq);
> 		} else {
> ->			BUG_ON(!in_softirq());
> 			spinning = &__get_cpu_var(spinning_bh);
> 		}
> 		BUG_ON(spinning->lock);
> 	}
> 	spinning->ticket = token;
> 	smp_wmb();
> 	spinning->lock = lock;
>
> This asserts that if we spin on a lock after interrupting another spin,
> and interrupts are enabled, we must be in a softirq.
>
> This seems bogus to me - in general, interrupts are enabled during
> interrupt handlers once their specific IRQ has been masked.
>
> I'll have a look at whether & how this code has changed upstream and in
> other forward-ported branches.
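
To make the failing path concrete, here is a minimal user-space model of
the slot selection in the quoted snippet. It is only a sketch: the types
cpu_ctx, spin_slot and cpu_slots, the enum pick and the function
pick_slot() are hypothetical names invented for illustration, and the two
booleans merely stand in for raw_irqs_disabled() and in_softirq().
PICK_BUG marks every case where the quoted code would hit a BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU state the quoted code inspects. */
struct cpu_ctx {
	bool irqs_disabled;	/* models raw_irqs_disabled() */
	bool in_softirq;	/* models in_softirq() */
};

/* One "I am spinning on this lock" record. */
struct spin_slot {
	const void *lock;	/* NULL while the slot is unused */
	unsigned int ticket;
};

/* The three fixed per-CPU slots: spinning, spinning_bh, spinning_irq. */
struct cpu_slots {
	struct spin_slot base, bh, irq;
};

enum pick { PICK_BASE, PICK_BH, PICK_IRQ, PICK_BUG };

/* Mirror the branch structure of the quoted code, returning which slot
 * would be used, or PICK_BUG where one of the BUG_ON()s would fire. */
static enum pick pick_slot(const struct cpu_slots *s,
			   const struct cpu_ctx *c, const void *lock)
{
	assert(s && c && lock);

	if (!s->base.lock)
		return PICK_BASE;	/* no spin was interrupted */
	if (s->base.lock == lock)
		return PICK_BUG;	/* BUG_ON(spinning->lock == lock) */
	if (c->irqs_disabled) {
		if (s->bh.lock == lock)
			return PICK_BUG;
		return s->irq.lock ? PICK_BUG : PICK_IRQ;
	}
	if (!c->in_softirq)
		return PICK_BUG;	/* the line marked "->" above */
	return s->bh.lock ? PICK_BUG : PICK_BH;
}
```

With the base slot occupied (a spin in process context was interrupted),
a caller with irqs_disabled == false and in_softirq == false gets
PICK_BUG. That combination models the reported case: a hard interrupt
handler running with interrupts enabled and spinning on a different lock.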

In the SLE 11.0 branch (downloaded from SUSE KOTD:
<ftp://ftp.suse.com/pub/projects/kernel/kotd/>) xen_spin_wait() allows
for arbitrarily nested spinlocks and has no such assertions.

The XCI tree
(<http://xenbits.xen.org/git-http/xenclient/linux-2.6.27.git> with patch
queue <http://xenbits.xen.org/git-http/xenclient/linux-2.6.27-pq.git>)
matches SLE 11.0.
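
For illustration only, one way to track arbitrarily nested spins without
fixed per-context slots is a per-CPU stack of records, each living in the
frame of the context that is spinning. The sketch below is a user-space
model under that assumption and is not copied from the SLE or XCI
sources; struct spinning_rec, struct cpu_state, spin_enter(), spin_exit()
and spin_depth() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One record per active spin, linked to the spin it interrupted. */
struct spinning_rec {
	const void *lock;
	unsigned int ticket;
	struct spinning_rec *prev;	/* outer (interrupted) spin, or NULL */
};

/* Hypothetical per-CPU state: the innermost active spin. */
struct cpu_state {
	struct spinning_rec *top;
};

/* Push a record. A real kernel would need a write barrier between
 * filling in the record and publishing it; this model is single-threaded. */
static void spin_enter(struct cpu_state *cpu, struct spinning_rec *rec,
		       const void *lock, unsigned int ticket)
{
	rec->lock = lock;
	rec->ticket = ticket;
	rec->prev = cpu->top;
	cpu->top = rec;
}

/* Pop a record; nested spins must finish in LIFO order. */
static void spin_exit(struct cpu_state *cpu, struct spinning_rec *rec)
{
	assert(cpu->top == rec);
	cpu->top = rec->prev;
}

/* Count how many spins are currently nested on this CPU. */
static size_t spin_depth(const struct cpu_state *cpu)
{
	size_t n = 0;
	const struct spinning_rec *r;

	for (r = cpu->top; r; r = r->prev)
		n++;
	return n;
}
```

Because each nesting level brings its own record, nothing here depends
on whether the interrupting context is a softirq, a hard interrupt with
interrupts enabled, or one with interrupts disabled.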

This rather suggests that the assertions are wrong and we need to change
them.

Ben.

-- 
Ben Hutchings
The generation of random numbers is too important to be left to chance.
                                                            - Robert Coveyou

--=-gpJGQvNhH5HmU60cBDtr--


