On 26/10/15 20:13, Carlos Alberto Lopez Perez wrote:
> On 23/10/15 22:10, Henrique de Moraes Holschuh wrote:
>> On Fri, Oct 23, 2015, at 11:13, Carlos Alberto Lopez Perez wrote:
>>> I was having trouble (crashes with the NVIDIA proprietary driver) on a
>>> Debian system with an i7-5775C and libc6=2.19-18+deb8u1 (stable)
>>
>> This is very very likely to be braindamage on the NVIDIA driver, though.
>>
>> Are you sure that driver is not doing something as idiotic as unlocking
>> an already unlocked mutex ?
>>
>> The proper fix in that case is _always_ to fix whatever is broken,
>> because eventually it will run on something that has working hardware
>> lock elision... and crash.
>>
>
> I can't know, since I don't have access to the source code of the
> driver, neither the debug symbols are available, so any attempt to get a
> meaningful backtrace was hopeless.
>
> At first I also thought it was the driver doing something wrong, but
> then I found several reports of people with the same cryptic backtrace
> than me saying that this was because of the TSX-NI bug of recent Intel
> CPUs [1].
>
> And effectively, after upgrading glibc to this one that disables TSX-NI
> for broadwell it suddenly works as expected...
>
> So this seems to suggest that effectively TSX-NI is buggy on this CPU.
>
> In any case... Do you know of any program or test that I can run to
> check if TSX-NI (both HLE and RTM) is working as it should or is still
> buggy on this CPU? That way we can verify better if the problem is in
> the CPU or in the driver.
>

I'm re-reading your explanation [2] that programs crashing with SIGSEGV
in __lll_unlock_elision when TSX is enabled are caused by the program
itself trying to unlock an already unlocked lock.

That would explain everything, and would indeed point to a bug in the
NVIDIA driver rather than in the CPU.

Also, from what I have been reading, this specific CPU model (i7-5775C)
seems to have fixed TSX-NI support. At least Intel's ARK page still
advertises it [3].

In any case, I'm still interested in testing this to be 100% sure. If
you know of any test program that I can run, please let me know.

Cheers

------

[2] https://bugzilla.kernel.org/show_bug.cgi?id=103351#c86
[3] http://ark.intel.com/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz
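
As a quick first check before any real test, here is a minimal sketch
that assumes a Linux system, where /proc/cpuinfo lists "hle" and "rtm"
among the CPU flags when the kernel sees those CPUID feature bits. The
helper name tsx_flags is hypothetical. It only reports whether the
features are advertised (which may change after a microcode update), not
whether they behave correctly, so it cannot tell a CPU bug from a driver
bug on its own.

```python
from pathlib import Path


def tsx_flags(cpuinfo_text):
    """Report whether the first 'flags' line advertises HLE and RTM.

    Hypothetical helper: it parses text in the /proc/cpuinfo format and
    says nothing about whether the features actually work.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            return {"hle": "hle" in flags, "rtm": "rtm" in flags}
    # No 'flags' line found (non-x86 or non-Linux): report nothing advertised.
    return {"hle": False, "rtm": False}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(tsx_flags(text))
```

If both values come back False on a glibc build that still crashes, lock
elision could not have been in use, which would be one more hint that the
problem lies elsewhere.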