
Bug#800574: backport to sid/stable? (was RE: libc6: lock elision hazard on Intel Broadwell and Skylake)



On 26/10/15 20:13, Carlos Alberto Lopez Perez wrote:
> On 23/10/15 22:10, Henrique de Moraes Holschuh wrote:
>> On Fri, Oct 23, 2015, at 11:13, Carlos Alberto Lopez Perez wrote:
>>> I was having trouble (crashes with the NVIDIA proprietary driver) on a
>>> Debian system with an i7-5775C and libc6=2.19-18+deb8u1 (stable)
>>
>> This is very very likely to be braindamage on the NVIDIA driver, though.
>>
>> Are you sure that driver is not doing something as idiotic as unlocking
>> an already unlocked mutex?
>>
>> The proper fix in that case is _always_ to fix whatever is broken,
>> because eventually it will run on something that has working hardware
>> lock elision... and crash.
>>
> 
> I can't know, since I don't have access to the driver's source code and
> no debug symbols are available, so any attempt to get a meaningful
> backtrace was hopeless.
> 
> At first I also thought it was the driver doing something wrong, but
> then I found several reports from people with the same cryptic backtrace
> as mine, saying that this was because of the TSX-NI bug in recent Intel
> CPUs [1].
> 
> And indeed, after upgrading glibc to the version that disables TSX-NI
> for Broadwell, it suddenly works as expected...
> 
> So this seems to suggest that TSX-NI is indeed buggy on this CPU.
> 
> In any case... Do you know of any program or test that I can run to
> check if TSX-NI (both HLE and RTM) is working as it should or is still
> buggy on this CPU? That way we can better verify whether the problem is
> in the CPU or in the driver.
> 

I've re-read your explanation [2] that crashes with SIGSEGV in
__lll_unlock_elision when TSX is enabled are caused by the program
itself trying to unlock an already unlocked lock. That would explain
everything, and would indeed point to a bug in the NVIDIA driver rather
than in the CPU.

Also, from what I have been reading, this specific CPU model (i7-5775C)
seems to have fixed TSX-NI support. At least Intel's ARK page still
advertises it [3]. In any case, I'm still interested in testing this to
be 100% sure. If you know of any test program that I can run, please
let me know.
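
As a first step, here is a minimal sketch of a check for whether the
running kernel exposes the TSX feature flags at all. It assumes a Linux
x86 system where /proc/cpuinfo has a "flags" line listing "hle" and
"rtm". The helper names parse_cpu_flags and tsx_status are made up for
illustration. Note that this only shows whether HLE and RTM are
advertised (a microcode update may hide them). It cannot tell whether
they actually behave correctly.

```python
#!/usr/bin/env python3
"""Report whether the kernel advertises the TSX feature flags (hle, rtm)."""


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsx_status(flags):
    """Map each TSX sub-feature to True if its flag is present."""
    return {"hle": "hle" in flags, "rtm": "rtm" in flags}


if __name__ == "__main__":
    # Assumption: Linux on x86, where /proc/cpuinfo carries a "flags" line.
    with open("/proc/cpuinfo") as f:
        status = tsx_status(parse_cpu_flags(f.read()))
    for name, present in status.items():
        print(f"{name}: {'advertised' if present else 'not advertised'}")
```

If both flags are reported as not advertised after the glibc or
microcode update, elision cannot be in use, which would be consistent
with the crashes going away.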

Cheers
------

[2] https://bugzilla.kernel.org/show_bug.cgi?id=103351#c86
[3] http://ark.intel.com/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz
