
Re: libc recently more aggressive about pthread locks in stable ?



On Wed, Nov 9, 2016, at 06:26, Lucas Nussbaum wrote:
> On 08/11/16 at 16:01 -0200, Henrique de Moraes Holschuh wrote:
> > I fear it might be bad, but
> > I would love to be pleasantly surprised that people did get libpthreads
> > locking right most of the time...
> 
> I wonder if it has been considered to "fix" glibc so that the misuses
> that are tolerated without TSX are also tolerated with TSX? Or is that
> impossible?

AFAIK, the hardware cannot be programmed to tolerate this kind of
programming error.  And I don't think that's a bad thing. Locking bugs
are already subtle enough when the whole deal is fully visible to
software and depends only on trivial atomic operations on machine word
sizes (32-bit on ia32/amd64). Hidden by hardware transactional memory,
they would go from subtle and difficult to debug straight into utterly
nasty hellbug land if the hardware was too permissive about misuse.

One can handle the SIGSEGV and attempt to recover, I suppose -- which is
painful enough to get right, and that assumes such a thing is possible
at all in the first place: we are talking about a threaded application
here -- but that is so very slow, that it is simply not worth it as far
as I am concerned.  Not that I think it would be desirable to do so in
the first place: locking bugs are best fixed, not papered over.
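
For anyone trying to find such bugs rather than paper over them, here is a
minimal sketch. It assumes the misuse in question is unlocking a mutex the
calling thread does not hold, which the thread does not spell out. The
helper name checked_unlock is made up for illustration. With an
error-checking mutex, POSIX requires the bad unlock to return an error
instead of passing silently:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not from glibc or from this thread): creates an
 * error-checking mutex, optionally locks it, then unlocks it, and returns
 * whatever pthread_mutex_unlock() reported. */
int checked_unlock(int lock_first)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    if (lock_first) {
        rc = pthread_mutex_lock(&mutex);
        assert(rc == 0);
    }

    /* When lock_first is 0 this is the bug: the mutex is not held. */
    rc = pthread_mutex_unlock(&mutex);

    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Calling checked_unlock(0) should return EPERM, and checked_unlock(1) should
return 0. Logging or aborting on a non-zero return from
pthread_mutex_unlock() in a debug build is one way to turn a silent misuse
into a bug report.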

This is an area where KISS is absolutely required, too.  Handling that
SIGSEGV to trigger a safe whole-application exit while saving user data
is one thing, attempting to resume execution from a signal raised while
inside a transactional state that has been aborted(!) is quite another.
This is NOT the kind of thing I would ever trust current and future
processors to always get right.  It reeks of an errata minefield one
should never enter willingly.
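
The "safe exit" half of that distinction can be sketched as follows. This
is only an illustration under assumptions: the function names and the exit
code 70 are invented here, and a real application would put its own
(async-signal-safe) data-saving step where the comment indicates. The
handler never returns into the faulting code:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: report and exit, never resume execution. */
static void on_segv(int signo)
{
    static const char msg[] = "fatal: SIGSEGV, exiting\n";
    ssize_t n;

    (void)signo;
    /* Only async-signal-safe calls are allowed here; an application
     * would flush pre-prepared user data at this point using write(). */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(70); /* arbitrary non-zero status chosen for this sketch */
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_segv_exit_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    /* Restore the default action once the handler has been entered, so
     * a second fault inside the handler terminates the process. */
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The design choice is deliberate: nothing in the handler tries to repair
state or jump back into the program.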

The deal with *current* Debian stable is that, if the breakage is too
widespread, we simply might not be able to do the right thing (fix the
real bugs).  IMHO, this is not a valid excuse to paper over the breakage
for unstable (or even for the next stable, as far as I am concerned: I'd
rather delay the release, although it is _not_ clear at this time that
such a thing would be needed).  It is not really about Intel TSX, it is
about broken locking that was *already* causing hard-to-debug issues in
many cases (I believe Ian said in this thread that ghostscript was
already showing hard-to-debug hangs), and that Intel TSX happened to
expose.

-- 
  Henrique de Moraes Holschuh <hmh@debian.org>

