
Bug#613221: This is related to __thread



Hi Yavor,

On 2011-03-11, at 12:25 PM, Yavor Doganov wrote:

> Hi Eric,
>
> On Mon, Feb 14, 2011 at 03:05:49PM -0700, Eric Wasylishen wrote:
>> It's caused by the thread-local fast_path_cache variable in pixman.c.
>> If you make that non-thread-local (a normal static variable) the
>> problem will go away.
>
> Yep, or if you set the tls_model to *-exec.  But IMO this shouldn't be
> required: "global-dynamic" appears to be the right TLS model for shared
> libraries.  IMVHO, if something was seriously broken with pixman's (new)
> TLS support, the whole world would be crashing, not only GNUstep.

Right. As far as I remember, I looked at the disassembled code for this variable in Ubuntu's pixman package, and it was using the "global-dynamic" model, which is the correct model to use in shared libraries. So it doesn't look like pixman is doing anything wrong.
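
To make the distinction concrete, here is a minimal sketch of the declarations being discussed. It is not pixman's or mesa's actual code: demo_cache_t and the bump_* and run_demo functions are hypothetical names invented for illustration, and only the __thread keyword and the tls_model attribute are real GCC features.

```c
#include <assert.h>

/* Hypothetical stand-in for pixman's fast_path_cache; the real type differs. */
typedef struct { int hits; } demo_cache_t;

/* One instance per thread. "global-dynamic" is the general model and
   remains valid when the containing library is loaded with dlopen. */
static __thread demo_cache_t cache_gd
    __attribute__((tls_model("global-dynamic")));

/* One instance per thread. "initial-exec" assumes the module's TLS
   block belongs to the static TLS area laid out at program start. */
static __thread demo_cache_t cache_ie
    __attribute__((tls_model("initial-exec")));

/* Plain static: a single instance shared by every thread. */
static demo_cache_t cache_plain;

int bump_gd(void)    { return ++cache_gd.hits; }
int bump_ie(void)    { return ++cache_ie.hits; }
int bump_plain(void) { return ++cache_plain.hits; }

/* Single-threaded sanity check: each counter advances by exactly one
   per call to its own bump function. */
int run_demo(void)
{
    int gd = bump_gd();
    int ie = bump_ie();
    int plain = bump_plain();

    assert(bump_gd() == gd + 1);
    assert(bump_ie() == ie + 1);
    assert(bump_plain() == plain + 1);
    return 0;
}
```

In a single-threaded program all three behave the same; the difference only shows up in how the address of each variable is computed, which is what matters once a library is loaded after program start.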

>> The root problem here is interaction between thread local storage and
>> dlopen, because the gnustep-back bundle, which dynamically links to
>> libpixman, is dlopened by gnustep-gui.
>
> Could you please explain more about this interaction (CCing
> 613221@bugs.debian.org if possible)?  According to pixman's upstream
> maintainer, and my humble reading about the TLS documentation in GCC,
> there should be no problem at all.

What led me to say this was that modifying gnustep-gui to link directly to cairo and pixman made the crash disappear. So it was more or less just a guess that dlopen was somehow involved.

However, after doing a bit more research I agree there should be no problem with dlopen and TLS, assuming the shared library uses the correct TLS model, which pixman does.

Further supporting this, I tried to write a simple test case with a layout similar to GNUstep:

1. executable, dynamically linked to:
2. shared library, which dlopens:
3. shared library, which uses TLS

and I was unable to get a crash to happen. 
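
For reference, the middle layer of such a layout might look roughly like the sketch below. This is not the test case I actually wrote, only an illustration of its shape: load_and_touch, the touch_tls_fn type, and the idea that library 3 exports a function returning int are all assumptions.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical signature of the function exported by library 3. */
typedef int (*touch_tls_fn)(void);

/* Library 2: open library 3 at run time and call into it.
   Returns the callee's result, or -1 if loading or lookup fails
   (so a callee that itself returns -1 is indistinguishable here). */
int load_and_touch(const char *path, const char *symbol)
{
    assert(path != NULL && symbol != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* POSIX permits converting the dlsym result to a function pointer. */
    touch_tls_fn touch = (touch_tls_fn)dlsym(handle, symbol);
    if (touch == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int result = touch();   /* the TLS access happens inside library 3 */
    dlclose(handle);
    return result;
}
```

Library 3 would be built with -fPIC -shared and would contain a __thread variable read or written by the exported function; on older glibc versions library 2 also needs -ldl at link time.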

> Can you reproduce if you configure gnustep-back with --disable-glx?  I
> can't, which leads me to the clue that the real culprit is mesa, which
> uses __attribute__ ((tls_model ("initial-exec"))) for the thread-local
> variables in libGL.so, and that's apparently incompatible.

Hm, that's interesting! It sounds like a convincing hypothesis.

I can test that, but it will take me a few days because I have to set up this virtual machine again.

BTW, I switched from 32-bit Ubuntu 10.10 (where I was observing the bug) to amd64 Ubuntu 10.10, and found that this bug doesn't occur on amd64. Are you also observing it on 32-bit only?

>> However, I'm not sure how to properly fix it other than building
>> pixman without TLS.
>
> Well, we have to find where the bug really lies and fix it there.  I'm
> afraid building pixman without TLS support is the wrong course of action
> from wherever you look at it; I doubt that pixman's maintainers would be
> keen on such move (and rightfully so).

I agree. The fact that disabling TLS in pixman makes the bug symptoms disappear seems to be more or less a coincidence.

Hopefully we are close to tracking down this really strange bug :-)

Cheers,
Eric
