
Bug#327736: lablgl: FTBFS on m68k, ocamlmktop crashes at exit



reassign 327736 tk8.4
thanks

skaller wrote:
On Sun, 2005-09-11 at 21:36 +0200, Samuel Mimram wrote:

[...]

I'm a bit lost now and I have no idea of what to look for. If someone
has an idea of something to try, I can use my account on the m68k.


Yup. I think the bug is in the Tk build, it has nothing
to do with any Ocaml stuff. You should try in bash:

tclsh <return>
load tk

Well, this does not work even on my i386 box:

% tclsh
% load tk
couldn't load file "tk": tk: cannot open shared object file: No such file or directory

So, I guess you meant "load libtk8.4.so". It gives an empty window on my i386, but on m68k:

arrakis:/# gdb tclsh
[...]
% load libtk8.4.so

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 16399)]
0x00000738 in ?? ()
(gdb) where
#0  0x00000738 in ?? ()
#1  0xc0275ec6 in Initialize (interp=0x8000bf28)
    at /root/tk/tk8.4-8.4.11/unix/../generic/tkWindow.c:2905
#2  0xc0276512 in Tk_Init (interp=0x8000bf28)
    at /root/tk/tk8.4-8.4.11/unix/../generic/tkWindow.c:2804
#3  0xc0082e5a in Tcl_LoadObjCmd () from /usr/lib/libtcl8.4.so.0
#4  0xc0041014 in TclEvalObjvInternal () from /usr/lib/libtcl8.4.so.0
#5  0xc0065b2e in TclExprFloatError () from /usr/lib/libtcl8.4.so.0
#6  0xc006ad22 in TclCompEvalObj () from /usr/lib/libtcl8.4.so.0
#7  0xc0043072 in Tcl_EvalObjEx () from /usr/lib/libtcl8.4.so.0
#8  0xc00708e6 in Tcl_RecordAndEvalObj () from /usr/lib/libtcl8.4.so.0
#9  0xc0083d02 in Tcl_Main () from /usr/lib/libtcl8.4.so.0
#10 0x80000758 in main ()

I don't understand everything here but it looks like a confirmation of what you were explaining.

If that doesn't work .. and it doesn't under Ubuntu Hoary ..
then Tk isn't built correctly. Tk is 'just another' Tcl
package. If it can't be loaded just like any other tcl
package, the package isn't built correctly.
skaller@rosella:/work/felix/flx$ ldd /usr/lib/libtk8.4.so
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000002a9575b000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x0000002a9586f000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000002a95a4e000)
        libm.so.6 => /lib/libm.so.6 (0x0000002a95b51000)
        libc.so.6 => /lib/libc.so.6 (0x0000002a95cd7000)
        /lib/ld-linux-x86-64.so.2 => /lib/ld-linux-x86-64.so.2 (0x000000552aaaa000)

you can see here for Ubuntu: Tk isn't built correctly!!
There is no dependency on Tcl. Tk cannot operate without Tcl.

Same thing for Debian :/

now look at:

skaller@rosella:/work/felix/flx$ ldd `which wish`
        libtk8.4.so.0 => /usr/lib/libtk8.4.so.0 (0x0000002a9566c000)
        libtcl8.4.so.0 => /usr/lib/libtcl8.4.so.0 (0x0000002a9585f000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000002a95a1e000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x0000002a95b32000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000002a95d10000)
        libm.so.6 => /lib/libm.so.6 (0x0000002a95e14000)
        libc.so.6 => /lib/libc.so.6 (0x0000002a95f9a000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a9555600)

and you can see the bug .. the idiots never bothered to fix it.
Wish should NOT depend on Tk. This is utterly wrong: there
is in fact no reason AT ALL to have a 'wish' program when you
have dynamic linkage, except that the developers had no idea
how to make dynamic linkage work (wish is needed if you only
have static linkage).

And this is almost certainly the problem with Ocaml's Tk bindings.
Tk itself is linked incorrectly. 'It just works' on most systems
by luck, and because executables export Tcl symbols which Tk needs.

The *problem* is that to use the wrong linkage, Tk calls Tcl
using weak symbols which are only linked when used .. or not ..
which causes a segfault if they can't be found in the
executable. If you try to do:

dlopen(tcl)
dlopen(tk)

surprise, surprise, it doesn't work unless your executable
is set up to propagate the symbols from tcl into the global
symbol table so the subsequent load of Tk can find them.
You have to use

dlopen("tcl.so", RTLD_NOW | RTLD_GLOBAL)

to make it work. This is because Tk is not linked against tcl.
It should be. However, Tk MUST NOT load Tcl... :)
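
A minimal sketch of the RTLD_GLOBAL point, assuming a POSIX dlopen().
The helper name open_global is hypothetical, and the library path is
whatever the caller supplies. Note that dlopen() wants a binding mode
(RTLD_LAZY or RTLD_NOW) in addition to RTLD_GLOBAL:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a shared library and publish its symbols
 * in the global namespace, so that libraries loaded afterwards can
 * resolve their undefined references against it.
 * Returns the dlopen() handle, or NULL on failure. */
void *open_global(const char *path)
{
    void *handle;

    assert(path != NULL);

    /* RTLD_NOW: resolve everything up front, so a missing symbol
     * shows up here rather than as a crash later.
     * RTLD_GLOBAL: make this library's symbols visible to later loads. */
    handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

With Tcl opened this way first, a later load of a Tk that was built
without a dependency on Tcl should at least be able to find the Tcl
symbols it needs. On some systems this has to be linked with -ldl.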

What's happening with the bug above? Well, Tk used to have
an event loop, but Tcl did not. That got changed over time,
so that these days Tk uses the Tcl event loop. Since
Tk is meant to be loaded by Tcl -- even though Ousterhout
could never figure out how to actually make it work --
it can also be *unloaded* -- it's just another package.

The problem is that the mainline is linking to Tk directly,
and when it is unloaded the binary is creaming the event
loop -- the plug is being pulled without Tcl knowing.
The thing is, you are not ALLOWED to pull the plug on Tk.
It MUST be done by Tcl, just as Tk MUST be loaded by Tcl,
and NEVER by the application directly.
Tk C API MUST NOT ever be called by any applications!!!

The Tk API is ONLY available to dynamically loadable
widgets loaded BY Tk. The reason, trivially, is that
there is no correct way for an application to load Tk,
except via Tcl script, because it is Tcl that MUST load
Tk, not the other way around -- which means the application
cannot see the Tk API because it isn't allowed to
dlopen() it -- well, at least not until AFTER Tcl has
already done so!!

What's happening? The destructors for Tk's static data --
Tk is a brain dead piece of code, it isn't reentrant --
are operating on data that is ALREADY unmapped.
There is a pointer in there pointing to Tcl data.
But Tcl is already gone. Because Tk was Kludged to load
Tcl to make thousands of wrong programs work, because
Ousterhout got it all wrong in the first place.

Solution: dlopen() Tcl BEFORE linking to Tk.
That way, Tcl won't be unloaded prematurely.
The best way to do that is to dlopen() Tcl first,
initialise it, then TELL Tcl to load Tk.
After that you can dlopen Tk .. when you dlclose it,
it will not be destroyed because Tcl is still using it.
Tcl will release it when IT is destroyed. It is important
that the client has already dlclosed it at this point.
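
The sequence above could be sketched roughly as follows. This is only
an illustration under assumptions, not a patch for labltk: the helper
name load_tk_via_tcl and both path arguments are hypothetical, the Tcl
entry points (Tcl_CreateInterp, Tcl_Eval, Tcl_DeleteInterp) are looked
up with dlsym() so that no Tcl headers are needed, and the usual
embedding setup and most error reporting are left out. Whether the
final dlclose() of Tcl is safe will depend on how Tcl was built.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Opaque stand-in for Tcl's interpreter type (no Tcl headers used). */
typedef struct Tcl_Interp Tcl_Interp;

typedef Tcl_Interp *(*create_fn)(void);
typedef int (*eval_fn)(Tcl_Interp *, const char *);
typedef void (*delete_fn)(Tcl_Interp *);

/* Hypothetical helper: open Tcl, have Tcl load Tk, take an application
 * handle on Tk, then close everything in the reverse order.
 * Returns 0 on success, -1 on any failure. */
int load_tk_via_tcl(const char *tcl_path, const char *tk_path)
{
    char script[1024];
    create_fn create;
    eval_fn eval;
    delete_fn destroy;
    Tcl_Interp *interp;
    void *tcl, *tk;
    int n, rc = -1;

    assert(tcl_path != NULL && tk_path != NULL);

    /* appl --> tcl: Tcl's symbols go global so Tk can resolve them. */
    tcl = dlopen(tcl_path, RTLD_NOW | RTLD_GLOBAL);
    if (tcl == NULL) {
        fprintf(stderr, "%s\n", dlerror());
        return -1;
    }

    /* POSIX idiom for storing a dlsym() result in a function pointer. */
    *(void **)(&create)  = dlsym(tcl, "Tcl_CreateInterp");
    *(void **)(&eval)    = dlsym(tcl, "Tcl_Eval");
    *(void **)(&destroy) = dlsym(tcl, "Tcl_DeleteInterp");
    if (create == NULL || eval == NULL || destroy == NULL) {
        dlclose(tcl);
        return -1;
    }

    interp = create();
    if (interp == NULL) {
        dlclose(tcl);
        return -1;
    }

    /* tcl --> tk: TELL Tcl to load Tk, as any other package. */
    n = snprintf(script, sizeof script, "load {%s}", tk_path);
    if (n > 0 && (size_t)n < sizeof script && eval(interp, script) == 0) {
        /* 0 is TCL_OK. Only now does the application open Tk itself. */
        tk = dlopen(tk_path, RTLD_NOW);
        if (tk != NULL) {
            rc = 0;
            /* ... calls into the Tk C API would go here ... */
            dlclose(tk); /* the client lets go first; Tcl still holds Tk */
        }
    }

    /* Close chain in reverse: Tcl releases Tk, then the app drops Tcl. */
    destroy(interp);
    dlclose(tcl);
    return rc;
}
```

A caller would pass something like the libtcl8.4.so.0 and libtk8.4.so
paths shown in the ldd output above; loading Tk also needs a working X
display, so the helper returns -1 when none is available.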

Since you seem to be much more of an expert here than we are, do you think you could come up easily with a patch on labltk to use your solution?

Hope this rant all makes sense. Bottom line is that
Tk has been misused by everyone for decades including
its designer, who never understood how to make it load
properly -- so it isn't surprising no one else did either.
Looks like it is STILL all wrong. The open chain MUST go:

appl --> tcl --> tk

and the close chain is the reverse. If the app wants to
call Tk api, it can open tk after Tcl has. It should close
it before closing Tcl, so that Tcl can close tk, which
allows tk to unlink itself from the event loop whilst
that event loop's static data is still held by Tcl.

Thank you very much for all those detailed explanations.

Cheers,

Samuel.


