
Bug#327736: lablgl: FTBFS on m68k, ocamlmktop crashes at exit



On Sun, 2005-09-11 at 21:36 +0200, Samuel Mimram wrote:
> Package: lablgl
> Severity: serious
> Tags: help
> Justification: no longer builds from source
> 
> Hi,
> 
> The last buildd failed on m68k ending on:
> 
> ocamlmktop  -I . -I +labltk -I ../../src -o lablgltop \
>   labltk.cma lablgl.cma togl.cma
> make[2]: *** [lablgltop] Segmentation fault
> make[2]: *** Deleting file `lablgltop'
> make[2]: Leaving directory `/build/buildd/lablgl-1.01/Togl/src'
> 
> Thanks to Ingo Juergensmann I've been able to get an account on an m68k
> box. I first ran gdb to find out where the problem was:
> 
> arrakis:~/lablgl/lablgl-1.01/Togl/src% gdb ocamlrun
> (gdb) r /usr/bin/ocamlc -linkall toplevellib.cma -o test  -I . -I +labltk -I ../../src labltk.cma topstart.cmo
> [...]
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 16384 (LWP 1801)]
> 0x1d3c0000 in ?? ()
> (gdb) where
> #0  0x1d3c0000 in ?? ()
> #1  0xc04d9002 in ?? () from /usr/lib/libtk8.4.so.0
> #2  0xc057c5e8 in ?? () from /usr/lib/libtk8.4.so.0
> #3  0x0000000b in ?? ()
> #4  0x8004b7b0 in ?? ()
> #5  0xc001271c in ?? () from /lib/ld.so.1
> #6  0xeffff73e in ?? ()
> #7  0xc056ca72 in _fini () from /usr/lib/libtk8.4.so.0
> #8  0xc056ca72 in _fini () from /usr/lib/libtk8.4.so.0
> #9  0xc0009686 in _dl_rtld_di_serinfo () from /lib/ld.so.1
> #10 0xc00ffc04 in exit () from /lib/libc.so.6
> #11 0x8000fc10 in caml_sys_exit ()
> #12 0x8001653e in caml_interprete ()
> #13 0x80017872 in caml_main ()
> #14 0x80007d7e in main ()
> 
> I then recompiled tk8.4 with debugging symbols and the segmentation
> fault became a bus error:
> 
> Program received signal SIGBUS, Bus error.
> [Switching to Thread 16384 (LWP 10586)]
> 0xc04d8224 in ?? () from /usr/lib/libtk8.4.so.0
> (gdb) where
> #0  0xc04d8224 in ?? () from /usr/lib/libtk8.4.so.0
> #1  0xc04d9002 in __do_global_dtors_aux () from /usr/lib/libtk8.4.so.0
> #2  0xc056ca4a in _fini () from /usr/lib/libtk8.4.so.0
> #3  0xc0009686 in _dl_rtld_di_serinfo () from /lib/ld.so.1
> #4  0xc00ffc04 in exit () from /lib/libc.so.6
> #5  0x8000fc10 in caml_sys_exit ()
> #6  0x8001653e in caml_interprete ()
> #7  0x80017872 in caml_main ()
> #8  0x80007d7e in main ()
> 
> The symbols _dl_rtld_di_serinfo, _fini, __do_global_dtors_aux do not
> seem to be present in tk8.4 sources (grep did not find them at least),
> so I guess it must be some gcc stuff.
> 
> I'm a bit lost now and I have no idea of what to look for. If someone
> has an idea of something to try, I can use my account on the m68k.

Unfortunately I have a good idea what the problem is: in summary,
Tk is incorrectly built on Debian. This is because upstream
has no idea how to build it. This is because Ousterhout never
had any idea how to make dynamic linkage work. This is because
he never bothered to fix the bug I reported years ago.
So if this sounds a bit like a rant .. well it is .. :)

_fini is the finaliser run when a shared library is unloaded:
at the last dlclose(), or at exit() as in the traces above.
'rtld' is the run-time linker/loader (ld.so), the same prefix as
in the dlopen() flags RTLD_LAZY (resolve symbols on first use)
and RTLD_NOW (resolve everything at load time).

__do_global_dtors_aux () is something for executing C++ destructors
for global variables in shared libraries. For C programs, it
is usually empty though ..

> I'm a bit lost now and I have no idea of what to look for. If someone
> has an idea of something to try, I can use my account on the m68k.

Yup. I think the bug is in the Tk build, it has nothing
to do with any Ocaml stuff. You should try in bash:

tclsh <return>
load tk

If that doesn't work .. and it doesn't under Ubuntu Hoary ..
then Tk isn't built correctly. Tk is 'just another' Tcl
package. If it can't be loaded just like any other tcl
package, the package isn't built correctly. 

skaller@rosella:/work/felix/flx$ ldd /usr/lib/libtk8.4.so
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000002a9575b000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x0000002a9586f000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000002a95a4e000)
        libm.so.6 => /lib/libm.so.6 (0x0000002a95b51000)
        libc.so.6 => /lib/libc.so.6 (0x0000002a95cd7000)
        /lib/ld-linux-x86-64.so.2 => /lib/ld-linux-x86-64.so.2 (0x000000552aaaa000)

you can see here for Ubuntu: Tk isn't built correctly!!
There is no dependency on Tcl. Tk cannot operate without Tcl.

now look at:

skaller@rosella:/work/felix/flx$ ldd `which wish`
        libtk8.4.so.0 => /usr/lib/libtk8.4.so.0 (0x0000002a9566c000)
        libtcl8.4.so.0 => /usr/lib/libtcl8.4.so.0 (0x0000002a9585f000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000002a95a1e000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x0000002a95b32000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000002a95d10000)
        libm.so.6 => /lib/libm.so.6 (0x0000002a95e14000)
        libc.so.6 => /lib/libc.so.6 (0x0000002a95f9a000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a9555600)

and you can see the bug .. the idiots never bothered to fix it.
Wish should NOT depend on Tk. This is utterly wrong: there
is in fact no reason AT ALL to have a 'wish' program when you
have dynamic linkage, except that the developers had no idea how to 
make dynamic linkage work (wish is needed if you only have static
linkage).

And this is almost certainly the problem with Ocaml's Tk bindings.
Tk itself is linked incorrectly. 'It just works' on most systems
by luck, and because executables export Tcl symbols which Tk needs.

The *problem* is that, with this linkage, Tk's calls into Tcl
are undefined symbols which are only bound lazily, on first
use .. or not .. which causes a segfault if they can't be found
in the executable. If you try to do:

dlopen(tcl)
dlopen(tk)

surprise, surprise, it doesn't work unless your executable
is set up to propagate the symbols from tcl into the global
symbol table so the subsequent load of Tk can find them.
You have to use

dlopen("tcl.so", RTLD_NOW | RTLD_GLOBAL)

to make it work. This is because Tk is not linked against tcl.
It should be. However, Tk MUST NOT load Tcl... :)
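
To make the RTLD_GLOBAL point concrete, here is a minimal sketch in C.
It assumes a glibc-style dlopen(); the helper names load_global and
symbol_visible_globally are made up for illustration, and the library
path is whatever your system uses (older glibc needs -ldl at link time).

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a shared library and export its symbols
 * into the global namespace, so that libraries loaded later can
 * resolve their undefined symbols against it.
 * Returns NULL (after printing the dlerror() text) on failure. */
static void *load_global(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s): %s\n", path, msg ? msg : "unknown error");
    }
    return handle;
}

/* Hypothetical helper: report whether a symbol can be found through
 * the program's global scope, which is where a later-loaded library
 * looks up its undefined symbols. Returns 1 if found, 0 otherwise. */
static int symbol_visible_globally(const char *name)
{
    /* dlopen(NULL, ...) yields a handle for the main program; lookups
     * through it also cover libraries opened with RTLD_GLOBAL. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL)
        return 0;
    int found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

After load_global() on the Tcl library (for instance the
libtcl8.4.so.0 shown by ldd above, if that is what is installed),
symbol_visible_globally("Tcl_CreateInterp") should report 1. Opening
Tcl without RTLD_GLOBAL from an executable that does not itself link
Tcl should leave it at 0, which is the situation Tk then trips over.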

What's happening with the bug above? Well, Tk used to have
an event loop, but Tcl did not. That got changed over time,
so that these days Tk uses the Tcl event loop. Since
Tk is meant to be loaded by Tcl -- even though Ousterhout
could never figure out how to actually make it work --
it can also be *unloaded* -- it's just another package.

The problem is that the mainline is linking to Tk directly,
and when it is unloaded the binary is creaming the event
loop -- the plug is being pulled without Tcl knowing.
The thing is, you are not ALLOWED to pull the plug on Tk.
It MUST be done by Tcl, just as Tk MUST be loaded by Tcl,
and NEVER by the application directly. 

Tk C API MUST NOT ever be called by any applications!!!

The Tk API is ONLY available to dynamically loadable
widgets loaded BY Tk. The reason, trivially, is that
there is no correct way for an application to load Tk,
except via Tcl script, because it is Tcl that MUST load
Tk, not the other way around -- which means the application
cannot see the Tk API because it isn't allowed to
dlopen() it -- well, at least not until AFTER Tcl has
already done so!!

What's happening? The destructors for Tk's static data --
Tk is a brain dead piece of code, it isn't reentrant --
are operating on data that is ALREADY unmapped.
There is a pointer in there pointing to Tcl data.
But Tcl is already gone. Because Tk was kludged to load
Tcl to make thousands of wrong programs work, because
Ousterhout got it all wrong in the first place.

Solution: dlopen() Tcl BEFORE linking to Tk.
That way, Tcl won't be unloaded prematurely.
The best way to do that is to dlopen() Tcl first,
initialise it, then TELL Tcl to load Tk.
After that you can dlopen Tk .. when you dlclose it,
it will not be destroyed because Tcl is still using it.
Tcl will release it when IT is destroyed. It is important
that the client has already dlclosed it at this point.

Hope this rant all makes sense. Bottom line is that
Tk has been misused by everyone for decades including
its designer, who never understood how to make it load
properly -- so it isn't surprising no one else did either.
Looks like it is STILL all wrong. The open chain MUST go:

appl --> tcl --> tk

and the close chain is the reverse. If the app wants to
call the Tk API, it can open Tk after Tcl has. It should close
it before closing Tcl, so that Tcl can close tk, which
allows tk to unlink itself from the event loop whilst
that event loop's static data is still held by Tcl.
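
The open and close chain could be sketched in C roughly as follows.
This is only an illustration under assumptions: run_tk_via_tcl is a
made-up name, the Tcl library path is supplied by the caller, and the
Tcl entry points (Tcl_FindExecutable, Tcl_CreateInterp, Tcl_Init,
Tcl_Eval, Tcl_DeleteInterp, Tcl_Finalize) are fetched with dlsym()
rather than linked directly. Loading Tk also needs a usable X display,
so the call can fail for reasons unrelated to linkage.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Opaque interpreter type; the real definition lives in tcl.h. */
typedef struct Tcl_Interp Tcl_Interp;

/* Signatures of the Tcl C API entry points looked up below. */
typedef void        (*find_executable_fn)(const char *argv0);
typedef Tcl_Interp *(*create_interp_fn)(void);
typedef int         (*init_fn)(Tcl_Interp *interp);
typedef int         (*eval_fn)(Tcl_Interp *interp, const char *script);
typedef void        (*delete_interp_fn)(Tcl_Interp *interp);
typedef void        (*finalize_fn)(void);

/* appl --> tcl --> tk: open Tcl, let Tcl load Tk, tear down in reverse.
 * tcl_path is an assumption (e.g. the system's libtcl shared object).
 * Returns 0 if Tcl reported success loading Tk, -1 on any failure. */
static int run_tk_via_tcl(const char *tcl_path, const char *argv0)
{
    /* Step 1: the application opens Tcl, with its symbols global. */
    void *tcl = dlopen(tcl_path, RTLD_NOW | RTLD_GLOBAL);
    if (tcl == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s): %s\n", tcl_path, msg ? msg : "unknown error");
        return -1;
    }

    /* POSIX allows converting dlsym() results to function pointers. */
    find_executable_fn find_executable =
        (find_executable_fn)dlsym(tcl, "Tcl_FindExecutable");
    create_interp_fn create_interp =
        (create_interp_fn)dlsym(tcl, "Tcl_CreateInterp");
    init_fn init = (init_fn)dlsym(tcl, "Tcl_Init");
    eval_fn eval = (eval_fn)dlsym(tcl, "Tcl_Eval");
    delete_interp_fn delete_interp =
        (delete_interp_fn)dlsym(tcl, "Tcl_DeleteInterp");
    finalize_fn finalize = (finalize_fn)dlsym(tcl, "Tcl_Finalize");

    int status = -1;
    if (find_executable && create_interp && init && eval &&
        delete_interp && finalize) {
        find_executable(argv0);
        Tcl_Interp *interp = create_interp();
        if (interp != NULL) {
            /* Step 2: Tcl, not the application, loads Tk.
             * A return value of 0 is TCL_OK. */
            if (init(interp) == 0 &&
                eval(interp, "package require Tk") == 0)
                status = 0;
            /* Step 3: tear down through Tcl while Tcl is still mapped. */
            delete_interp(interp);
        }
        finalize();
    }

    /* Step 4: only now does the application let go of Tcl. */
    dlclose(tcl);
    return status;
}
```

The point of the ordering is that the application never touches
libtk itself: every step that involves Tk goes through the Tcl
interpreter, and Tcl is the last library released.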


-- 
John Skaller <skaller at users dot sourceforge dot net>
