
GHC and GMP via FFI don't play well together (#751886)



Hi,

While trying to debug a segfault in one of Ganeti's Haskell daemons
(#751886), I came across a memory corruption bug which I can only assume
comes from the GHC RTS "hijacking" all of GMP's memory management to
manage it via the storage manager (SM)[1].

As outlined in #751886[2], said daemon uses FFI calls to libcurl to
initiate TLS-encrypted communications. Currently, the Haskell bindings
are linked against the GnuTLS version of libcurl, which was recently
updated to link against gnutls28 instead of gnutls26. gnutls28 uses 
nettle (and thus GMP) for crypto material operations, and what 
*presumably* happens is the following:

 1. A curl multi handle is constructed, with SSL key and certificate
    loaded via gnutls/nettle. Nettle uses GMP's data types to store key
    parameters, and GMP's memory is allocated from GHC's heap[1].

 2. Going back-and-forth between Haskell and C space, eventually a GC 
    run is triggered. The GC cannot find Haskell object references to 
    the memory allocated by GMP calls via FFI and thus marks it as free.

 3. Some other object takes over the heap chunk and on the next FFI 
    call, the SSL keys have been overwritten by random data. The result 
    is an unrecoverable SSL error ("Decrypt error"), or worse, a 
    segfault.
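
To make the suspected sequence concrete, here is a minimal,
self-contained model of it in C. It links against neither GMP nor GHC:
the arena, the "collection" and every function name below are
hypothetical stand-ins chosen for illustration, and the only thing it
shares with the real setup is the idea of a process-global allocation
hook that a C library calls without knowing who installed it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: none of these names come from GMP, nettle or GHC. */

/* A tiny bump-allocated arena standing in for a garbage-collected heap. */
static unsigned char arena[256];
static size_t arena_used = 0;

void *arena_alloc(size_t n) {
    if (n > sizeof arena - arena_used) return NULL;
    void *p = arena + arena_used;
    arena_used += n;
    return p;
}

/* Stand-in for a collection that finds no live references:
 * the whole arena becomes available for reuse. */
static void arena_collect(void) { arena_used = 0; }

/* Process-global allocation hook, analogous in spirit to a
 * library-wide replaceable allocator. */
static void *(*bignum_alloc)(size_t) = arena_alloc;

/* Stand-in for a C crypto library storing key material via the hook. */
static unsigned char *load_key(void) {
    unsigned char *key = bignum_alloc(16);
    if (key != NULL) memset(key, 0xAB, 16);
    return key;
}

/* Replays steps 1-3 with the given hook installed.
 * Returns 1 if the key bytes were overwritten, 0 if they survived,
 * -1 if an allocation failed. */
int key_corrupted_after_gc(void *(*hook)(size_t)) {
    bignum_alloc = hook;
    arena_collect();                         /* start from an empty arena */

    unsigned char *key = load_key();         /* step 1: key is loaded */
    if (key == NULL) return -1;

    arena_collect();                         /* step 2: "GC" sees no references */

    unsigned char *other = arena_alloc(16);  /* step 3: the chunk is reused */
    if (other == NULL) return -1;
    memset(other, 0, 16);

    int corrupted = (key[0] != 0xAB);
    if (hook == malloc) free(key);           /* only malloc'd keys need freeing */
    return corrupted;
}
```

In this model, key_corrupted_after_gc(arena_alloc) reports corruption,
while key_corrupted_after_gc(malloc) does not, since the key then lives
outside the collected arena. Whether the real RTS behaves this way is
exactly the assumption being made above.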

Now, this looks like a pretty ugly situation, primarily because GMP is a 
widely-used library and also because FFI is widely used to interface 
with a lot of external libraries.

There are many ways around or out of this situation, all of them with
their disadvantages:

 1. Have haskell-curl depend on the OpenSSL version of libcurl. This 
    looks more like an ugly workaround and will likely have licensing 
    implications. However, it will solve #751886 for the time being.
 
 2. Patch GHC's FFI implementation to reset GMP's memory allocator 
    to/from malloc when jumping between Haskell and FFI. This is almost 
    certainly not threadsafe for a start, and I have no idea what other 
    implications it may have.

 3. Build GHC with integer-simple as INTEGER_LIBRARY, suffering an
    unspecified performance hit for really large numbers. I tried this
    with 7.6.3-10 from testing and the result was FTBFS (unfortunately I
    don't have the error message handy). Also, upstream GHC states that
    they do not test their builds with integer-simple, so I expect QA to
    be an issue in this case.
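
For option 2, the rough shape of "reset the allocator around a foreign
call" could look like the sketch below. Again this is a hypothetical
model and not GHC or GMP code: alloc_fn, current_alloc, runtime_alloc,
foreign_call and with_malloc_hook are all made-up names. With real GMP
the save and restore would presumably go through
mp_get_memory_functions() and mp_set_memory_functions().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model; these names are not GHC or GMP APIs. */
typedef void *(*alloc_fn)(size_t);

/* Counts allocations served by the stand-in for the runtime's allocator. */
int runtime_allocs = 0;

void *runtime_alloc(size_t n) {
    runtime_allocs++;
    return malloc(n);  /* a real runtime would allocate from its own heap */
}

/* Process-global hook, shared by every thread in the process. */
alloc_fn current_alloc = runtime_alloc;

/* Stand-in for a foreign library function that allocates via the hook. */
void *foreign_call(void) {
    return current_alloc(32);
}

/* Run a foreign function with the hook temporarily pointed at malloc. */
void *with_malloc_hook(void *(*foreign)(void)) {
    alloc_fn saved = current_alloc;
    current_alloc = malloc;   /* entering C space */
    void *result = foreign();
    current_alloc = saved;    /* back in Haskell space */
    return result;
}
```

The thread-safety worry is visible here: current_alloc is a single
global, so any other thread that allocates through the hook while
with_malloc_hook() is in progress gets malloc as well. Memory handed out
by one allocator may also later be resized or freed while the other one
is installed.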

There are almost certainly more options that I didn't consider. Could
someone with better insight into GHC internals please share their views
on this issue?

Thanks,
Apollon

[1] https://ghc.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/TheCurrentGMPImplementation
[2] https://bugs.debian.org/751886#15
