Re: GHC 6.12.1 in experimental & utf8-string

To: debian-haskell@lists.debian.org
Subject: Re: GHC 6.12.1 in experimental & utf8-string
From: John MacFarlane <jgm@berkeley.edu>
Date: Wed, 13 Jan 2010 08:27:28 -0800
Message-id: <20100113162728.GC21796@protagoras.phil.berkeley.edu>
In-reply-to: <20100113140140.GA32348@piperka.net>
References: <20100112172747.GA29467@piperka.net> <20100113140140.GA32348@piperka.net>

I'm glad GHC 6.12 is in experimental.

I wanted to point out a potential problem when compiling packages that
use utf8-string's System.IO.UTF8 module against GHC 6.12. Unlike previous
versions, GHC 6.12's IO functions encode and decode text automatically
according to the current locale (using iconv). So if you compile a
package that uses System.IO.UTF8 against GHC 6.12, strings get encoded
twice, once by utf8-string and once more by the IO library, and any
non-ASCII characters come out as garbage.
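
To make the double encoding concrete, here is a minimal sketch that
needs only base. The helper utf8Bytes is a hypothetical stand-in for
utf8-string's encodeString (it is not the library's code, and it does
not treat surrogates specially); like encodeString, it returns a String
with one Char per UTF-8 byte. Applying it twice simulates what happens
when a UTF-8 locale handle in GHC 6.12 encodes a string that was already
encoded.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical stand-in for encodeString: encode a String as UTF-8 and
-- return the result as a String whose Chars are all in the range 0..255.
utf8Bytes :: String -> String
utf8Bytes = concatMap (map chr . enc . ord)
  where
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = do
  let once  = utf8Bytes "\233"   -- U+00E9, as the application would pass it on
      twice = utf8Bytes once     -- what a UTF-8 handle would then write
  print (map ord once)           -- [195,169]
  print (map ord twice)          -- [195,131,194,169]
```

The second list is the byte sequence for the two characters U+00C3 and
U+00A9 rather than for the single character U+00E9, which is the
familiar mojibake.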

There are two ways of fixing the problem.  The first is to fix
each package's source by doing this kind of thing:

    {-# LANGUAGE CPP #-}
    -- Note: GHC >= 6.12 (base >= 4.2) supports Unicode through iconv,
    -- so we use System.IO.UTF8 only if we have an earlier version.
    -- (The MIN_VERSION_base macro is defined by Cabal.)
    #if MIN_VERSION_base(4,2,0)
    import System.IO ( hPutStr, hPutStrLn )
    #else
    import Prelude hiding ( putStr, putStrLn, writeFile, readFile, getContents )
    import System.IO.UTF8
    #endif

This is probably the best solution, because in many cases it will remove
a dependency on utf8-string, but it requires fixing the upstream
source for each affected package. (I've already done this in pandoc 1.4.)

Another solution would be to modify utf8-string.  I have written to
the package's maintainer with two suggestions about how this might
be done (but no reply yet):

1.  Use the CPP trick described above: if base >= 4.2, just re-export
the corresponding System.IO functions from System.IO.UTF8.

2.  (probably better) Change System.IO.UTF8 to use ByteString IO functions.
For example, instead of the current

    putStr :: String -> IO ()
    putStr x = IO.putStr (encodeString x)

which will behave differently depending on whether GHC 6.12's
version of IO.putStr or an earlier version is used, use:

    putStr :: String -> IO ()
    putStr x = B.putStr (fromString x)

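where B is Data.ByteString.Lazy and fromString is the one from
Data.ByteString.Lazy.UTF8.  This function writes the UTF-8 bytes
directly, bypassing the handle's encoding, so it would work with GHC
6.12 and earlier versions.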
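
As a rough sketch of how the second suggestion might extend to a few
more functions (this is not utf8-string's actual source), the following
goes through Data.ByteString.Lazy for all IO and uses fromString and
toString from Data.ByteString.Lazy.UTF8 for the conversion. It assumes
the bytestring and utf8-string packages are installed; the file name
utf8-demo.txt is made up for the example.

```haskell
import Prelude hiding (putStr, putStrLn, readFile, writeFile)
import qualified Data.ByteString.Lazy as B
import Data.ByteString.Lazy.UTF8 (fromString, toString)

-- Each function converts between String and UTF-8 bytes itself and does
-- its IO on ByteStrings, so the handle's text encoding is never involved.
putStr :: String -> IO ()
putStr = B.putStr . fromString

putStrLn :: String -> IO ()
putStrLn s = putStr (s ++ "\n")

writeFile :: FilePath -> String -> IO ()
writeFile path = B.writeFile path . fromString

readFile :: FilePath -> IO String
readFile path = fmap toString (B.readFile path)

main :: IO ()
main = do
  -- "utf8-demo.txt" is a hypothetical scratch file.
  writeFile "utf8-demo.txt" "caf\233\n"
  bytes <- B.readFile "utf8-demo.txt"
  print (B.unpack bytes)               -- [99,97,102,195,169,10]
  readFile "utf8-demo.txt" >>= putStr  -- round-trips through UTF-8
```

The bytes on disk are the same whichever GHC version compiles this,
since no String ever reaches a System.IO text function.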

Best,
John
