Re: State of Haskell on sparc?
On Tue, Apr 29, 2014 at 09:41:18AM -0500, Patrick Baggett wrote:
> On Tue, Apr 29, 2014 at 9:27 AM, Joachim Breitner <nomeata@debian.org> wrote:
> > one of the current Haskell release transition blockers is the removal of
> > some obsolete Haskell libraries, in particular haskell-tls-extra
> > (https://bugs.debian.org/741230).
> >
> > According to "dak rm -R -n haskell-tls-extra", this is blocked by left-over
> > packages on sparc that depend on it. The root cause is a failure of
> > haskell-tls to build on sparc:
> > https://buildd.debian.org/status/logs.php?pkg=haskell-tls&arch=sparc
>
> Hi Joachim,
>
> I'd like to look into the TLS failures. Bus errors usually mean misaligned
> data, which isn't very difficult to fix once you see the source code. Is
> there a bug report for the SPARC failure? Help me reproduce it on my local
> machine and I think I should be able to fix it soon!
It reproduces easily by just building the package on smetana (or
presumably in a sid chroot on any other sparc system). Here's the
backtrace and a bit of extra gdb detail:
(sid_sparc-dchroot)cjwatson@smetana:~/haskell-tls-1.2.6$ gdb --args dist-ghc/build/test-tls/test-tls --plain -t \*initiate\*
[...]
(gdb) r
Starting program: /home/cjwatson/haskell-tls-1.2.6/dist-ghc/build/test-tls/test-tls --plain -t \*initiate\*
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1".
Handshakes:
Program received signal SIGBUS, Bus error.
0x003d8c2c in md5_do_chunk ()
(gdb) bt
#0 0x003d8c2c in md5_do_chunk ()
#1 0x003d9a10 in md5_update ()
#2 0x003d2070 in saqF_ret ()
#3 0x00da1e48 in StgRun ()
#4 0x00d9f65c in scheduleWaitThread ()
#5 0x00d9c774 in real_main ()
#6 0x00d9c8c8 in hs_main ()
#7 0x0003b624 in main ()
(gdb) disas /rm
Dump of assembler code for function md5_do_chunk:
0x003d8c18 <+0>: 9d e3 bf 20 save %sp, -224, %sp
0x003d8c1c <+4>: b6 07 bf c0 add %fp, -64, %i3
0x003d8c20 <+8>: 07 00 00 3f sethi %hi(0xfc00), %g3
0x003d8c24 <+12>: 82 10 20 00 clr %g1
0x003d8c28 <+16>: 86 10 e3 00 or %g3, 0x300, %g3
=> 0x003d8c2c <+20>: c4 06 40 01 ld [ %i1 + %g1 ], %g2
0x003d8c30 <+24>: b9 30 a0 18 srl %g2, 0x18, %i4
0x003d8c34 <+28>: 89 28 a0 18 sll %g2, 0x18, %g4
0x003d8c38 <+32>: ba 08 80 03 and %g2, %g3, %i5
0x003d8c3c <+36>: 88 17 00 04 or %i4, %g4, %g4
0x003d8c40 <+40>: bb 2f 60 08 sll %i5, 8, %i5
[lots more]
(gdb) info reg
g0 0x0 0
g1 0x0 0
g2 0xa9a58 694872
g3 0xff00 65280
g4 0x7f61 32609
g5 0xf7b1335a -139381926
g6 0x42435f32 1111711538
g7 0xf7ff26d0 -134273328
o0 0x271ac0ab 656064683
o1 0x2229a8ba 573155514
o2 0x6c4ea9bb 1817094587
o3 0xe526485f -450475937
o4 0x54d77ff4 1423409140
o5 0xea0eb408 -368135160
sp 0xffffd310 0xffffd310
o7 0xf0933536 -258788042
l0 0xb6316481 -1238276991
l1 0x59e194e3 1507955939
l2 0x75b9afdc 1975103452
l3 0xb55489ed -1252750867
l4 0xc5faf4cf -973409073
l5 0xeead5692 -290629998
l6 0xff94a23c -7036356
l7 0x9e8c67db -1634965541
i0 0xf7b13350 -139381936
i1 0xf7b1397e -139380354
i2 0xb4200163 -1272970909
i3 0xffffd3b0 -11344
i4 0x4e0a4a3c 1309297212
i5 0xe4a9a58 239770200
fp 0xffffd3f0 0xffffd3f0
i7 0x3d9a08 4037128
y 0x52ab2112 1386946834
psr 0xff000084 [ #2 S #24 #25 #26 #27 #28 #29 #30 #31 ]
wim *value not available*
tbr *value not available*
pc 0x3d8c2c 0x3d8c2c <md5_do_chunk+20>
npc 0x3d8c30 0x3d8c30 <md5_do_chunk+24>
fsr 0x800 [ #11 ]
csr *value not available*
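Looking a little closer at the dump: the faulting instruction is "ld [ %i1 + %g1 ], %g2", a 32-bit load, which SPARC only permits from 4-byte-aligned addresses. %g1 was cleared just before it, so the address is simply %i1 = 0xf7b1397e, which is 2 bytes past a word boundary. After "save", %i1 holds the second argument the caller passed to md5_do_chunk (the first is in %i0 = 0xf7b13350, which is aligned), so which pointer is bad depends on md5_do_chunk's prototype. A trivial C check of that arithmetic, with the register values copied from above (the helper name is just for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the "info reg" output above. */
static const uint32_t reg_i1 = 0xf7b1397eu; /* base register of the faulting ld */
static const uint32_t reg_g1 = 0x0u;        /* offset, cleared just before the ld */
static const uint32_t reg_i0 = 0xf7b13350u; /* first argument to md5_do_chunk */

/* Hypothetical helper: bytes by which addr overshoots the previous
 * multiple of width. width must be a power of two; 0 means aligned. */
static uint32_t misalignment(uint32_t addr, uint32_t width)
{
    assert(width != 0 && (width & (width - 1)) == 0);
    return addr & (width - 1);
}
```

misalignment(reg_i1 + reg_g1, 4) gives 2, while misalignment(reg_i0, 4) gives 0.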
And searching the web for md5_do_chunk leads me to
https://ghc.haskell.org/trac/ghc/ticket/9002, which Joachim filed.
md5_do_chunk is defined in haskell-cryptohash/cbits/md5.c (so I'm not
sure why this is filed on GHC upstream, since it's probably a bug in
that C code).
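If it is that C code, the usual portable fix for this class of bug is to build each little-endian 32-bit word from individual bytes instead of reading through a uint32_t pointer into the caller's buffer. (The instructions right after the faulting ld, shifting by 24 and masking with 0xff00, look like a little-endian to big-endian byte swap, just done after a full-word load.) A minimal sketch; load_le32 and md5_load_block are hypothetical names for illustration, not functions taken from md5.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value from p using
 * single-byte loads only, so p may have any alignment. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical: decode one 64-byte MD5 input block into the 16
 * host-order words the rounds operate on, whatever block's alignment. */
static void md5_load_block(uint32_t w[16], const uint8_t *block)
{
    assert(w != NULL && block != NULL);
    for (size_t i = 0; i < 16; i++)
        w[i] = load_le32(block + 4 * i);
}
```

Byte loads have no alignment requirement, so this works for any input pointer, at the cost of four loads per word on strict-alignment targets like sparc.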
Is this enough for you to work on the rest? Since the crash is very
near the start of md5_do_chunk, hopefully it isn't too hard to
disentangle, although I suppose it's possible that ctx is somehow being
allocated at an unaligned location in Haskell code ...
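If it turns out to be the message data rather than ctx that is misaligned, another defensive option is for the update path to memcpy each 64-byte block into a uint32_t array before any word loads happen: memcpy has no alignment requirement, and a uint32_t array is always suitably aligned for 32-bit loads. This is a sketch of that shape only, not cryptohash's code; process_chunk and update_blocks are hypothetical, and process_chunk merely sums the 16 words as a stand-in for the MD5 rounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 64
#define CHUNK_WORDS (CHUNK_BYTES / 4)

/* Hypothetical stand-in for md5_do_chunk: it reads 32-bit words, so it
 * needs a 4-byte-aligned pointer on sparc. Here it only sums the words. */
static uint32_t process_chunk(const uint32_t words[CHUNK_WORDS])
{
    uint32_t sum = 0;
    for (size_t i = 0; i < CHUNK_WORDS; i++)
        sum += words[i];
    return sum;
}

/* Hypothetical update loop: data may have any alignment, because each
 * block is memcpy'd into an aligned local array before word access. */
static uint32_t update_blocks(const uint8_t *data, size_t nblocks)
{
    uint32_t aligned[CHUNK_WORDS]; /* uint32_t array, so 4-byte aligned */
    uint32_t acc = 0;

    assert(data != NULL || nblocks == 0);
    for (size_t b = 0; b < nblocks; b++) {
        memcpy(aligned, data + b * CHUNK_BYTES, CHUNK_BYTES);
        acc += process_chunk(aligned);
    }
    return acc;
}
```

For example, with a 65-byte buffer filled with 0x01, update_blocks(buf + 1, 1) reads from an odd address yet never faults, and returns 0x10101010 (16 words of 0x01010101) on either byte order.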
--
Colin Watson [cjwatson@debian.org]