Darcs bug in alioth



Hello,

Some time ago I received a mail from Sandro Tosi, with comments from
Raphaël Hertzog, about darcs processes that never terminate and run at
100% CPU.  Since then I have been working on fixing
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522617 on Alioth.  The
problem is that the bug is triggered by darcs push, which is a very
common command in the Debian Haskell Team workflow.  For more
information about the bug, see http://bugs.darcs.net/issue1278 .

It's not a bug in darcs but in GHC, and it was fixed in GHC 6.10.2.
It only affects 64-bit architectures.

At first I talked to buxy@#alioth@irc.oftc.net, who suggested
backporting the newer GHC and then darcs, so that the fixed darcs could
be installed on Alioth.  This would require backporting a lot of Haskell
libraries that are only present in sid, so
formorer@#debian-backports@irc.oftc.net suggested that I fix it in lenny
instead.  I talked about that with mornfall@#darcs@irc.freenode.net ,
and he pointed out to me that the bug is probably related to these two
patches in the GHC darcs repository:

Thu Nov 13 14:00:05 BRST 2008  Simon Marlow <marlowsd@gmail.com>
  * Fix another subtle shutdown deadlock
  The problem occurred when a thread tries to GC during shutdown.  In
  order to GC it has to acquire all the Capabilities in the system, but
  during shutdown, some of the Capabilities have already been closed and
  can never be acquired.

Thu Nov 13 13:57:30 BRST 2008  Simon Marlow <marlowsd@gmail.com>
  * Fix an extremely subtle deadlock bug on x86_64
  The recent_activity flag was an unsigned int, but we sometimes do a
  64-bit xchg() on it, which overwrites the next word in memory.  This
  happened to contain the sched_state flag, which is used to control the
  orderly shutdown of the system.  If the xchg() happened during
  shutdown, the scheduler would get confused and deadlock.  Don't you
  just love C?

I applied the second one to ghc6_6.8.2dfsg1-1 and the bug no longer
occurs.  As the patch is small and the bug is very annoying, I think
the suggestion from mornfall is a good option.  The patch is:

   {
    hunk ./rts/Schedule.c 95
    + *
    + * NB. must be StgWord, we do xchg() on it.
    hunk ./rts/Schedule.c 98
    -nat recent_activity = ACTIVITY_YES;
    +volatile StgWord recent_activity = ACTIVITY_YES;
    hunk ./rts/Schedule.c 101
    - * LOCK: none (changes once, from false->true)
    + * LOCK: none (changes monotonically)
    hunk ./rts/Schedule.c 103
    -rtsBool sched_state = SCHED_RUNNING;
    +volatile StgWord sched_state = SCHED_RUNNING;
    hunk ./rts/Schedule.h 100
    -extern rtsBool RTS_VAR(sched_state);
    +extern volatile StgWord RTS_VAR(sched_state);
    hunk ./rts/Schedule.h 116
    -extern nat recent_activity;
    +extern volatile StgWord recent_activity;
    }

Kaol, do you think it's a good idea to incorporate this into lenny's
GHC?  It would not require rebuilding all the libraries.

Or would it be better to use the sid GHC to build a new darcs binary
and install it on Alioth?  Or to upload the whole Haskell stack to
backports?

Any advice is welcome.

Greetings.

-- 
marcot
http://marcot.iaaeee.org/





