
Bug#464886: jackd: wrongly killed by watchdog



Package: jackd
Version: 0.101.1-2
Severity: normal
Tags: patch

There seems to be a problem with JACK's watchdog. On our systems,
JACK is sometimes killed by the watchdog when it shouldn't be,
typically when some client sets up a lot of connections.

The watchdog is triggered by jack_engine_process(), and the watchdog
thread kills JACK if it has not been triggered within 5 seconds, on
the assumption that JACK has hung.

However, jack_engine_process() not being called doesn't necessarily
mean that JACK has hung. One alternative reason, freewheeling, is
dealt with in jack_watchdog_thread(), though not quite correctly
AFAICS. If freewheeling stops just before the sleep (5) ends, there
is no time left to trigger the watchdog, so JACK will be killed
wrongly.

Another reason, which seems to be the one I'm seeing, is that
jack_engine_process() doesn't run while other clients hold the graph
lock. Sure, one invocation of, say, jack_connect() shouldn't hold
the lock for long, but many invocations combined might, under bad
circumstances, make the watchdog time out.

How to reproduce:

Reproducing is a bit hard, since the failure is intermittent. The
attached test program is admittedly quite gross: it starts 26 JACK
clients which all connect and disconnect ports in an endless loop.
On a test machine, about half the time JACK is killed by its watchdog
(with the default timeout of 5 seconds) within a minute of starting
the program.

I start "jackd -R -dalsa -p4096 -n2" and then the attached program
(jack-watchdog-bug.c).

Fix:

The attached patch triggers the watchdog when acquiring the graph
lock. So, assuming that nothing keeps the lock for too long, the
watchdog should be happy.
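
The idea of the fix can be sketched as follows. This is not the
attached patch: engine_model_t, model_lock_graph and
model_unlock_graph are hypothetical names, and a plain pthread mutex
stands in for the graph lock. Whether the real patch triggers the
watchdog before or after the lock is obtained is not shown here; the
sketch assumes "after".

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the engine state, not jackd's struct. */
typedef struct {
  pthread_mutex_t graph_lock;
  volatile int watchdog_check; /* nonzero means "triggered this period" */
} engine_model_t;

/* Take the graph lock and, once it is held, trigger the watchdog.
   Returns 0 on success or the pthread error code. */
int model_lock_graph (engine_model_t *e)
{
  assert (e != NULL);
  int err = pthread_mutex_lock (&e->graph_lock);
  if (err == 0)
    e->watchdog_check = 1;
  return err;
}

/* Release the graph lock. */
int model_unlock_graph (engine_model_t *e)
{
  assert (e != NULL);
  return pthread_mutex_unlock (&e->graph_lock);
}
```

With this arrangement every lock holder resets the watchdog at the
start of its hold, so the timeout only expires if a single holder
keeps the lock for longer than the timeout, or if nothing takes the
lock at all.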

When trying the patched JACK with the attached test program some
20 times, I didn't see the watchdog killing JACK, even with the
timeout reduced to 1 second.

The patch also triggers the watchdog when (before!) exiting
freewheeling. I don't think that's the problem I've seen, though, so
this part has probably not been exercised by my tests.
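/* jack-watchdog-bug.c: stress the graph lock with 26 clients that
   connect and disconnect a port in an endless loop. */
#include <unistd.h>
#include <jack/jack.h>

int main ()
{
  char name[2] = { 'a', 0 };
  int i;
  /* The parent forks 25 children; each child leaves the loop at once,
     so the 26 processes end up with i = 0..25. */
  for (i = 0; i < 25 && fork (); i++) ;
  *name += i;  /* client names "a" .. "z" */
  jack_client_t *jc = jack_client_new (name);
  if (!jc || jack_activate (jc))
    return 1;
  const char **p = jack_get_ports (jc, "alsa_pcm:capture_1", NULL, JackPortIsOutput);
  jack_port_t *port = jack_port_register (jc, "test", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
  /* Bail out if the capture port or our own input port is missing. */
  if (!p || !p[0] || !port)
    return 1;
  const char *t = jack_port_name (port);
  /* Hammer the graph lock: every call below has to take it. */
  while (1)
    {
      jack_connect (jc, p[0], t);
      jack_disconnect (jc, p[0], t);
    }
}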

Attachment: jack-watchdog.patch
Description: Binary data

