Deadlock during page-in and thread_suspend
For reference, I have been running the following stress test for quite
some time now:
    # stress-ng -t 2m --vm 32 --vm-bytes 750M --mmap 32 --mmap-bytes 750M \
        --page-in
This is run on a QEMU virtual machine with 2GB of RAM and consequently
exercises page-in and page-out quite heavily. I am now running the test
on 64-bit Hurd.
Only two regularly occurring cases remain where the system enters a
deadlock-like state: this one, and one involving rumplib repeatedly
reporting disk timeout errors. This one relates to process termination
again and can be summarised thus:
1) I've described the process architecture of stress-ng before, but the
part relevant to this test is that stress-ng runs and forks a child,
which forks another (worker) child. After the 2m timeout a signal is
sent from a parent stress-ng to the worker to trigger it to complete
processing and terminate.
2) The worker process does all the major processing which involves lots
of pageout and pagein.
3) thread0 in the worker generates a page fault. This causes page-in
involving a (top) vm_object which also has a shadow object. A mapping is
made between the top object/offset and a fictitious page to block other
threads from attempting the same page-in until thread0 has handled the
page fault. thread0 then traverses the object chain to the shadow and
makes the memory_object_data_request on the shadow object/offset and
blocks itself until the reply has arrived and been processed.
4) A signal is received by the process and is handled by (say) thread1.
As per normal signal handling, this results in thread0 being suspended
by thread1 via a call to thread_suspend(). It can be suspended
immediately because thread0 is in TH_WAIT state and is interruptible
(TH_UNINT not set).
5) After thread1 has suspended thread0, it takes a page fault itself,
and that fault requires the same page that thread0 was paging in.
thread1 now blocks indefinitely: it cannot proceed until the original
page-in completes, which it cannot, because thread0 is suspended.
thread0 will only be resumed by thread1, and thread1 cannot continue
because of the state managed by thread0.
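To make the circular wait in steps 3 to 5 explicit, here is a minimal
sketch in Python of the wait-for relationships as I understand them. It
is a toy model and not gnumach code: the names Thread, BusyPage,
blockers, find_cycle and scenario are all invented for illustration, and
it assumes the pager's reply would be processed without further
obstacles if thread0 were allowed to run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare threads by identity, not by field values
class Thread:
    name: str
    suspended_by: Optional["Thread"] = None   # set by a thread_suspend() caller
    waiting_on: Optional["BusyPage"] = None   # placeholder page this thread sleeps on


@dataclass(eq=False)
class BusyPage:
    """Hypothetical stand-in for the fictitious page at the top object/offset."""
    owner: Thread  # the thread that must finish the page-in and release the page


def blockers(t: Thread) -> List[Thread]:
    """Threads that must make progress before t can run again."""
    out = []
    if t.suspended_by is not None:
        out.append(t.suspended_by)        # needs a resume from its suspender
    if t.waiting_on is not None and t.waiting_on.owner is not t:
        out.append(t.waiting_on.owner)    # needs the page-in owner to finish
    return out


def find_cycle(start: Thread) -> Optional[List[str]]:
    """Depth-first walk of the wait-for edges; return a cycle of names, or None."""
    path: List[Thread] = []

    def visit(t: Thread) -> Optional[List[str]]:
        if t in path:
            return [x.name for x in path[path.index(t):]] + [t.name]
        path.append(t)
        for b in blockers(t):
            cycle = visit(b)
            if cycle:
                return cycle
        path.pop()
        return None

    return visit(start)


def scenario(suspend: bool = True):
    thread0, thread1 = Thread("thread0"), Thread("thread1")
    page = BusyPage(owner=thread0)        # step 3: thread0 starts the page-in
    if suspend:
        thread0.suspended_by = thread1    # step 4: signal handling suspends thread0
    thread1.waiting_on = page             # step 5: thread1 faults on the same page
    return thread0, thread1


if __name__ == "__main__":
    _, t1 = scenario()
    print(find_cycle(t1))
```

With the suspension in place the walk from thread1 comes back to
thread1 via thread0, which is the deadlock described above. Calling
scenario(suspend=False) removes the cycle, since thread0 can then finish
the page-in and release the page.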
I have some confidence that the above sequence is broadly what is
happening, but it's difficult to be certain. I've got to this stage by
adding extra state to data structures rather than by adding debug
logging, which would be huge in volume and normally alters the timing
to the point of masking the problem anyway. In any case, I think that
the scenario described above is possible and is a good match for the
evidence that I do have.
I have some very vague ideas for solutions, but before discussing those
it would be helpful to have my analysis scrutinised for obvious errors.
Cheers,
Mike.