Re: stress-ng process termination issue

To: debian-hurd@lists.debian.org
Subject: Re: stress-ng process termination issue
From: Michael Kelly <mike@weatherwax.co.uk>
Date: Wed, 23 Jul 2025 20:13:52 +0100
Message-id: <1a34e2ee-637e-4740-9ceb-494019333e5b@weatherwax.co.uk>
In-reply-to: <eb9dda26-d63f-47ba-935d-4baa070f4584@weatherwax.co.uk>
References: <eb9dda26-d63f-47ba-935d-4baa070f4584@weatherwax.co.uk>

Some additional context for consideration. The thread 0xf60f9170 has a reference count of 1, so presumably, other than during repetitions of the while loop in $task60.0, the only reference is held by $task61.1. That thread is sleeping, waiting for TH_EV_WAKE_ACTIVE on thread 0xf60f9170. That wakeup event presumably never arrives. Is that down to the task it is associated with being terminated?


On 22/07/2025 20:14, Michael Kelly wrote:

Hi All,
I've been experimenting with stress-ng for some time to stress test my hurd virtual machine. This has already exposed a few problems but here is another. Sorry for the long explanation, but it might be necessary to make sense of the problem. The scenario under test goes something like:
1) Top level supervisory process 'stress-ng' begins execution
2) It forks N times, one per stressor under test (in my case 64 times). Call these processes 'stressor'.
3) The particular tests I am running are stress-vm and stress-mmap. In these tests each of the stressor processes forks again so that the child can be supervised and the test restarted should it run out of resources. Call these processes 'worker'.
4) Each stressor sets a timeout using alarm() and then waits for the worker to terminate by calling waitpid().
5) The stressor SIGALRM handler sets a variable tested occasionally within the worker. If the worker tests that variable quickly then it exits normally. If it does not, then the stressor sends a series of signals: SIGALRM (4 times), SIGTERM, then finally SIGKILL, with a short time gap between them.
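For anyone who wants to reproduce the shape of steps 4 and 5 outside stress-ng, here is a minimal sketch of the supervise-then-escalate pattern. It is not stress-ng's code: the names supervise_worker() and sleep_ms() are hypothetical, and it polls with waitpid(WNOHANG) instead of using alarm() plus a blocking waitpid(), purely to keep the example short and free of signal-handler races.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Signals sent in order once the timeout has expired: SIGALRM four
   times, then SIGTERM, then SIGKILL (six in total, as in step 5). */
static const int escalation[] = {
    SIGALRM, SIGALRM, SIGALRM, SIGALRM, SIGTERM, SIGKILL
};
#define N_ESCALATION (sizeof escalation / sizeof escalation[0])

/* Hypothetical helper: sleep for roughly `ms` milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical supervisor: give `worker` up to `timeout_ms` to exit on
   its own, then walk the escalation list with `gap_ms` between signals.
   Returns the number of signals sent (0 if the worker exited in time),
   or -1 on error. The wait status is stored in *status. */
int supervise_worker(pid_t worker, long timeout_ms, long gap_ms, int *status)
{
    assert(status != NULL);
    pid_t r;

    /* Polling stand-in for alarm() + blocking waitpid(). */
    for (long waited = 0; waited < timeout_ms; waited += 10) {
        r = waitpid(worker, status, WNOHANG);
        if (r == worker)
            return 0;
        if (r < 0)
            return -1;
        sleep_ms(10);
    }

    for (size_t i = 0; i < N_ESCALATION; i++) {
        if (kill(worker, escalation[i]) < 0)
            return -1;
        if (escalation[i] == SIGKILL) {
            /* SIGKILL cannot be ignored, so a blocking wait is expected
               to return; on the system described here it may not. */
            do {
                r = waitpid(worker, status, 0);
            } while (r < 0 && errno == EINTR);
            return r == worker ? (int)(i + 1) : -1;
        }
        sleep_ms(gap_ms);
        r = waitpid(worker, status, WNOHANG);
        if (r == worker)
            return (int)(i + 1);
        if (r < 0)
            return -1;
    }
    return -1; /* not reached: the list ends with SIGKILL */
}
```

A worker that is paged out and slow to respond would see the return value climb towards 6, which corresponds to the "many workers receive all 6 signals" case.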
The test scenario I set up uses all the VM's real memory and a certain portion of swap. Consequently when the timeout expires, many of the processes are paged out and they do not respond quickly, which means that many workers receive all 6 signals. Occasionally, one of the stressor processes gets stuck within this while loop within task_terminate ($task60.0):
while (!queue_empty(list)) {
    thread = (thread_t) queue_first(list); /* thread is 0xf60f9170 and is within the worker process */
    ......
    thread_force_terminate(thread);
    ......
}
thread_force_terminate(thread) calls thread_halt(thread, TRUE) and in this instance does very little, as the thread is already halted and it simply increases the thread suspend_count (currently standing at 0x64c0fc8e!). The thread is not removed from the list and it is repeatedly processed in the loop.
The thread 0xf60f9170 is in $task61 (the worker) and is the main thread which does all the stress testing. Examining its state suggests it is already halted, with a state of 0x112 (TH_SUSP|TH_HALTED|TH_SWAPPED).
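For reference, the decoding of 0x112 can be checked mechanically with a small helper like the one below. The flag values are my assumption of what gnumach's kern/thread.h defines and should be verified against the tree in use; decode_thread_state() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values, believed to match gnumach kern/thread.h; verify
   against your own source tree before relying on them. */
struct flag_name { unsigned bit; const char *name; };
static const struct flag_name th_flags[] = {
    { 0x0001, "TH_WAIT" },
    { 0x0002, "TH_SUSP" },
    { 0x0004, "TH_RUN" },
    { 0x0008, "TH_UNINT" },
    { 0x0010, "TH_HALTED" },
    { 0x0080, "TH_IDLE" },
    { 0x0100, "TH_SWAPPED" },
    { 0x0200, "TH_SW_COMING_IN" },
};

/* Hypothetical helper: write the names of the flags set in `state`
   into `buf`, separated by '|', lowest bit first. Returns any bits
   that were not decoded (unknown bits, or names that did not fit). */
unsigned decode_thread_state(unsigned state, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof th_flags / sizeof th_flags[0]; i++) {
        if (!(state & th_flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", th_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        state &= ~th_flags[i].bit;
    }
    return state;
}
```

Calling it with 0x112 and a 64-byte buffer should leave no undecoded bits and produce the same three names quoted above.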
All stack traces are attached and are annotated with extra context.
I'm trying to make sense of the thread code but, as it's rather complex, I thought I might save time by asking whether anyone had any input to offer. In particular, what do I need to look at or consider to determine why the state has ended up this way? Better yet, someone might immediately see the cause of the problem. I have a virtual machine snapshot of this moment saved so I can easily relay any additional information required.
There is a 2nd thread ($task60.1) in the stressor process which is also looping, but I think that is just stuck waiting for the task_terminate() to complete. (This 2nd thread is processing a secondary timeout set up by the stressor using alarm(1), but I don't think that is necessarily relevant.)
None of the threads in $task61 appear to be active, based on their 'last updated' time reported by the kernel debugger.
Any ideas?
