Re: [RFS] kmc: arm64 autopkgtest time out



Control: reopen -1
Control: retitle -1 kmc: autopkgtest times out on multicore hosts

Hi

On 06-12-2020 22:06, Étienne Mollier wrote:
> Paul Gevers, on 2020-12-06 20:54:43 +0100:
>> It recently started to time out on amd64 too, but not always [1]. And
>> when we added armhf, that timed out too. The failures on amd64 that I
>> checked were all on ci-worker13, which is one of our hosts that runs
>> multiple debci-workers. That's common on our arm64 workers too.
> 
>> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/
> 
> Thanks for the pointers, I believe I found a reproducer!  :)
> 
> All failing CI runners had in common a high NR_CPUS count,
> at least 32 cores.  I don't have 32 cores at hand, but kmc
> provides an option -t to increase the parallelization.  The
> following command, in the conditions of the autopkgtest, will
> hang on any machine:
> 
> 	$ kmc -ci1 -m1 -k28 -t32 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .
> 
> strace output looks like the program just deadlocks, I see no
> CPU consumption while the command is supposed to be running:
> 
> 	strace: Process 3001579 attached with 23 threads
> 	[pid 3001633] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
> 	[pid 3001632] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
> 	[...]
> 	[pid 3001612] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
> 	[pid 3001579] futex(0x55b69bf9ce18, FUTEX_WAIT_PRIVATE, 0, NULL
> 
> I suppose an easy solution would be to cap the core count of the
> program while working with upstream to find a proper fix.  Will
> see what can be done about it.

It seems that this didn't work on our armhf worker:
https://ci.debian.net/packages/k/kmc/testing/armhf/

The amd64 run didn't happen on the big box.
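
For reference, a minimal sketch of the kind of thread cap discussed
above, as it might look in the test script. The helper name
capped_threads and the limit of 8 are assumptions for illustration
only, not values known to avoid the hang; only the -t option comes
from the reproducer quoted above. The kmc command is printed rather
than run.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of the CPU count and a fixed cap.
capped_threads() {
    ncpus=$1
    max=$2
    if [ "$ncpus" -gt "$max" ]; then
        echo "$max"
    else
        echo "$ncpus"
    fi
}

# MAX_THREADS=8 is an arbitrary assumption, not a limit confirmed to be safe.
MAX_THREADS=8
THREADS=$(capped_threads "$(nproc)" "$MAX_THREADS")

# Print the command the test would run with the capped -t value.
echo "kmc -ci1 -m1 -k28 -t$THREADS \$ORIGDIR/debian/tests/sample_6.fastq.gz 1 ."
```

On a 32-core worker this would pass -t8 instead of letting kmc pick
the thread count itself.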

Paul
