
Re: [RFS] kmc: arm64 autopkgtest time out

Control: tag -1 + confirmed
Control: tag -1 - unreproducible

Hi Paul,

Paul Gevers, on 2020-12-06 20:54:43 +0100:
> It recently started to time out on amd64 too, but not always [1]. And
> when we added armhf, that timed out too. The failures on amd64 that I
> checked were all on ci-worker13, which is one of our hosts that runs
> multiple debci-workers. That's common on our arm64 workers too.

> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/

Thanks for the pointers, I believe I found a reproducer!  :)

All the failing CI runners had one thing in common: a high
NR_CPUS count, at least 32 cores.  I don't have 32 cores at hand,
but kmc provides an option -t to increase the parallelization.
Under the conditions of the autopkgtest, the following command
hangs on any machine:

	$ kmc -ci1 -m1 -k28 -t32 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .
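To turn such a hang into a clean failure instead of a stalled job,
the reproducer could be wrapped in coreutils timeout(1), which
exits with status 124 when its deadline expires.  A minimal sketch
follows; the helper name check_for_hang and the choice of deadline
are hypothetical, not part of the package:

```shell
# Hypothetical helper: run a command with a deadline and flag a hang.
# Relies on coreutils timeout(1), which exits with status 124 when the
# deadline expires.  Note: a command that itself exits with 124 would
# be indistinguishable from a timeout.
check_for_hang() {
    deadline="$1"
    shift
    timeout "$deadline" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: command exceeded ${deadline}s" >&2
    fi
    return "$status"
}
```

Usage, with an arbitrary 600-second deadline:
check_for_hang 600 kmc -ci1 -m1 -k28 -t32 "$ORIGDIR/debian/tests/sample_6.fastq.gz" 1 .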

The strace output suggests that the program simply deadlocks; I
see no CPU consumption while the command is supposed to be running:

	strace: Process 3001579 attached with 23 threads
	[pid 3001633] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
	[pid 3001632] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
	[pid 3001612] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
	[pid 3001579] futex(0x55b69bf9ce18, FUTEX_WAIT_PRIVATE, 0, NULL

I suppose an easy workaround would be to cap the program's thread
count while working with upstream to find a proper fix.  I will
see what can be done about it.
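As a stopgap along those lines, the autopkgtest could derive kmc's
-t value from nproc(1) with an upper bound.  A minimal sketch; the
helper name capped_threads and the default ceiling of 16 are
assumptions, since the largest thread count that avoids the
deadlock has not been established:

```shell
# Hypothetical helper: print the number of available processing units,
# clamped to a ceiling (first argument, default 16, an assumed value).
capped_threads() {
    cap="${1:-16}"
    n="$(nproc)"
    if [ "$n" -gt "$cap" ]; then
        n="$cap"
    fi
    echo "$n"
}
```

Usage in the test, keeping the other options unchanged:
kmc -ci1 -m1 -k28 -t"$(capped_threads 16)" "$ORIGDIR/debian/tests/sample_6.fastq.gz" 1 .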

Have a good evening,  :)
Étienne Mollier <etienne.mollier@mailoo.org>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.

