Control: tag -1 + confirmed
Control: tag -1 - unreproducible

Hi Paul,

Paul Gevers, on 2020-12-06 20:54:43 +0100:
> It recently started to time out on amd64 too, but not always [1]. And
> when we added armhf, that timed out too. The failures on amd64 that I
> checked were all on ci-worker13, which is one of our hosts that runs
> multiple debci-workers. That's common on our arm64 workers too.
>
> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/

Thanks for the pointers, I believe I found a reproducer!  :)

The failing CI runners all had a high NR_CPUS count in common: at
least 32 cores.  I don't have 32 cores at hand, but kmc provides a -t
option to increase the parallelism.  The following command, run under
the same conditions as the autopkgtest, will hang on any machine:

	$ kmc -ci1 -m1 -k28 -t32 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .

The strace output suggests that the program simply deadlocks; I see no
CPU consumption while the command is supposed to be running:

	strace: Process 3001579 attached with 23 threads
	[pid 3001633] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
	[pid 3001632] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
	[...]
	[pid 3001612] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
	[pid 3001579] futex(0x55b69bf9ce18, FUTEX_WAIT_PRIVATE, 0, NULL

I suppose an easy solution would be to cap the core count used by the
program while working with upstream to find a proper fix.  I will see
what can be done about it.

Have a good evening,  :)
-- 
Étienne Mollier <etienne.mollier@mailoo.org>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.
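
A minimal sketch of the thread cap suggested above, as it might look in
the autopkgtest script.  The helper name cap_threads and the limit of 8
are assumptions made for illustration only; the message establishes
that -t32 hangs, not which lower value is safe.  The kmc command line
is the reproducer with -t replaced by the capped value, and it is only
printed here, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of a detected core count ($1)
# and an upper limit ($2).
cap_threads() {
    detected=$1
    limit=$2
    if [ "$detected" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$detected"
    fi
}

# Assumed limit, chosen arbitrarily for this sketch; not a tested threshold.
MAX_THREADS=8

# nproc (coreutils) reports the number of available processing units.
THREADS=$(cap_threads "$(nproc)" "$MAX_THREADS")

# Print the capped invocation instead of running it.
echo "kmc -ci1 -m1 -k28 -t$THREADS \$ORIGDIR/debian/tests/sample_6.fastq.gz 1 ."
```

On a worker with 32 or more cores this would pass -t8 rather than
letting kmc pick its own thread count; on smaller machines the detected
count is used unchanged.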