Control: reopen -1
Control: retitle -1 kmc: autopkgtest times out on multimulticore hosts

Hi

On 06-12-2020 22:06, Étienne Mollier wrote:
> Paul Gevers, on 2020-12-06 20:54:43 +0100:
>> It recently started to time out on amd64 too, but not always [1]. And
>> when we added armhf, that timed out too. The failures on amd64 that I
>> checked were all on ci-worker13, which is one of our hosts that runs
>> multiple debci-workers. That's common on our arm64 workers too.
>
>> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/
>
> Thanks for the pointers, I believe I found a reproducer! :)
>
> All failing CI runners had in common a high NR_CPUS count,
> at least 32 cores. I don't have 32 cores at hand, but kmc
> provides an option -t to increase the parallelization. The
> following command, in the conditions of the autopkgtest, will
> hang on any machine:
>
>     $ kmc -ci1 -m1 -k28 -t32 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .
>
> strace output looks like the program just deadlocks, I see no
> CPU consumption while the command is supposed to be running:
>
>     strace: Process 3001579 attached with 23 threads
>     [pid 3001633] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>     [pid 3001632] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>     [...]
>     [pid 3001612] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>     [pid 3001579] futex(0x55b69bf9ce18, FUTEX_WAIT_PRIVATE, 0, NULL
>
> I suppose an easy solution would be to cap the core count of the
> program while working with upstream to find a proper fix. Will
> see what can be done about it.

It seems that this didn't work on our armhf worker:
https://ci.debian.net/packages/k/kmc/testing/armhf/

The amd64 run didn't happen on the big box.

Paul
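
For reference, the "cap the core count" idea quoted above could look roughly like the sketch below. This is only an illustration, not the change that was actually uploaded: the helper name cap_threads and the cap value of 8 are assumptions, and nothing in this thread establishes which thread count (if any) reliably avoids the hang. The only kmc option relied on is -t, as used in the reproducer.

```shell
#!/bin/sh
# Hypothetical helper (not part of kmc or its packaging): print the
# smaller of a detected CPU count and a chosen cap.
cap_threads() {
    detected=$1
    cap=$2
    if [ "$detected" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$detected"
    fi
}

# nproc comes from GNU coreutils; fall back to 1 if it is unavailable.
ncpu=$(nproc 2>/dev/null || echo 1)

# ASSUMPTION: 8 is an arbitrary cap chosen for illustration only.
threads=$(cap_threads "$ncpu" 8)

# The test script would then pass the capped value to kmc, e.g.:
#   kmc -ci1 -m1 -k28 -t"$threads" "$ORIGDIR/debian/tests/sample_6.fastq.gz" 1 .
echo "kmc thread count for this host: $threads"
```

Passing -t explicitly makes the thread count independent of how many cores the CI worker exposes, which is the property the reproducer suggests matters; whether that is sufficient on armhf is exactly what the result above puts in doubt.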