
SLURM jobs and hwloc errors



Hello all.

I hope this is the right list. If not, please direct me to the right one.

I installed a SLURM cluster using the default Debian packages, which are kept at the latest stable release. After the last upgrade (from Debian 10 to 11), some users started seeing messages like this:
-8<--
slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5
slurmstepd-str957-mtx-01: error: task[0] unable to set taskset '0x0'
-8<--
The error appears seemingly at random: the same job, resubmitted to the same node, often runs fine and the message does not reappear!
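
As far as I understand it, a taskset mask is a hexadecimal bitmap of CPU indices, so '0x0' would select no CPUs at all, which would explain why it cannot be applied. A minimal Python sketch of that reading (the helper name mask_to_cpus is my own, not part of SLURM or hwloc, and I am assuming the usual convention that bit N stands for CPU N):

```python
def mask_to_cpus(mask: str) -> list:
    """Decode a taskset-style hex CPU mask into a sorted list of CPU indices.

    Assumption: bit N of the mask corresponds to CPU N. Commas, which some
    tools use to separate 32-bit groups, are simply dropped.
    """
    value = int(mask.replace(",", ""), 16)  # int() accepts an optional "0x" prefix
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    print(mask_to_cpus("0x0"))  # empty mask: no CPU selected
    print(mask_to_cpus("0x5"))  # CPUs 0 and 2
```

If this reading is right, the second error line is a consequence of the first: hwloc fails to return a usable object, and the step ends up with an empty CPU set.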

Sometimes another message, which I suppose is unrelated (but maybe not), is also logged:
-8<--
Open MPI's OFI driver detected multiple equidistant NICs from the current process, but had insufficient information to ensure MPI processes fairly pick a NIC for use. This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.
-8<--

Could someone more experienced please help me diagnose (or even fix) these issues?

Thanks.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

