Bug#910485: Confirm issue with libpsm2-2/11.2.68-1
The change is in psm2_hal.c, which is a brand new file. See the
initialization loop at line 246:
/* Optimization note:
The following code attempts to initialize two different times:
First time assumes that the driver is already up, and so it attempts to
initialize with the loop control variable: wait, set to 0.
The second time, when wait is set to 1, waits for the driver to come up.
(When the parameter to: hfp_get_num_units() call below is 0,
hfp_get_num_units() does not wait for the driver to come up.
When the parameter is non-zero, the hfp_get_num_units() call below,
will wait for the driver to come up.) */
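To make the control flow in that comment concrete, here is a minimal sketch of a two-pass probe. It is not the code from psm2_hal.c: probe_units_two_pass(), the num_units_fn type and the three driver_* stubs are hypothetical names made up for illustration, and the callback only stands in for whatever hfp_get_num_units() really does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the unit-count query described in the comment:
 * returns the number of units found; wait != 0 means "wait for the driver". */
typedef int (*num_units_fn)(int wait);

/* Two-pass probe: the first pass (wait == 0) assumes the driver is already
 * up, the second pass (wait == 1) is allowed to wait for it.
 * Returns the unit count, or 0 if neither pass found a device.
 * *waited is set to 1 if the waiting pass had to run, 0 otherwise. */
static int probe_units_two_pass(num_units_fn get_num_units, int *waited)
{
    assert(get_num_units != NULL);
    assert(waited != NULL);

    for (int wait = 0; wait <= 1; wait++) {
        *waited = wait;
        int n = get_num_units(wait);
        if (n > 0)
            return n;   /* device found on this pass */
    }
    return 0;           /* no device, even after waiting */
}

/* Illustrative stubs only; none of these exist in libpsm2. */
static int driver_already_up(int wait) { (void)wait; return 2; }
static int driver_late(int wait)       { return wait ? 1 : 0; }
static int driver_absent(int wait)     { (void)wait; return 0; }
```

With driver_already_up the first pass returns immediately. With driver_absent the caller always pays for the waiting pass before learning that there is no device, which is the cost discussed below.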
It seems like this two-pass initialization is addressing an edge case of
handling dynamic device creation or an early psm2 process at the expense
of the more common case where the device is created long before the psm2
application executes and psm2 should fail-fast if the device isn't present.
psm2_ep_open() takes a timeout parameter via psm2_ep_open_opts. If
psm2_init() needs to wait for devices, then it seems like it should
also take a timeout parameter. That will need to happen upstream, I believe.
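As a rough sketch of what I mean by a caller-supplied timeout (purely hypothetical, not an existing libpsm2 interface): wait_for_device_ms(), device_probe_fn and the probe_* stubs below are invented names, and the 10 ms poll interval is an arbitrary choice. The assumption here is that a timeout of 0 means "probe once and fail fast" rather than being replaced by a default.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe callback: returns nonzero once the device is present. */
typedef int (*device_probe_fn)(void *ctx);

/* Current monotonic time in milliseconds. */
static long long monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll for the device until it appears or timeout_ms elapses.
 * Returns 0 if the device was found, -1 on timeout.
 * timeout_ms == 0 probes exactly once and does not sleep (fail-fast). */
static int wait_for_device_ms(device_probe_fn probe, void *ctx, long timeout_ms)
{
    assert(probe != NULL);
    assert(timeout_ms >= 0);

    const long long deadline = monotonic_ms() + timeout_ms;
    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    for (;;) {
        if (probe(ctx))
            return 0;
        if (monotonic_ms() >= deadline)
            return -1;
        nanosleep(&poll_interval, NULL);
    }
}

/* Illustrative stubs only. */
static int probe_never(void *ctx) { ++*(int *)ctx; return 0; }
static int probe_countdown(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0)
        --*remaining;
    return *remaining == 0;
}
```

An application that knows its devices are created at boot would pass 0 and get an immediate error, while one that may start before the driver could pass a bound it is willing to tolerate.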
On Sat, Oct 20, 2018 at 4:28 AM Mehdi Dogguy <firstname.lastname@example.org> wrote:
> On 2018-10-19 19:53, Brian Smith wrote:
> > The problem occurs when the OFI psm2 provider invokes psm2_init() when
> > there are no hfi1 devices present on the system. The call chain
> > eventually invokes hfi1_wait_for_device() with a timeout of 0. That is
> > interpreted as 15000ms.
> Actually, that part of the code didn't change at all. I was able to
> reproduce the issue, but I am not actually sure yet from where the
> regression is coming.
Brian T. Smith
System Fabric Works
Senior Technical Staff
GPG Key: 0xB3C2C7B73BA3CD7F