
Autopkgtest, MPI and network access (Was: Autopkgtest and MPI code)

Dear all,

Last year I had trouble writing autopkgtest tests that would run
smoothly in a container. It seems that recent changes in openmpi have
broken it again.

This is what worked last year:

On 26/05/2015 at 10:07, Johannes Ring wrote:
> On Sat, May 23, 2015 at 7:10 PM, Thibaut Paumard <thibaut@debian.org> wrote:
>> What does work on my box is:
>> orterun --mca btl_tcp_if_include lo <job>
>> This never crashes the machine, but it does not work in a chroot (for
>> lack of a loopback interface, I guess). I get this error message:
> It works for me in pbuilder if I set OMPI_MCA_orte_rsh_agent=/bin/false:
> (pbuild22309)root@debian-t420s:/# orterun --mca btl_tcp_if_include lo ls
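
For reference, the workaround quoted above can be sketched as a small
shell fragment. This is only a minimal sketch, not the actual test script
from the package: it assumes that both settings are passed through Open
MPI's OMPI_MCA_<name> environment variables, and JOB is a placeholder for
the real test command.

```shell
#!/bin/sh
# Sketch of the 2015 workaround (assumption: both MCA parameters are
# set through the environment rather than on the orterun command line).

# Disable the rsh/ssh launcher, as suggested for pbuilder.
export OMPI_MCA_orte_rsh_agent=/bin/false
# Restrict the TCP BTL to the loopback interface; same effect as
# "--mca btl_tcp_if_include lo".
export OMPI_MCA_btl_tcp_if_include=lo

# JOB is a placeholder; "ls" mirrors the command used in the quote.
JOB="${JOB:-ls}"

if command -v orterun >/dev/null 2>&1; then
    orterun $JOB || echo "orterun failed with status $?"
else
    echo "orterun not found; would run: orterun $JOB"
fi
```

Setting the parameters in the environment keeps the orterun invocation
itself unchanged, which makes it easy to drop them again later.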

This worked fine until mid-September this year:


I checked that I can still run gyoto with MPI parallelisation with
openmpi 2, which is reassuring, but:

  1- gyoto with MPI parallelisation fails again if I turn off network
access (processes are unable to reach one another). To turn off network
access, I issue "ifdown eth0".

  2- even with networking turned on, using "--mca btl_tcp_if_include lo"
makes my gyoto job fail, whatever the value of OMPI_MCA_orte_rsh_agent
(processes are unable to reach one another);

  3- I removed the two variables from the test script. It runs fine when
started manually as long as the network is available, but if I let
autopkgtest run it with `null' as virtualization server, the job remains
stuck, apparently at the step when gyoto tries to spawn workers using
Those tests were done in VirtualBox.
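
To compare the manual run with the autopkgtest run, it may help to record
which network interfaces each environment actually exposes. The following
is a minimal sketch, assuming a Linux system with /sys mounted;
list_ifaces is an illustrative helper name, not part of any package.

```shell
#!/bin/sh
# list_ifaces: print each network interface known to the kernel together
# with its operational state (illustrative helper, Linux-only).
list_ifaces() {
    if [ -d /sys/class/net ]; then
        for path in /sys/class/net/*; do
            # Skip the unexpanded glob if the directory is empty.
            [ -e "$path" ] || continue
            name=${path##*/}
            state=unknown
            # operstate is missing for non-interface entries.
            [ -r "$path/operstate" ] && state=$(cat "$path/operstate")
            echo "$name $state"
        done
    else
        echo "no /sys/class/net here"
    fi
}

list_ifaces
```

Running this once by hand and once from within the test script shows
whether "lo" and "eth0" are present in both cases. Note that the loopback
interface usually reports its state as "unknown" even when it is usable.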

Looking at the last successful run of autopkgtest on ci.debian.net and
the first failed one, the version of openmpi seems to be the same

Does anyone have a clue what might be going on here? I have the
impression I am facing two issues, one of which is related to openmpi,
the other one possibly to autopkgtest.

Regards, Thibaut.
