
Re: Autopkgtest, MPI and network access (Was: Autopkgtest and MPI code)



Dear Thibaut,

Were these tests run under fakeroot? See the thread from earlier
(yesterday and today) on debian-devel. Basically, openmpi breaks under
fakeroot because it gets unexpected credentials back from getsockopt().
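
A minimal sketch of that kind of mismatch, for anyone who wants to check
their own test environment. It assumes the relevant call is a Linux
SO_PEERCRED query on a Unix socket (the thread only says getsockopt(), so
this is a guess at the mechanism), and the helper names are made up for
illustration. The kernel reports the real uid/gid of the peer, while
fakeroot intercepts getuid()/getgid() and reports 0, so the two views
disagree:

```python
import os
import socket
import struct

# Layout of Linux "struct ucred": pid_t, uid_t, gid_t.
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the socket peer as reported by the kernel."""
    size = struct.calcsize(UCRED_FORMAT)
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack(UCRED_FORMAT, raw)


def compare_credentials():
    """Compare kernel-reported peer credentials with what libc reports.

    Both ends of the socketpair belong to this process, so outside
    fakeroot the two views are expected to agree.
    """
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        pid, uid, gid = peer_credentials(left)
    finally:
        left.close()
        right.close()
    return {
        "kernel": (pid, uid, gid),
        "libc": (os.getpid(), os.getuid(), os.getgid()),
        "match": (uid, gid) == (os.getuid(), os.getgid()),
    }


if __name__ == "__main__":
    print(compare_credentials())
```

Running this once directly and once as "fakeroot python3 <script>" should
show whether the two sets of ids differ in your setup.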

regards

Alastair



On 04/10/2016 17:09, Thibaut Paumard wrote:
> Dear all,
>
> Last year I had trouble writing autopkgtest tests that would run
> smoothly in a container. It seems that recent changes in openmpi have
> broken it again.
>
> This is what worked last year:
>
> Le 26/05/2015 à 10:07, Johannes Ring a écrit :
>> On Sat, May 23, 2015 at 7:10 PM, Thibaut Paumard <thibaut@debian.org> wrote:
>>> What does work on my box is:
>>> orterun --mca btl_tcp_if_include lo <job>
>>>
>>> This never crashes the machine, but it does not work in a chroot (for
>>> lack of a loopback interface, I guess). I get this error message:
>> It works for me in pbuilder if I set OMPI_MCA_orte_rsh_agent=/bin/false:
>>
>> (pbuild22309)root@debian-t420s:/# orterun --mca btl_tcp_if_include lo ls
> This has been working fine until mid-September this year:
>
> https://ci.debian.net/packages/g/gyoto/unstable/amd64/
>
> I checked that I can still run gyoto with MPI parallelisation with
> openmpi 2, which is reassuring, but:
>
>   1- gyoto with MPI parallelisation fails again if I turn off network
> access (processes are unable to reach one-another). To turn off network
> access, I issue "ifdown eth0".
>
>   2- even with networking turned on, using "--mca btl_tcp_if_include lo"
> makes my gyoto job fail, whatever the value of OMPI_MCA_orte_rsh_agent
> (processes are unable to reach one-another);
>
>   3- I removed the two variables from the test script. It runs fine when
> started manually as long as the network is available, but if I let
> autopkgtest run it with `null' as virtualization server, the job remains
> stuck, apparently at the step where gyoto tries to spawn workers using
> MPI_Comm_spawn.
>
> Those tests were done in VirtualBox.
>
> Looking at the last successful run of autopkgtest on ci.debian.net and
> the first failed one, the version of openmpi seems to be the same
> (libopenmpi1.10).
>
> Does anyone have a clue what might be going on here? I have the
> impression I am facing two issues, one of which is related to openmpi,
> the other one possibly to autopkgtest.
>
> Regards, Thibaut.
>

-- 
Alastair McKinstry, <alastair@sceal.ie>, <mckinstry@debian.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered. 

