
Fwd: Failure to activate node zero in shared memory machine



I have finally discovered that the problem (now partly solved) was with
NAMD, not the OS.

francesco pietra


---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Sat, Mar 10, 2012 at 7:33 AM
Subject: Fwd: Failure to activate node zero in shared memory machine
To: amd64 Debian <debian-amd64@lists.debian.org>


I forgot to add that I tried with both the AMBER and the CHARMM (all27)
force fields, in both cases also with systems and scripts that had
worked before.

Also, I am using the precompiled NAMD binary (self-contained
parallelization), not message passing from Debian.

francesco


---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Fri, Mar 9, 2012 at 7:14 PM
Subject: Failure to activate node zero in shared memory machine
To: amd64 Debian <debian-amd64@lists.debian.org>


Hello:
I was running NAMD-CUDA 2.8 4JUN2011nb (a molecular dynamics
simulation code) successfully on nvidia 280.13-1. I am now back to
NAMD after a few months, on the same machine, now with nvidia 295.20-1
(a version that matches the Debian amd64 xserver and all libraries).
I first activate CUDA:

# nvidia-smi -L
# nvidia-smi -pm 1

Then, on launching NAMD, node zero fails to connect:

Charmrun> charmrun started...
Charmrun> node programs all started
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 127.0.0.1, port = 41824
Charmrun> start 0 node program on localhost.
Charmrun> start 1 node program on localhost.
Charmrun> start 2 node program on localhost.
Charmrun> start 3 node program on localhost.
Charmrun> start 4 node program on localhost.
Charmrun> start 5 node program on localhost.
Charmrun> Waiting for 0-th client to connect.
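
For anyone comparing runs, here is a minimal sketch of a way to summarize
a saved charmrun log. The function name summarize_charmrun and the file
name charmrun.log are hypothetical; the script is not part of NAMD or
Charm++ and only matches the message strings that appear in the output
above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NAMD/Charm++): read a charmrun log on
# stdin and report how many clients were added, how many node programs
# were started, and whether the connection timeout message appears.
summarize_charmrun() {
    awk '
        /^Charmrun> adding client/              { clients++ }
        /^Charmrun> start [0-9]+ node program/  { started++ }
        /Timeout waiting for node-program to connect/ { timeout = 1 }
        END {
            printf "clients=%d started=%d timeout=%s\n",
                   clients, started, (timeout ? "yes" : "no")
        }
    '
}
```

Usage, assuming the output above was saved to charmrun.log:
summarize_charmrun < charmrun.log
For the log shown here this reports clients=6 started=6 timeout=yes.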

Hardware

Gigabyte Technology Co., Ltd. GA-890FXA-UD5/GA-890FXA-UD5, BIOS F6 11/24/2010

 AMD Phenom(tm) II X6 1075T Processor (6 cpu cores) (version 2.20.00)

16GB RAM

Two GTX-580

Scanning NUMA topology in Northbridge 24
[    0.000000] No NUMA configuration found

(Should NUMA be activated? It was not when I ran in parallel in the past.)

All nvidia tests were OK:
francesco@gig64:~/1PLC$ dpkg -l | grep nvidia
ii  glx-alternative-nvidia               0.2.1        allows the selection of NVIDIA as GLX provider
ii  libgl1-nvidia-alternatives           295.20-1     transition libGL.so* diversions to glx-alternative-nvidia
ii  libgl1-nvidia-glx                    295.20-1     NVIDIA binary OpenGL libraries
ii  libglx-nvidia-alternatives           295.20-1     transition libgl.so diversions to glx-alternative-nvidia
ii  libnvidia-compiler-ia32              295.20-1     NVIDIA runtime compiler library (32-bit)
ii  libnvidia-ml1                        295.20-1     NVIDIA management library (NVML) runtime library
ii  nvidia-alternative                   295.20-1     allows the selection of NVIDIA as GLX provider
ii  nvidia-compute-profiler              4.0.17-3     NVIDIA Compute Visual Profiler
ii  nvidia-cuda-dev                      4.0.17-3     NVIDIA CUDA development files
ii  nvidia-cuda-doc                      4.1.28-1     NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                      4.1.28-1     NVIDIA CUDA GDB
ii  nvidia-cuda-toolkit                  4.0.17-3     NVIDIA CUDA toolkit
ii  nvidia-glx                           295.20-1     NVIDIA metapackage
ii  nvidia-installer-cleanup             20111111+3   Cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common                 20111111+3   NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                   295.20-1     NVIDIA binary kernel module DKMS source
ii  nvidia-libopencl1                    295.20-1     NVIDIA OpenCL library
ii  nvidia-libopencl1-ia32               295.20-1     NVIDIA OpenCL 32-bit library
ii  nvidia-opencl-common                 295.20-1     NVIDIA OpenCL driver
ii  nvidia-opencl-dev                    4.0.17-3     NVIDIA OpenCL development files
ii  nvidia-opencl-icd-ia32               295.20-1     NVIDIA OpenCL ICD (32-bit)
ii  nvidia-smi                           295.20-1     NVIDIA System Management Interface
ii  nvidia-support                       20111111+3   NVIDIA binary graphics driver support files
ii  nvidia-vdpau-driver                  295.20-1     NVIDIA vdpau driver
ii  nvidia-xconfig                       295.20-1     X configuration tool for non-free NVIDIA drivers
ii  xserver-xorg-video-nvidia            295.20-1     NVIDIA binary Xorg driver
francesco@gig64:~/1PLC$


root@gig64:/home/francesco/1PLC# modinfo nvidia
filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
alias:          char-major-195-*
version:        295.20
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        i2c-core
vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
parm:           NVreg_EnableVia4x:int
parm:           NVreg_EnableALiAGP:int
parm:           NVreg_ReqAGPRate:int
parm:           NVreg_EnableAGPSBA:int
parm:           NVreg_EnableAGPFW:int
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UseVBios:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_NvAGP:int
root@gig64:/home/francesco/1PLC#
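
The statement that the driver version matches the installed packages can
be checked mechanically. The following is only a sketch: the function
names upstream_version and versions_match are hypothetical, and it
assumes the Debian package version is the upstream version followed by
"-revision" with no epoch prefix, as in the listings above
(295.20-1 against 295.20).

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NVIDIA or Debian tool.

# Strip the Debian revision (everything from the last "-" onwards).
# Assumption: no epoch prefix such as "1:" is present.
upstream_version() {
    printf '%s\n' "${1%-*}"
}

# Succeed if the package version ($1) has the same upstream part as the
# kernel module version ($2).
versions_match() {
    [ "$(upstream_version "$1")" = "$2" ]
}
```

Usage, as a sketch, on a system where both tools are installed:
versions_match "$(dpkg-query -W -f='${Version}' nvidia-kernel-dkms)" "$(modinfo -F version nvidia)" && echo "driver versions agree"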


Thanks a lot for any advice.

francesco pietra

