
Failure to activate node zero in shared memory machine



Hello:
I was running NAMD-CUDA 2.8 4JUN2011nb (a molecular dynamics
simulation code) successfully with nvidia driver
280.13-1. I am now back to NAMD after a few months, on the same
machine, now with nvidia 295.20-1 (the version that matches the Debian amd64
xserver and all
libraries). I first activated CUDA:

# nvidia-smi -L
# nvidia-smi -pm 1

Then, on launching NAMD, node zero failed to connect:

Charmrun> charmrun started...
Charmrun> node programs all started
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 127.0.0.1, port = 41824
Charmrun> start 0 node program on localhost.
Charmrun> start 1 node program on localhost.
Charmrun> start 2 node program on localhost.
Charmrun> start 3 node program on localhost.
Charmrun> start 4 node program on localhost.
Charmrun> start 5 node program on localhost.
Charmrun> Waiting for 0-th client to connect.
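
For anyone comparing a log like the one above against a working run, here is a minimal sketch that summarizes the charmrun output: how many clients were added, how many node programs were started, and whether the timeout message appears. The function name summarize_charmrun_log and the abbreviated sample are illustrative assumptions, not part of NAMD or Charm++; the patterns are taken only from the lines shown in this post.

```python
import re

# Message text copied from the log in this post.
TIMEOUT_MARKER = "Timeout waiting for node-program to connect"


def summarize_charmrun_log(text):
    """Summarize a charmrun log (hypothetical helper, not a Charm++ API)."""
    # Lines such as: Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
    clients = re.findall(r"^Charmrun> adding client (\d+):", text, flags=re.M)
    # Lines such as: Charmrun> start 0 node program on localhost.
    started = re.findall(r"^Charmrun> start (\d+) node program", text, flags=re.M)
    return {
        "clients_added": len(clients),
        "node_programs_started": len(started),
        "timed_out": TIMEOUT_MARKER in text,
    }


# Abbreviated excerpt of the log above (two of the six clients).
sample = """Charmrun> charmrun started...
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> start 0 node program on localhost.
Charmrun> start 1 node program on localhost.
Charmrun> Waiting for 0-th client to connect.
"""

summary = summarize_charmrun_log(sample)
print(summary)
```

In the full log above, six clients are added and six node programs are started, so the launch itself appears to proceed; the failure is that node program 0 never connects back to charmrun.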

Hardware

Gigabyte Technology Co., Ltd. GA-890FXA-UD5/GA-890FXA-UD5, BIOS F6 11/24/2010

 AMD Phenom(tm) II X6 1075T Processor (6 cpu cores) (version 2.20.00)

16GB RAM

Two GTX-580

Scanning NUMA topology in Northbridge 24
[    0.000000] No NUMA configuration found
(Should NUMA be activated? It was not when I ran in parallel in the past.)

All nvidia tests were OK:
francesco@gig64:~/1PLC$ dpkg -l | grep nvidia
ii  glx-alternative-nvidia               0.2.1
    allows the selection of NVIDIA as GLX provider
ii  libgl1-nvidia-alternatives           295.20-1
    transition libGL.so* diversions to glx-alternative-nvidia
ii  libgl1-nvidia-glx                    295.20-1
    NVIDIA binary OpenGL libraries
ii  libglx-nvidia-alternatives           295.20-1
    transition libgl.so diversions to glx-alternative-nvidia
ii  libnvidia-compiler-ia32              295.20-1
    NVIDIA runtime compiler library (32-bit)
ii  libnvidia-ml1                        295.20-1
    NVIDIA management library (NVML) runtime library
ii  nvidia-alternative                   295.20-1
    allows the selection of NVIDIA as GLX provider
ii  nvidia-compute-profiler              4.0.17-3
    NVIDIA Compute Visual Profiler
ii  nvidia-cuda-dev                      4.0.17-3
    NVIDIA CUDA development files
ii  nvidia-cuda-doc                      4.1.28-1
    NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                      4.1.28-1
    NVIDIA CUDA GDB
ii  nvidia-cuda-toolkit                  4.0.17-3
    NVIDIA CUDA toolkit
ii  nvidia-glx                           295.20-1
    NVIDIA metapackage
ii  nvidia-installer-cleanup             20111111+3
    Cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common                 20111111+3
    NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                   295.20-1
    NVIDIA binary kernel module DKMS source
ii  nvidia-libopencl1                    295.20-1
    NVIDIA OpenCL library
ii  nvidia-libopencl1-ia32               295.20-1
    NVIDIA OpenCL 32-bit library
ii  nvidia-opencl-common                 295.20-1
    NVIDIA OpenCL driver
ii  nvidia-opencl-dev                    4.0.17-3
    NVIDIA OpenCL development files
ii  nvidia-opencl-icd-ia32               295.20-1
    NVIDIA OpenCL ICD (32-bit)
ii  nvidia-smi                           295.20-1
    NVIDIA System Management Interface
ii  nvidia-support                       20111111+3
    NVIDIA binary graphics driver support files
ii  nvidia-vdpau-driver                  295.20-1
    NVIDIA vdpau driver
ii  nvidia-xconfig                       295.20-1
    X configuration tool for non-free NVIDIA drivers
ii  xserver-xorg-video-nvidia            295.20-1
    NVIDIA binary Xorg driver
francesco@gig64:~/1PLC$
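
The listing above mixes several version strings (295.20-1 for the driver packages, 4.0.17-3 and 4.1.28-1 for the CUDA packages). A minimal sketch for grouping the installed packages by version, so that such differences are easy to see, might look like this. The helper name group_by_version is hypothetical, the sample is a short excerpt of the listing, and whether the version difference matters for this failure is not established here.

```python
from collections import defaultdict


def group_by_version(dpkg_text):
    """Group package names by version from `dpkg -l` output (hypothetical helper)."""
    groups = defaultdict(list)
    for line in dpkg_text.splitlines():
        fields = line.split()
        # Package lines start with the status "ii"; wrapped description
        # lines (as in the listing above) do not and are skipped.
        if len(fields) >= 3 and fields[0] == "ii":
            name, version = fields[1], fields[2]
            groups[version].append(name)
    return dict(groups)


# Excerpt of the listing above, with the same line wrapping.
sample = """ii  nvidia-cuda-dev                      4.0.17-3
    NVIDIA CUDA development files
ii  nvidia-cuda-doc                      4.1.28-1
    NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                      4.1.28-1
    NVIDIA CUDA GDB
ii  nvidia-cuda-toolkit                  4.0.17-3
    NVIDIA CUDA toolkit
ii  nvidia-kernel-dkms                   295.20-1
    NVIDIA binary kernel module DKMS source
"""

groups = group_by_version(sample)
for version, names in sorted(groups.items()):
    print(version, names)
```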


root@gig64:/home/francesco/1PLC# modinfo nvidia
filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
alias:          char-major-195-*
version:        295.20
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        i2c-core
vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
parm:           NVreg_EnableVia4x:int
parm:           NVreg_EnableALiAGP:int
parm:           NVreg_ReqAGPRate:int
parm:           NVreg_EnableAGPSBA:int
parm:           NVreg_EnableAGPFW:int
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UseVBios:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_NvAGP:int
root@gig64:/home/francesco/1PLC#
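
One check that can be made from the modinfo output above is whether the module was built for the running kernel. The following is a minimal sketch, assuming the "key: value" layout shown above; parse_modinfo and kernel_from_vermagic are hypothetical helper names, and the sample is an excerpt of the output in this post.

```python
def parse_modinfo(text):
    """Parse `modinfo` output into a dict of lists (hypothetical helper).

    Keys such as "alias" and "parm" repeat, so each key maps to a list.
    """
    info = {}
    for line in text.splitlines():
        # Split at the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            info.setdefault(key.strip(), []).append(value.strip())
    return info


def kernel_from_vermagic(info):
    """Return the kernel release the module was built for (first vermagic token)."""
    return info["vermagic"][0].split()[0]


# Excerpt of the modinfo output above.
sample = """filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
version:        295.20
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
parm:           NVreg_EnableMSI:int
"""

info = parse_modinfo(sample)
print(info["version"][0], kernel_from_vermagic(info))
```

On the machine itself, the value returned by kernel_from_vermagic could be compared with platform.release() from the Python standard library, which reports the running kernel release.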


Thanks a lot for any advice.

francesco pietra

