CUDA error cudaStreamCreate
Hello,
I have a gaming machine:
Gigabyte GA 890FXAUD5
Six-core AMD PhenomII 1075T
2x GTX 470
Debian GNU-Linux amd64 wheezy
on which I have successfully run NAMD (molecular dynamics simulations).
Now I am having problems getting the GTX 470s to work, and I can't tell
whether it is a hardware or a software problem and, if software, whether
the OS is involved. I am submitting the same problem to the NAMD list, as
it might be NAMD-specific.
When the code works, the top of the log file says:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
Pe 5 sharing CUDA device 1 first 1 next 1
Pe 2 sharing CUDA device 0 first 0 next 4
Did not find +devices i,j,k,... argument, using all
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 0 sharing CUDA device 0 first 0 next 2
Pe 3 sharing CUDA device 1 first 1 next 5
Pe 1 sharing CUDA device 1 first 1 next 3
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 4 sharing CUDA device 0 first 0 next 0
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Info: 1.64104 MB of memory in use based on CmiMemoryUsage
Info: Configuration file is min-02.conf
When it fails, the log says:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0124412 s
Pe 5 sharing CUDA device 0 first 0 next 0
Pe 5 physical rank 5 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 5 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device
0): no CUDA-capable device is available
Did not find +devices i,j,k,... argument, using all
Pe 0 sharing CUDA device 0 first 0 next 1
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
Pe 3 sharing CUDA device 0 first 0 next 4
Pe 3 physical rank 3 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
Pe 1 sharing CUDA device 0 first 0 next 2
Pe 1 physical rank 1 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device
0): no CUDA-capable device is available
FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device
0): no CUDA-capable device is available
FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device
0): no CUDA-capable device is available
Pe 2 sharing CUDA device 0 first 0 next 3
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device
0): no CUDA-capable device is available
Pe 4 sharing CUDA device 0 first 0 next 5
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 4 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device
0): no CUDA-capable device is available
[0] Stack Traceback:
--------------------------------
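The two logs differ in the "binding to CUDA device" lines: the working run reports 'GeForce GTX 470', the failing run reports 'Device Emulation (CPU)' with 0MB. A minimal sketch of how saved logs could be checked for this automatically is shown here. The function names cuda_bindings and emulated_pes are made up for illustration and are not part of NAMD; the pattern is inferred only from the log lines quoted above and may not match other NAMD versions.

```python
import re

# Matches NAMD binding lines of the form quoted in this post, e.g.
#   Pe 5 physical rank 5 binding to CUDA device 1 on gig64:
#   'GeForce GTX 470' Mem: 1279MB Rev: 2.0
# (pattern inferred from these logs only; an assumption, not a NAMD spec)
BINDING = re.compile(
    r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+) on (\S+): "
    r"'([^']*)' Mem: (\d+)MB Rev: (\S+)"
)


def cuda_bindings(log_text):
    """Return (pe, device, host, name, mem_mb) tuples found in a NAMD log."""
    # Long log lines may be wrapped (as in the mail above), so collapse
    # every run of whitespace, including newlines, into a single space.
    flat = " ".join(log_text.split())
    return [
        (int(pe), int(dev), host, name, int(mem))
        for pe, dev, host, name, mem, _rev in BINDING.findall(flat)
    ]


def emulated_pes(log_text):
    """List the Pe ranks that were bound to the CPU emulation device."""
    return sorted(
        pe
        for pe, _dev, _host, name, _mem in cuda_bindings(log_text)
        if name == "Device Emulation (CPU)"
    )


if __name__ == "__main__":
    sample = (
        "Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'Device\n"
        "Emulation (CPU)' Mem: 0MB Rev: 9999.9999\n"
    )
    print("Pes bound to emulation device:", emulated_pes(sample))
```

An empty list from emulated_pes means every Pe found a real GPU; any entry means the CUDA runtime did not see the cards at startup.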
In both cases, the kernel module files
/var/lib/dkms/nvidia/270.41.19/2.6.38-2-amd64/x86_64/module/nvidia.ko
/lib/module/2.6.38-2-amd64/update/dkms/nvidia.ko
are in place.
I tried:
nvidia-smi -r (or nvidia-smi -a)
which gives:
NVIDIA: could not open the device file /dev/nvidia1 (no such file)
Failed to initialize NVML: unknown error.
I am unsure whether these commands are for Tesla only.
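Since nvidia-smi complains about /dev/nvidia1, one thing that can be checked without any NVIDIA tool is whether the device files and the kernel module are present at all. What follows is only a rough diagnostic sketch. It assumes the driver uses /dev/nvidiactl plus one /dev/nvidiaN per card (consistent with the message above), that a loaded module shows up as "nvidia" in /proc/modules, and that there are two cards as on this machine. The function names are invented for illustration. It reports what is missing and does not try to fix anything.

```python
import os


def missing_device_nodes(gpu_count, dev_dir="/dev"):
    """Return the NVIDIA device files that are expected but absent.

    Assumes one control file (nvidiactl) and one file per GPU
    (nvidia0, nvidia1, ...); this naming is an assumption here.
    """
    expected = ["nvidiactl"] + [f"nvidia{i}" for i in range(gpu_count)]
    return [
        os.path.join(dev_dir, name)
        for name in expected
        if not os.path.exists(os.path.join(dev_dir, name))
    ]


def nvidia_module_loaded(proc_modules="/proc/modules"):
    """True if a kernel module named 'nvidia' is listed in proc_modules."""
    try:
        with open(proc_modules) as fh:
            # The first whitespace-separated field of each line is the
            # module name.
            return any(
                line.split()[0] == "nvidia" for line in fh if line.strip()
            )
    except OSError:
        # File unreadable or absent: treat as "not loaded".
        return False


if __name__ == "__main__":
    # gpu_count=2 matches the two GTX 470 cards described in this post.
    print("nvidia module loaded:", nvidia_module_loaded())
    print("missing device files:", missing_device_nodes(gpu_count=2))
```

If the module is loaded but /dev/nvidia1 is listed as missing, that would match the nvidia-smi message and point at device file creation rather than at NAMD itself.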
Thanks for any advice.
francesco pietra