Re: "cuda error cudastreamcreate"
On Tue, Jun 14, 2011 at 07:54:16AM +0200, Francesco Pietra wrote:
> Hello:
> With a gaming machine
> Gigabyte GA 890FXAUD5
> Six-core AMD PhenomII 1075T
> 2x GTX 470
> Debian GNU-Linux amd64 wheezy
> I successfully run the NAMD code (molecular dynamics simulations). Now I
> am having problems getting the GTX 470s to work, and I can't tell
> whether it is a hardware or a software problem and, if software, whether
> the OS is involved. I am submitting the same problem to NAMD, as it might
> be NAMD-specific.
> When the code works, the top of the log file says:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on
> Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
> Pe 5 sharing CUDA device 1 first 1 next 1
> Pe 2 sharing CUDA device 0 first 0 next 4
> Did not find +devices i,j,k,... argument, using all
> Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
> 470' Mem: 1279MB Rev: 2.0
> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
> 470' Mem: 1279MB Rev: 2.0
> Pe 0 sharing CUDA device 0 first 0 next 2
> Pe 3 sharing CUDA device 1 first 1 next 5
> Pe 1 sharing CUDA device 1 first 1 next 3
> Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
> 470' Mem: 1279MB Rev: 2.0
> Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
> 470' Mem: 1279MB Rev: 2.0
> Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
> 470' Mem: 1279MB Rev: 2.0
> Pe 4 sharing CUDA device 0 first 0 next 0
> Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
> 470' Mem: 1279MB Rev: 2.0
> Info: 1.64104 MB of memory in use based on CmiMemoryUsage
> Info: Configuration file is min-02.conf
> When it fails:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on
> Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.0124412 s
> Pe 5 sharing CUDA device 0 first 0 next 0
> Pe 5 physical rank 5 binding to CUDA device 0 on gig64: 'Device
> Emulation (CPU)' Mem: 0MB Rev: 9999.9999
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device 0): no
> CUDA-capable device is available
> ------------- Processor 5 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device
> 0): no CUDA-capable device is available
> Did not find +devices i,j,k,... argument, using all
> Pe 0 sharing CUDA device 0 first 0 next 1
> Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'Device
> Emulation (CPU)' Mem: 0MB Rev: 9999.9999
> Pe 3 sharing CUDA device 0 first 0 next 4
> Pe 3 physical rank 3 binding to CUDA device 0 on gig64: 'Device
> Emulation (CPU)' Mem: 0MB Rev: 9999.9999
> Pe 1 sharing CUDA device 0 first 0 next 2
> Pe 1 physical rank 1 binding to CUDA device 0 on gig64: 'Device
> Emulation (CPU)' Mem: 0MB Rev: 9999.9999
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device 0): no
> CUDA-capable device is available
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device
> 0): no CUDA-capable device is available
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device 0): no
> CUDA-capable device is available
> ------------- Processor 3 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device
> 0): no CUDA-capable device is available
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device 0): no
> CUDA-capable device is available
> ------------- Processor 1 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device
> 0): no CUDA-capable device is available
> Pe 2 sharing CUDA device 0 first 0 next 3
> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
> Emulation (CPU)' Mem: 0MB Rev: 9999.9999
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
> CUDA-capable device is available
> ------------- Processor 2 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device
> 0): no CUDA-capable device is available
> Pe 4 sharing CUDA device 0 first 0 next 5
> Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'Device
> Emulation (CPU)' Mem: 0MB Rev: 9999.9999
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device 0): no
> CUDA-capable device is available
> ------------- Processor 4 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device
> 0): no CUDA-capable device is available
Hmm, I wonder whether 'no CUDA-capable device is available' means that
none were found or that all of them were already busy.
So sometimes it works and sometimes it doesn't? Is this with the same
code or is it working with some code and not with other code?
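One way to compare a good run with a bad one is to pull the device
bindings out of each log. The following is only a rough sketch: it assumes
the binding lines look exactly like the ones quoted above, and the function
names are made up for illustration. A run where every Pe binds to 'Device
Emulation (CPU)' with 0MB would suggest the driver exposed no GPU at all
at startup, rather than NAMD picking the wrong card.

```python
import re

# Matches NAMD startup lines of the form quoted above, for example:
#   Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
BINDING = re.compile(
    r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+) on \S+: "
    r"'([^']*)'\s+Mem: (\d+)MB"
)


def summarize_bindings(log_text):
    """Return {pe: (device, name, mem_mb)} for each binding line found."""
    # Collapse all whitespace so a binding line that was wrapped
    # (as in this mail) still matches the pattern.
    flat = " ".join(log_text.split())
    return {
        int(m.group(1)): (int(m.group(2)), m.group(3), int(m.group(4)))
        for m in BINDING.finditer(flat)
    }


def fell_back_to_emulation(bindings):
    """True if any Pe was bound to the CPU emulation device."""
    return any("Emulation" in name for _, name, _ in bindings.values())
```

Feed it the raw log text, not the '>'-quoted copy from the mail, since the
quote markers would end up inside the device name.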
> [0] Stack Traceback:
> --------------------------------
> In both cases:
> /var/lib/dkms/nvidia/270.41.19/2.6.38-2-amd64/x86_64/module/nvidia.ko
> /lib/module/2.6.38-2-amd64/update/dkms/nvidia.ko
> are in order.
> I tried:
> nvidia-smi -r (or nvidia-smi -a)
> NVIDIA: could not open the device file /dev/nvidia1 (no such file)
> Failed to initialize NVML: unknown error.
Don't know. With one card I only have /dev/nvidia0 and /dev/nvidiactl.
I would think nvidia1 would be a second card.
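A quick check for which device nodes exist might look like the sketch
below. It assumes the usual naming of one /dev/nvidiaN node per card plus
/dev/nvidiactl; the function name and the dev_dir parameter are invented
for this example.

```python
import os


def missing_nvidia_nodes(num_cards, dev_dir="/dev"):
    """Return the expected device-node paths that do not exist.

    Assumes one nvidiaN node per card plus a shared nvidiactl node.
    """
    expected = [os.path.join(dev_dir, "nvidiactl")]
    expected += [os.path.join(dev_dir, f"nvidia{i}") for i in range(num_cards)]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    # The machine described above has two GTX 470 cards.
    for path in missing_nvidia_nodes(2):
        print("missing:", path)
```

If /dev/nvidia1 shows up as missing on the failing boots but not on the
working ones, that would point at the device nodes not being created
rather than at NAMD itself.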
> I am unsure whether these commands are for Tesla only.
Having never worked with CUDA or Tesla, I unfortunately don't know.
Len Sorensen