
Fwd: Fwd: "cuda error cudastreamcreate",



I forgot the list.
f.


---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Thu, Jun 16, 2011 at 4:11 PM
Subject: Re: Fwd: "cuda error cudastreamcreate",
To: Brian Morris <cymraegish@gmail.com>


Oh, no, absolutely not. Where are the scientific OpenCL applications? And
that is not the only reason.
f.

On Thu, Jun 16, 2011 at 3:59 AM, Brian Morris <cymraegish@gmail.com> wrote:
> Why are you using Cuda rather than OpenCL ? Nvidia has said they are cutting
> back on their GPU business and moving into CPUs for tablets which are now
> appearing on the market. If you have to move to AMD/ATI in the future OpenCL
> will still work, but CUDA will not.
>
>
>
> On Wed, Jun 15, 2011 at 8:22 AM, Francesco Pietra <chiendarret@gmail.com>
> wrote:
>>
>> Running "nvidia-smi -L" as root restores the visibility of the graphics
>> cards. After every boot that visibility vanishes. So it is a small
>> problem, or no problem. francesco
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret@gmail.com>
>> Date: Wed, Jun 15, 2011 at 4:37 PM
>> Subject: Fwd: Fwd: "cuda error cudastreamcreate",
>> To: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>, amd64 Debian
>> <debian-amd64@lists.debian.org>
>>
>>
>> The simulation (pressure equilibration) was completed successfully.
>> The next run (just a continuation of the previous pressure equilibration)
>> failed, again with 'Device Emulation (CPU)'; see the log file below. I
>> tried again and got the same error.
>>
>> # modinfo nvidia
>> filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> alias:          char-major-195-*
>> supported:      external
>> license:        NVIDIA
>> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>> depends:        i2c-core
>> vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
>> parm:           NVreg_EnableVia4x:int
>> parm:           NVreg_EnableALiAGP:int
>> parm:           NVreg_ReqAGPRate:int
>> parm:           NVreg_EnableAGPSBA:int
>> parm:           NVreg_EnableAGPFW:int
>> parm:           NVreg_Mobile:int
>> parm:           NVreg_ResmanDebugLevel:int
>> parm:           NVreg_RmLogonRC:int
>> parm:           NVreg_ModifyDeviceFiles:int
>> parm:           NVreg_DeviceFileUID:int
>> parm:           NVreg_DeviceFileGID:int
>> parm:           NVreg_DeviceFileMode:int
>> parm:           NVreg_RemapLimit:int
>> parm:           NVreg_UpdateMemoryTypes:int
>> parm:           NVreg_InitializeSystemMemoryAllocations:int
>> parm:           NVreg_UseVBios:int
>> parm:           NVreg_RMEdgeIntrCheck:int
>> parm:           NVreg_UsePageAttributeTable:int
>> parm:           NVreg_EnableMSI:int
>> parm:           NVreg_MapRegistersEarly:int
>> parm:           NVreg_RegisterForACPIEvents:int
>> parm:           NVreg_RegistryDwords:charp
>> parm:           NVreg_RmMsg:charp
>> parm:           NVreg_NvAGP:int
>>
>> However:
>>
>> $ nvidia-smi -L
>> Could not open device /dev/nvidia1 (no such file)
>> Failed to initialize NVML: unknown error.
>>
>>
>> I am unable to draw technical conclusions from this 'unknown error'. I
>> wonder whether other information can be extracted to fix the problems.
>>
>> Thanks for advice.
>>
>> francesco
>>
>>
>>
>>
>> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
>> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
>> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
>> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.00658393 s
>> Pe 2 sharing CUDA device 0 first 0 next 3
>> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
>> Emulation (CPU)'  Mem: 0MB  Rev: 9999.9999
>> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
>> CUDA-capable device is available
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret@gmail.com>
>> Date: Wed, Jun 15, 2011 at 9:04 AM
>> Subject: Re: Fwd: "cuda error cudastreamcreate",
>> To: Fabricio Cannini <fabricio@versatushpc.com.br>, Lennart Sorensen
>> <lsorense@csclub.uwaterloo.ca>, amd64 Debian
>> <debian-amd64@lists.debian.org>
>>
>>
>> The "nvidia-smi -L" output was from a machine belonging to Jim Phillips,
>> the main developer of NAMD. He provided it to show that the command should
>> also work with my GTX 470 cards.
>>
>> That said, my problems seem to have been solved by following Lennart's
>> instructions. The driver was rebuilt (dated 15 June), and the NAMD
>> simulation could be started normally. However, we have to wait before
>> claiming full victory. Please see below.
>>
>> In retrospect, the nvidia.ko I had before, dated 5 June, must have
>> also been built within Debian. Renaming it no_nvidia.ko prevented
>> rebuilding for the reasons that Lennart clarified.
>>
>> For some reason, the previous installation of nvidia.ko must have had
>> problems: for example, "nvidia-smi -L" did not work (there
>> was a single installation of nvidia-smi, "nvidia-smi 270.41.19-1"),
>> while the "modinfo nvidia" output was correct. Now both are correct:
>>
>> $ nvidia-smi -L
>> GPU 0: GeForce GTX 470 (UUID: N/A)
>> GPU 1: GeForce GTX 470 (UUID: N/A)
>>
>> # modinfo nvidia
>> filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> alias:          char-major-195-*
>> supported:      external
>> license:        NVIDIA
>> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>> depends:        i2c-core
>> vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
>> parm:           NVreg_EnableVia4x:int
>> parm:           NVreg_EnableALiAGP:int
>> parm:           NVreg_ReqAGPRate:int
>> parm:           NVreg_EnableAGPSBA:int
>> parm:           NVreg_EnableAGPFW:int
>> parm:           NVreg_Mobile:int
>> parm:           NVreg_ResmanDebugLevel:int
>> parm:           NVreg_RmLogonRC:int
>> parm:           NVreg_ModifyDeviceFiles:int
>> parm:           NVreg_DeviceFileUID:int
>> parm:           NVreg_DeviceFileGID:int
>> parm:           NVreg_DeviceFileMode:int
>> parm:           NVreg_RemapLimit:int
>> parm:           NVreg_UpdateMemoryTypes:int
>> parm:           NVreg_InitializeSystemMemoryAllocations:int
>> parm:           NVreg_UseVBios:int
>> parm:           NVreg_RMEdgeIntrCheck:int
>> parm:           NVreg_UsePageAttributeTable:int
>> parm:           NVreg_EnableMSI:int
>> parm:           NVreg_MapRegistersEarly:int
>> parm:           NVreg_RegisterForACPIEvents:int
>> parm:           NVreg_RegistryDwords:charp
>> parm:           NVreg_RmMsg:charp
>> parm:           NVreg_NvAGP:int
>>
>>
>> I said above that time will tell whether the system is stable. In fact,
>> this morning the NAMD simulation did not start (I recall commands from
>> the console history, so there was no typing error). I had not
>> carried out any amd64 upgrade in between. From the simulation log:
>>
>>
>> Info: Charm++/Converse parallel runtime startup completed at 0.00989103 s
>> Pe 2 sharing CUDA device 0 first 0 next 3
>> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
>> Emulation (CPU)'  Mem: 0MB  Rev: 9999.9999
>> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
>> CUDA-capable device is available
>>
>> 'Device Emulation (CPU)' indicates (for reasons unclear to me)
>> that things have gone wrong.
>>
>> On a second identical attempt (after having explored the driver
>> location and carried out info commands), NAMD simulation started, with
>> the correct log output:
>>
>> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
>> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
>> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
>> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
>>
>>
>> We will see whether the failure/success pattern recurs (a
>> simulation now lasts several hours, which would be days on an 8-processor
>> machine). If failure occurs again, there are many possible
>> reasons, including problems with the NAMD code.
>>
>> I was so discouraged yesterday that I alluded to changing the driver
>> source, which was unfair.
>>
>> Thanks a lot
>> francesco
>>
>> On Wed, Jun 15, 2011 at 2:22 AM, Fabricio Cannini
>> <fabricio@versatushpc.com.br> wrote:
>> > On Tuesday, 14 June 2011, at 16:01:57, Lennart Sorensen wrote:
>> >> On Tue, Jun 14, 2011 at 07:23:38PM +0200, Francesco Pietra wrote:
>> >> > I forgot to answer: yes, sometimes it works, sometimes not, with
>> >> > everything being the same.
>> >> >
>> >> > As a matter of fact, after a day of failure, I have now renamed
>> >> >
>> >> > /lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko
>> >> >
>> >> > back to
>> >> >
>> >> > /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> >
>> >> > and the NAMD simulation started normally using both GTX 470 cards.
>> >> > The machine had not been touched either.
>> >>
>> >> I wonder if having the 9800 card in there along with the 470 gtx cards
>> >> is confusing the driver.  Maybe the card order is getting swapped around
>> >> on some boots.
>> >>
>> >> What is the 9800 doing in the box anyhow?
>> >
>> > Hi All.
>> >
>> > I'm thinking the same as Lennart. It seems to me that the order in which
>> > the cards are named varies, thus confusing the application(s). I'd try to
>> > fix the order in /etc/X11/xorg.conf and see if it works. Look in the CUDA
>> > docs for how to do that.
>> >
>> > Good luck.
>> >
>> >
>> > --
>> > To UNSUBSCRIBE, email to debian-amd64-REQUEST@lists.debian.org
>> > with a subject of "unsubscribe". Trouble? Contact
>> > listmaster@lists.debian.org
>> > Archive: http://lists.debian.org/201106142122.04376.fcannini@gmail.com
>> >
>> >
>>
>>
>>
>
>
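
The symptom reported in this thread (a device node such as /dev/nvidia1 is
missing after boot until "nvidia-smi -L" is run as root) can be checked before
a run is started. The following is only a minimal sketch, not a fix proposed by
anyone in the thread. The function name count_nvidia_nodes and its directory
argument are hypothetical, and it assumes the per-GPU nodes are named nvidia0,
nvidia1, and so on, as the error message above suggests.

```shell
#!/bin/sh
# Count per-GPU device nodes (nvidia0, nvidia1, ...) in a directory.
# The directory defaults to /dev; it is a parameter only so that the
# check can be tried against a scratch directory first.
# Hypothetical helper, not part of any NVIDIA or NAMD tool.
count_nvidia_nodes() {
    dir="${1:-/dev}"
    count=0
    for node in "$dir"/nvidia[0-9]*; do
        # An unmatched glob is left as a literal string, so confirm
        # that the path really exists before counting it.
        if [ -e "$node" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

If the printed count is lower than the number of cards installed, the nodes
have probably not been created yet, and running "nvidia-smi -L" as root (as
described above) would be the next thing to try.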

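Both failed runs quoted above show the same two markers in the NAMD log: the
device name 'Device Emulation (CPU)' and the message "no CUDA-capable device
is available". A minimal sketch for scanning a log for those strings follows.
The function name namd_log_gpu_failed is hypothetical, and the sketch assumes
that each marker sits on a single line in the real log file (the line breaks
seen in the quoted logs are taken to be mail wrapping).

```shell
#!/bin/sh
# Succeed (exit status 0) if the given NAMD log contains either of the
# two failure markers quoted in this thread; fail otherwise.
# A missing or unreadable file also yields a non-zero status, because
# grep reports an error in that case.
# Hypothetical helper; it only matches the fixed strings listed here.
namd_log_gpu_failed() {
    grep -q -F \
        -e "Emulation (CPU)" \
        -e "CUDA-capable device is available" \
        "$1"
}
```

It could be used as "if namd_log_gpu_failed run.log; then ...; fi", where
run.log stands for whatever file the simulation output was redirected to.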
