
Re: upgrade to jessie from wheezy with cuda problems



I think it was renamed.  No idea why.  modinfo nvidia-current should
work though.
Yes, it does.

Do you have the cuda libraries for the 319 version installed?
Yes


I don't play around with GPU computations, but from what I have read it
does need a certain size job before the overhead of transferring the
data and managing the GPU makes it worthwhile, but for large jobs the
high core count and memory bandwidth make a big difference.

500,000 atoms, as in my test, is a large system for unbiased molecular dynamics. In any event, I looked at nvidia-cuda-toolkit version 5.0. For the GPU Computing SDK, which is needed to build the examples that should include a bandwidth test, NVIDIA offers Linux packages for Fedora, RHEL, Ubuntu, OpenSUSE and SUSE. No Debian.

I have had unpleasant experiences with Ubuntu packages, and it is well known that Ubuntu, unlike LinuxMint, is not compatible with Debian. Therefore, I did not try the cuda toolkit. I wonder why Ubuntu has so widely replaced Debian among the masses. Sad, and somewhat irritating, for me.

I tried
francesco@gig64:~/tmp$ ls
CUDA-Z-0.7.189.run
francesco@gig64:~/tmp$ ./CUDA-Z-0.7.189.run
CUDA-Z 0.7.189 Container
Starting CUDA-Z...
/home/francesco/tmp/CUDA-Z-657a-580e-a8aa-0faa/cuda-z: error while loading shared libraries: libXrender.so.1: cannot open shared object file: No such file or directory
francesco@gig64:~/tmp$ ls
CUDA-Z-0.7.189.run  libXrender.so.1
francesco@gig64:~/tmp$ ./CUDA-Z-0.7.189.run
CUDA-Z 0.7.189 Container
Starting CUDA-Z...
/home/francesco/tmp/CUDA-Z-a3db-49bf-8cb7-059d/cuda-z: error while loading shared libraries: libXrender.so.1: cannot open shared object file: No such file or directory
francesco@gig64:~/tmp$

The required lib is actually available on the system, as shown by the copy I placed in tmp. I don't remember where this GNU CUDA-Z tool came from. Does anyone have experience with it?
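
One way to narrow this down, as a rough sketch: the dynamic loader does not search the current directory, so a copy of libXrender.so.1 in ~/tmp is not picked up unless that directory is in the loader's search path (for example via LD_LIBRARY_PATH). The helper below, whose name missing_libs is made up for illustration, filters ldd output for unresolved libraries. It assumes the extracted cuda-z binary can be reached at some path, which the container may not allow.

```shell
# Hypothetical helper: print the names of libraries that ldd marks as
# "not found". Reads ldd output on stdin.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Example use (the path is a placeholder for wherever cuda-z was extracted):
#   ldd /path/to/cuda-z | missing_libs
```

If the copy in ~/tmp has the same architecture as the binary, running `LD_LIBRARY_PATH=$HOME/tmp ./CUDA-Z-0.7.189.run` may be enough. If the tool turns out to be a 32-bit build, a 64-bit libXrender would still not be accepted.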

I have also come across reports of unexciting experience with PCIe 3.0, that is, meager or no gain over PCIe 2.0. However, those reports come from people running games, which is different from NAMD molecular dynamics, where most of the work is done by the GPUs but AT EACH STEP the energy has to be calculated by the CPU.
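
Whether the cards actually negotiate a PCIe 3.0 link can be checked independently of NAMD. As a minimal sketch, the function below (pcie_gen is a made-up name) reads `lspci -vv` output and translates the negotiated speed on each LnkSta line into a PCIe generation. It assumes the usual "Speed <n>GT/s" wording printed by pciutils. lspci should be run as root, otherwise the link status is not shown.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and print the PCIe
# generation implied by each negotiated link speed (LnkSta lines).
pcie_gen() {
  sed -n 's/.*LnkSta:[[:space:]]*Speed \([0-9.]*\)GT\/s.*/\1/p' |
  while read -r speed; do
    case "$speed" in
      2.5) echo "PCIe 1.x (2.5 GT/s)" ;;
      5)   echo "PCIe 2.0 (5 GT/s)" ;;
      8)   echo "PCIe 3.0 (8 GT/s)" ;;
      *)   echo "unknown ($speed GT/s)" ;;
    esac
  done
}

# Example use, as root; 10de is the NVIDIA PCI vendor ID:
#   lspci -vv -d 10de: | pcie_gen
```

Some GPUs may drop the link to a lower speed when idle, so the reading is best taken while a job is running.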

thanks
francesco pietra



On Tue, Nov 12, 2013 at 11:37 PM, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
On Tue, Nov 12, 2013 at 10:35:53PM +0100, Francesco Pietra wrote:
> # apt-get --purge remove *legacy*
> did the job.
>
> I wonder how these legacy packages entered the scene while
> updating/upgrading from a clean wheezy.
>
> The bad news is that with the new driver 319.60 there was no acceleration
> of molecular dynamics for a job of modest size (150K atoms) and only slight
> acceleration (0.12 s/step vs 0.14 s/step) for a heavy job (500K atoms).
> Either moving from PCIe 2.0 (with the 304.xx driver of wheezy) to PCIe
> 3.0 (with driver 319.60 of jessie), which raises the per-lane transfer rate
> between GPUs and RAM from 5 to 8 GT/s, does not have the effect I hoped
> for on the calculations, or PCIe is still 2.0 with jessie.
>
> Now, with cuda 5.0, it should be easy to measure the bandwidth directly. I
> have to learn how, and I'll report on it in due course.
>
>
> Now
> nvidia-smi activates the GPUs for normal work,
> nvidia-smi -L tells about the GPUs,
> dpkg -l |grep nvidia shows all 319.60 or 5.0.35-8,
> the X-server can be started and gnome loaded (startx, gnome-session),
> nvcc --version gives 5.0,  however
>
>
> # modinfo nvidia
> ERROR: module nvidia not found
>
> In analogy with wheezy 3.2.0-4, I expected
> /lib/modules/3.10-3-amd64/updates/dkms/nvidia.ko
>
> Instead, there is
>
> /lib/modules/3.10-3-amd64/nvidia/nvidia-current.ko
>
> is that a feature of jessie or something wrong?

I think it was renamed.  No idea why.  modinfo nvidia-current should
work though.

Do you have the cuda libraries for the 319 version installed?

I don't play around with GPU computations, but from what I have read it
does need a certain size job before the overhead of transferring the
data and managing the GPU makes it worthwhile, but for large jobs the
high core count and memory bandwidth make a big difference.

--
Len Sorensen

