
Re: Fwd: upgrade to jessie from wheezy with cuda problems



I am attacking the problem from another angle, directly from within the OS itself:

# lspci -vvvv

shows that the link status ("LnkSta") reports a speed of 5 GT/s, no matter whether the system is number crunching or not. That is, my system is running at PCIe 2.0. This might explain why upgrading from Sandy Bridge to Ivy Bridge gave no speed gain in molecular dynamics: PCIe 3.0 was never achieved.

As far as I could investigate, nvidia suggests to either:
(1) modify /etc/modprobe.d/local.conf (which does not exist on jessie), or create a new

/etc/modprobe.d/nvidia.conf, adding to it the line

options nvidia NVreg_EnablePCIeGen3=1

Actually, on my jessie, nvidia.conf reads

alias nvidia nvidia-current
remove nvidia-current rmmod nvidia
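For what it's worth, here is a sketch of what the suggested edit would leave in place, assuming jessie's stock nvidia.conf above; I have not tested this myself:

```
# /etc/modprobe.d/nvidia.conf
alias nvidia nvidia-current
remove nvidia-current rmmod nvidia
options nvidia NVreg_EnablePCIeGen3=1
```

After editing, running update-initramfs -u (as root) rebuilds the initramfs so the option is present at boot.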


Some found that useless even when both grub-efi and the initramfs were edited accordingly, so nvidia offered a different move: updating the kernel boot string by appending the same module option in its kernel command-line form:

nvidia.NVreg_EnablePCIeGen3=1
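On jessie with grub, the boot-string variant would presumably go into /etc/default/grub (module options take the form module.option=value on the kernel command line), followed by update-grub; a sketch, untested on my side:

```
# /etc/default/grub  (run update-grub afterwards as root)
GRUB_CMDLINE_LINUX_DEFAULT="quiet nvidia.NVreg_EnablePCIeGen3=1"
```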
***************************

I did nothing yet, as I hope that the best adaptation to jessie may be suggested by those who know the OS better than I do.
The kind of information lspci reports about links includes:

LnkSta: the actual negotiated link speed and width

LnkCap: the maximum speed and width the specific port supports, as far as I can understand

LnkCtl: the link control settings (ASPM power management, link retraining, and the like)


One could also run

# lspci -vt

to determine the bus where the GPU card is located, then run

# lspci -vv -s ##

where "##" is the bus location.
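To illustrate pulling the negotiated speed out of such output, here is a small shell snippet run against a sample LnkSta line; the line itself is example data, while on a real system it would come from lspci -vv -s on your bus location:

```shell
# A sample LnkSta line as lspci -vv prints it (example data, not from a live system)
line='LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-'

# Extract the negotiated speed field with sed
speed=$(printf '%s\n' "$line" | sed -n 's/.*Speed \([0-9.]*GT\/s\).*/\1/p')
echo "$speed"
```

On a PCIe 2.0 link this prints 5GT/s; seeing 8GT/s there would confirm Gen3 operation.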
******************************

So, it is a tricky matter, but perhaps not so much once one knows where to put one's hands. At any rate, being unable to reach the 8 GT/s of PCIe 3.0 means losing time and energy (= money and pollution), at least when the GPUs are used for long number crunching.
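To put rough numbers on that loss: a back-of-envelope per-direction bandwidth estimate for a x16 link, using the raw rates (5 GT/s vs 8 GT/s) and the line encodings (8b/10b for Gen2, 128b/130b for Gen3); these are theoretical ceilings, not measurements:

```shell
# GB/s per direction = GT/s * encoding efficiency * lanes / 8 bits per byte
awk 'BEGIN {
  printf "PCIe 2.0 x16: %.2f GB/s\n", 5.0 * 8/10    * 16 / 8
  printf "PCIe 3.0 x16: %.2f GB/s\n", 8.0 * 128/130 * 16 / 8
}'
```

So staying at Gen2 roughly halves the host-to-GPU transfer ceiling, which matters for transfer-heavy number crunching.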

I'll continue investigating. The above seems promising. I hope to get help.

francesco pietra

PS
With my jessie, /etc/modprobe.d includes the following files:
alsa-base.conf
alsa-case-blacklist.conf
dkms.conf (which has no active statements)
fbdev-blacklist.conf
i915-kms.conf
nvidia.conf
nvidia-blacklist-nouveau.conf
radeon-kms.conf
**************************


On Thu, Nov 14, 2013 at 3:33 AM, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
On Wed, Nov 13, 2013 at 05:43:47PM -0500, Lennart Sorensen wrote:
> On Wed, Nov 13, 2013 at 10:53:30PM +0100, Francesco Pietra wrote:
> > francesco@gig64:~/tmp$ file ./CUDA-Z-0.7.189.run
> > ./CUDA-Z-0.7.189.run: data
> > francesco@gig64:~/tmp$
>
> OK that's weird.  I expected to see x86 32 or 64bit binary.
>
> Seems to be a shell script with compressed code in it.  Yuck. :)

OK I got it running.  It is a 32bit binary.

I had to install these:

ii  libcuda1:i386                         331.20-1                        i386         NVIDIA CUDA runtime library
ii  libcudart5.0:i386                     5.0.35-8                        i386         NVIDIA CUDA runtime library
ii  libgl1-nvidia-glx:i386                331.20-1                        i386         NVIDIA binary OpenGL libraries
ii  libstdc++6:i386                       4.8.2-4                         i386         GNU Standard C++ Library v3
ii  libxrender1:i386                      1:0.9.8-1                       i386         X Rendering Extension client library
ii  zlib1g:i386                           1:1.2.8.dfsg-1                  i386         compression library - runtime

Then I was able to run it.  No messing with LD_LIBRARY_PATH or anything.

To install :i386 packages you first have to enable multiarch support
with dpkg and run apt-get update.  So something like:

dpkg --add-architecture i386
apt-get update
apt-get install libcuda1:i386 libcudart5.0:i386 libgl1-nvidia-glx:i386 libstdc++6:i386 libxrender1:i386 zlib1g:i386

Don't worry about the exact versions, since I am running
unstable+experimental.  You don't need to do that to get it working.

For your 64bit code you probably need libcuda1 libcudart5.0 and such
installed in the 64bit version.

--
Len Sorensen

