
Re: namd-l: Fwd: nvidia issue with namd12 Debian 11



Hi Josh, it is not a big system:
Info) Analyzing structure ...
Info)    Atoms: 107292
Info)    Bonds: 77829
Info)    Angles: 61441  Dihedrals: 46455  Impropers: 1604  Cross-terms: 158
Info)    Bondtypes: 0  Angletypes: 0  Dihedraltypes: 0  Impropertypes: 0
Info)    Residues: 31152
Info)    Waters: 30102
Info)    Segments: 128
Info)    Fragments: 30587   Protein: 9   Nucleic: 25

Following your hint, I tried MD with a very small system:

Info) Analyzing structure ...
Info)    Atoms: 1448
Info)    Bonds: 1187
Info)    Angles: 1618  Dihedrals: 699  Impropers: 0  Cross-terms: 0
Info)    Bondtypes: 0  Angletypes: 0  Dihedraltypes: 0  Impropertypes: 0
Info)    Residues: 261
Info)    Waters: 0
Info)    Segments: 33
Info)    Fragments: 261   Protein: 0   Nucleic: 0

I got exactly the same error messages that I reported for the bigger system, so it is not a problem of insufficient memory on the GTX.
My very tentative guess is that there is a mismatch between the Linux kernel and the nvidia driver, but both were selected by Debian itself, so other people should have met the issue too. I am not sure that Debian 11 would work correctly with a downgraded Linux kernel/nvidia driver pair. Perhaps it would be easier to downgrade to Debian 10, which worked correctly on my raid1 box.

thanks
francesco

Incidentally, I said namd12, while it is 14.

On Mon, Jan 17, 2022 at 1:24 PM Vermaas, Josh <vermaasj@msu.edu> wrote:

How big is your system? The error being tossed back is that you are out of memory. The GTX 680 only has 2GB of memory, and so depending on your system size you may run yourself out of memory.
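
One way to check the memory question directly is to ask the driver how much memory each card reports while the job starts. The following is only a minimal sketch, assuming `nvidia-smi` is on the PATH and accepts the `--query-gpu` and `--format=csv,noheader,nounits` options; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical and are not part of NAMD or of any NVIDIA tool.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits" the memory values are plain MiB numbers.
QUERY_FIELDS = "name,memory.total,memory.used,driver_version"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output (one GPU per line) into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            # Skip lines that do not match the four requested fields.
            continue
        name, total, used, driver = parts
        gpus.append({
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
            "free_mib": int(total) - int(used),
            "driver": driver,
        })
    return gpus


def query_gpus():
    """Run nvidia-smi (assumed to be installed) and return the parsed per-GPU report."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` in a second terminal while the minimization starts would show whether `used_mib` approaches `total_mib` on either card before the crash. If it stays far below the total, running out of memory is an unlikely explanation.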

 

-Josh

 

From: <owner-namd-l@ks.uiuc.edu> on behalf of Francesco Pietra <chiendarret@gmail.com>
Reply-To: "namd-l@ks.uiuc.edu" <namd-l@ks.uiuc.edu>, Francesco Pietra <chiendarret@gmail.com>
Date: Monday, January 17, 2022 at 4:40 AM
To: NAMD <namd-l@ks.uiuc.edu>, debian-users <debian-user@lists.debian.org>
Subject: namd-l: Fwd: nvidia issue with namd12 Debian 11

 

I forgot to add that the commands 'nvidia-detect' and 'nvidia-smi' both detect the two GTX 680 cards as activated and report that they are supported by all driver versions, including those for Tesla 450.

Actually, legacy nvidia drivers are only required for very old nvidia graphics cards, from 400 downwards.

 

I should also add that the box is at CUDA 11.2.

 

---------- Forwarded message ---------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Mon, Jan 17, 2022 at 4:15 AM
Subject: nvidia issue with namd12 Debian 11
To: NAMD <namd-l@ks.uiuc.edu>, debian-users <debian-user@lists.debian.org>

 

On a Debian 11 box with two GTX 680 cards, I am unable to get the cards working. The problem appeared after upgrading from Debian 10 to 11 and from namd 11 to 12 (/NAMD_Git-2021-11-27_Linux-x86_64-multicore-CUDA).

 

nvidia-driver 460.91.03-1

linux-image-amd64 5.10.84-1

linux kernel 5.10.0-10-amd64

 

Error when trying a minimization:

 

TCL: Minimizing for 3000 steps
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was encountered
[Partition 0][Node 0] End of program
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists, line 1577
 on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was encountered

 

I have also reconfigured the xserver, to no avail.

 

I have noticed issues about namd12/nvidia on the web, apparently unresolved.

 

Thanks for advice

francesco pietra

 

 

