Might need nvidia-current instead of nvidia.
It failed to bring to PCIe 3.0 when inserted into nvidia.conf
francesco@gig64:/etc/modprobe.d$ cat nvidia.conf
alias nvidia nvidia-current
remove nvidia-current rmmod nvidia
# 1. options nvidia-current NVreg_EnablePCIeGen3=1
(of course it was not commented when the test was carried out)
However, it brought to PCIe 3.0 when in the kernel GREAT SUGGESTION
Thus, I added (temporarily) to GRUB by
1) typing 'e' at grub prompt,
2) adding the option to the END OF the linux line,
3) Ctrl-x to boot
verifying that it was taken into accout
~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10-3-amd64 root=/dev/mapper/vg1-root ro quiet 1. nvidia-current.NVreg_EnablePCIeGen3=1
#lspci -vvvv
.......
02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 680] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 0969
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at c0000000 (64-bit, prefetchable) [size=128M]
Region 3: Memory at c8000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at e000 [size=128]
[virtual] Expansion ROM at fb000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] #19
Kernel driver in use: nvidia
.......
The same for the other GPU.
*********************************************
Well, the surprise was that molecular dynamics for a large system (500K atoms) was very modestly accelerated. From the simulation log file:
Info: Benchmark time: 6 CPUs 0.123387 s/step 1.4281 days/ns 1171.53 MB memory
From the same simulation with same motherboard and GTX-680, but with sansy bridge i7-3930 and 1066MHz RAM:
Info: Benchmark time: 6 CPUs 0.138832 s/step 1.60686 days/ns 1161.23 MB memory
The better performance of the ivy bridge might be the result from the higher clock of both the CPU and RAM (1866MHz).
********************************
A variety of interpretaions of these observations are possible, taking into account, however, that with simple machines as the one used here, it would be difficult to run MD with much bigger systems than 500K atoms.
Finally we succeeded to get PCIe 3.0 and now the PCIe 3.0 setting can be passed permanently to the kernel. I have to learn how.
Thanks a lot
francesco pietra