
Fwd: Fwd: upgrade to jessie from wheezy with cuda problems



I forgot to mention that LnkSta 8GT/s is obtained only when actually carrying out the MD simulation.
fp

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret@gmail.com>
Date: Mon, Nov 18, 2013 at 10:37 PM
Subject: Re: Fwd: upgrade to jessie from wheezy with cuda problems
To: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Cc: amd64 Debian <debian-amd64@lists.debian.org>


> Might need nvidia-current instead of nvidia.

That failed to bring the link up to PCIe 3.0 when the option was added to nvidia.conf:

francesco@gig64:/etc/modprobe.d$ cat nvidia.conf

alias nvidia nvidia-current
remove nvidia-current rmmod nvidia
# 1. options nvidia-current NVreg_EnablePCIeGen3=1

(Of course, the options line was not commented out when the test was carried out.)

However, it did bring the link up to PCIe 3.0 when passed on the kernel command line. GREAT SUGGESTION!

Thus, I added the option (temporarily) from the GRUB menu by

1) typing 'e' at grub prompt,
2) adding the option to the END OF the linux line,
3) Ctrl-x to boot

and then verified that it was taken into account:

~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10-3-amd64 root=/dev/mapper/vg1-root ro quiet 1. nvidia-current.NVreg_EnablePCIeGen3=1

#lspci -vvvv
.......

02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 680] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 0969
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at c0000000 (64-bit, prefetchable) [size=128M]
    Region 3: Memory at c8000000 (64-bit, prefetchable) [size=32M]
    Region 5: I/O ports at e000 [size=128]
    [virtual] Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status:    NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900 v1] #19
    Kernel driver in use: nvidia
.......

The same for the other GPU.
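
To compare only the two relevant fields without reading the whole dump, something like the following might help. It is just a sketch: link_speeds is a name I made up, and it assumes the lspci -vv text layout shown above (a "LnkCap:" or "LnkSta:" label followed somewhere on the same line by "Speed <value>,").

```shell
# Read `lspci -vv` text on stdin and print the advertised (LnkCap) and
# negotiated (LnkSta) link speeds, one per line.
# link_speeds is a hypothetical helper name, not part of pciutils.
link_speeds() {
    awk '/^[ \t]*(LnkCap|LnkSta):/ {
        for (i = 1; i < NF; i++)
            if ($i == "Speed") {
                speed = $(i + 1)
                sub(/,$/, "", speed)   # drop the trailing comma
                print $1, speed
            }
    }'
}
```

Run as root (otherwise lspci hides the capability list), for example: lspci -vv -s 02:00.0 | link_speeds. If LnkSta reports less than LnkCap, the link has not trained to the full speed at that moment.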
*********************************************

Well, the surprise was that molecular dynamics for a large system (500K atoms) was very modestly accelerated. From the simulation log file:

Info: Benchmark time: 6 CPUs 0.123387 s/step 1.4281 days/ns 1171.53 MB memory

From the same simulation with the same motherboard and GTX-680, but with a Sandy Bridge i7-3930 and 1066MHz RAM:

Info: Benchmark time: 6 CPUs 0.138832 s/step 1.60686 days/ns 1161.23 MB memory

The better performance of the Ivy Bridge system might result from the higher clock of both the CPU and the RAM (1866MHz).
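
To put a number on "modestly", here is the arithmetic on the two benchmark lines as a small sketch. The helper names ratio and days_per_ns are mine, and the figure of 1000000 steps per ns (a 1 fs timestep) is an inference from the log, since it is the value that reproduces the reported days/ns from the reported s/step.

```shell
# ratio A B: print A/B to three decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

# days_per_ns S_PER_STEP STEPS_PER_NS: convert seconds/step to days/ns.
days_per_ns() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.4f\n", s * n / 86400 }'
}

ratio 0.138832 0.123387        # Sandy Bridge s/step over Ivy Bridge s/step
days_per_ns 0.123387 1000000   # assumed 1 fs timestep, see above
```

The first call prints 1.125, i.e. the Ivy Bridge run gets through about 12.5% more steps in the same time; the second prints 1.4281, matching the log.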
********************************

A variety of interpretations of these observations are possible, bearing in mind, however, that with simple machines such as the one used here it would be difficult to run MD on systems much bigger than 500K atoms.

Finally, we succeeded in getting PCIe 3.0. The setting now has to be passed to the kernel permanently, and I have to learn how.
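
On Debian the usual place for a permanent kernel parameter appears to be the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by update-grub. A minimal sketch of that edit follows; add_kernel_param is a hypothetical helper name, it assumes GNU sed and a double-quoted value on that line, and the parameter must not contain "|" or "&".

```shell
# add_kernel_param FILE PARAM
# Append PARAM to GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE, unless it is
# already there. Hypothetical helper; assumes GNU sed (for -i).
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit if the parameter is already on that line.
    grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param" && return 0
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}
```

As root, this would be used as: add_kernel_param /etc/default/grub nvidia-current.NVreg_EnablePCIeGen3=1 && update-grub. After the next reboot, cat /proc/cmdline should show the option without any manual editing at the GRUB menu.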

Thanks a lot
francesco pietra

 


On Mon, Nov 18, 2013 at 6:02 PM, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
On Sun, Nov 17, 2013 at 10:45:58AM +0100, Francesco Pietra wrote:
> I am attacking the problem from another side, directly from within the OS
> itself:
>
> #lspci -vvvv
>
> tells that the link speed (= link status) "LnkSta" is at 5GT/s, no matter
> whether the system is at number crunching or not. I.e., my system is at
> PCIe 2.0. This might explain why upgrading from sandy bridge to ivy bridge
> gave no speed gain of molecular dynamics. PCIe 3.0 was not achieved.
>
> As far as I could investigate, nvidia suggests to either:
> (1) Modify /etc/modprobe.d/local.conf (which does not exist on jessie) or
> create a new
>
> /etc/modprobe.d/nvidia.conf, adding to that
>
> 1. options nvidia NVreg_EnablePCIeGen3=1

Might need nvidia-current instead of nvidia.

> Actually, on my jessie, nvidia.conf reads
>
> alias nvidia nvidia-current
> remove nvidia-current rmmod nvidia
>
>
> Some guys found that useless, even when both grub-efi and initramfs are
> edited accordingly, so that nvidia offered a different move, updating the
> kernel boot string, by appending this:
>
> 1. options nvidia NVreg_EnablePCIeGen3=1
> ***************************

That is NOT the syntax for a kernel command line.  It is the syntax for
the modprobe config.

Something like nvidia.NVreg_EnablePCIeGen3=1 or
nvidia-current.NVreg_EnablePCIeGen3=1 (depending on the name of the
module as far as the module is concerned).
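
To make the difference between the two syntaxes concrete, here is a rough sketch that rewrites a modprobe "options" line into the module.parameter=value form used on the kernel command line. options_to_cmdline is an invented name for illustration only; which module name to use (nvidia or nvidia-current) still has to be checked on the system itself.

```shell
# options_to_cmdline 'options MODULE PARAM=VALUE ...'
# Print the equivalent kernel command-line words: MODULE.PARAM=VALUE ...
# Hypothetical helper, for illustration only.
options_to_cmdline() {
    # Intentional word splitting of the single argument.
    set -- $1
    [ "$1" = "options" ] && [ $# -ge 3 ] || return 1
    module=$2
    shift 2
    out=
    for opt in "$@"; do
        out="${out:+$out }$module.$opt"
    done
    printf '%s\n' "$out"
}

options_to_cmdline 'options nvidia NVreg_EnablePCIeGen3=1'
```

The call at the end prints nvidia.NVreg_EnablePCIeGen3=1, which is the form that belongs on the end of the linux line in GRUB.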

> I did nothing, as I hope that the best adaptation to jessie may be
> suggested by those who know the OS better than me.
> The kind of information about links includes:
>
> LnkSta: the actual speed
>
> LnkCap: the capacity of the specific port, as far as I can understand.
>
> LnkCtl: ??
>
>
> One could also run
>
> #lspci -vt
>
> to determine the bus where the GPU card is located, then running
>
> # lspci -vv -s ##
>
> where "##" is the location.
> ******************************
>
> So, it is a tricky matter, but perhaps not so much when one knows where to
> put the hands. At any event, being unable to go to 8GT/s, as from PCIe 3.0,
> means losing time and energy (=money and pollution), at least when the
> GPUs are used for long number crunching.

Well it means slower transfers of data to and from the card.  If the data
set fits in the card entirely during a long number crunch, then bandwidth
would not matter much at all.  So depends on the size of the data set
and how often data has to be moved in and out of the card.

> I'll continue investigating. The above seems to be promising. Hope to get
> help.

--
Len Sorensen


