Re: unloading unnecessary modules
- To: firstname.lastname@example.org
- Subject: Re: unloading unnecessary modules
- From: Stan Hoeppner <email@example.com>
- Date: Tue, 30 Nov 2010 20:28:22 -0600
- Message-id: <4CF5B2C6.firstname.lastname@example.org>
- In-reply-to: <AANLkTin=B6iB_Mc8ddz2_gX8x7AEwwi=MM1=UiSu7gE7@mail.gmail.com>
- References: <AANLkTin_7RJ0Q_kW0xA6FryJGv-CCT38sh1Ptv8a-Rz1@mail.gmail.com> <4CF09A71.email@example.com> <AANLkTikEbEfKBtkxbeGS6+3MVygSyVMWYcXTjq=6Vxfirstname.lastname@example.org> <AANLkTim2Z-6TOg=RSrQs1u0nd8zfJKivsH6yZ7+2Zhrj@mail.gmail.com> <4CF1DF5B.email@example.com> <AANLkTi=1bRZNYBZOTgydD8xeQ4smdiu2CV9KSTknLJFt@mail.gmail.com> <AANLkTi=c4-SSX=gJyHVFUzCKZHkc9CK3k2LMvXhrnoKh@mail.gmail.com> <4CF2F5C3.firstname.lastname@example.org> <AANLkTin=B6iB_Mc8ddz2_gX8x7AEwwi=MM1=UiSu7gE7@mail.gmail.com>
Mag Gam put forth on 11/30/2010 5:17 AM:
> sorry for the late response.
> lspci gives me this about my Ethernet adapter.
> 04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit
> Ethernet Controller (rev 06)
> Subsystem: Hewlett-Packard Company NC360T PCI Express Dual
> Port Gigabit Server Adapter
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr+ Stepping- SERR- FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 154
> Region 0: Memory at fbfe0000 (32-bit, non-prefetchable) [size=128K]
> Region 1: Memory at fbfc0000 (32-bit, non-prefetchable) [size=128K]
> Region 2: I/O ports at 5000 [size=32]
> [virtual] Expansion ROM at e6200000 [disabled] [size=128K]
The NC360T supports TCP checksum & segmentation offloading, but it
doesn't support full TCP/IP offloading like the iSCSI HBAs do. Even so,
checksum and segmentation offloading will yield a small packet latency
improvement. The latency gain, however, will be minuscule compared to
what you can get by optimizing your user land application, which is the
source of the bulk of your latency. User land always is.
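If you want to confirm which offloads are currently switched on before
ruling the NIC in or out, `ethtool -k` reports them per interface. What
follows is only a rough sketch of reading that report from a script. The
interface name "eth0" and the sample text are assumptions for
illustration, and the feature names vary between ethtool versions, so
treat the parsing as approximate.

```python
import subprocess


def parse_offload_settings(ethtool_output):
    """Turn `ethtool -k` text into {feature_name: True/False}."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skip the header line and anything that is not "<name>: on|off".
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        settings[name.strip()] = (words[0] == "on")
    return settings


def read_offload_settings(interface="eth0"):
    """Run `ethtool -k <interface>`; "eth0" is an assumed interface name."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_offload_settings(result.stdout)


# Illustrative sample only, not output captured from the NC360T.
SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
tcp-segmentation-offload: off
generic-receive-offload: on [fixed]
"""

if __name__ == "__main__":
    for feature, enabled in parse_offload_settings(SAMPLE).items():
        print(feature, "on" if enabled else "off")
```

Anything reported as off can usually be toggled with `ethtool -K`, but
measure before and after, since the gain is likely to be small.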
> The target application is OLTP (online transaction processing), which
> is driven by CICS. The volume may be high but latency is important. The
> application is CPU bound (80% on 1 core), but not disk/io or memory
> bound. All of our servers have 16GB of memory with 8 cores. The
> application is written in C (compiled with Intel based compiler). We
> are using a DNS cache solution and sometimes hardcoding /etc/hosts to
> avoid any DNS. It does not do too many DNS lookups.
80% of 1 core? Is this under a synthetic high transaction rate test
load against the app? What latencies are you measuring per transaction
at the application level at 80% load on that core?
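To make that question concrete, here is a minimal sketch of the kind of
per-transaction measurement I mean. It is written in Python purely for
brevity (your application is in C, where clock_gettime() with
CLOCK_MONOTONIC would play the same role). The handler and the request
list are hypothetical stand-ins, not anything from your application.

```python
import math
import time


def time_transactions(handler, requests):
    """Call handler(request) for each request; return latencies in microseconds."""
    latencies_us = []
    for request in requests:
        start = time.perf_counter_ns()
        handler(request)
        latencies_us.append((time.perf_counter_ns() - start) / 1000.0)
    return latencies_us


def percentile(samples, pct):
    """Nearest-rank percentile of samples, for pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical workload: a handler that burns a little CPU per request.
    def fake_handler(request):
        sum(range(1000))

    samples = time_transactions(fake_handler, range(10000))
    for pct in (50, 99, 99.9):
        print("p%s: %.1f us" % (pct, percentile(samples, pct)))
```

Look at the tail (99th percentile and up) rather than the mean. With
one core at 80%, the tail is usually where queueing delay shows up.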
> I am really interested in tcp/ip offloading from the kernel and have
> the NIC do it. I have read,
> http://fiz.stanford.edu:8081/display/ramcloud/Low+latency+RPCs and it
> seems very promising.
Like I said, optimization here is probably going to gain you very little
in decreased latency. You need to focus your optimization on the hot
code paths of the server application.
Can you tell us more about what this application actually does? Is it
totally CPU and network bound? You say it's slaved to CICS, so I
assume you're pulling data from the mainframe CICS database over TCP/IP,
manipulating it, and writing data back to the mainframe. Is this
correct? Which mainframe model is this? Have you measured the
latencies there? Given the mainframe business, and the fact many orgs
hang onto them...forever, it's very likely that your Linux server is 2-5
times faster than the mainframe on a per core basis, and that the
mainframe is introducing the latency as it can't keep up with your Linux
server. That's merely speculation on my part at this point.
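One way to test that speculation is to time the wait on the mainframe
separately from the local processing inside each transaction. The sketch
below is Python for brevity, and the phase names are hypothetical. In
the real C application you would wrap the actual send/receive to the
mainframe and the local manipulation step in the same fashion.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time spent in named phases of a transaction."""

    def __init__(self):
        self.totals_ns = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase raises.
            self.totals_ns[name] += time.perf_counter_ns() - start

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals_ns.values())
        if total == 0:
            return {}
        return {name: ns / total for name, ns in self.totals_ns.items()}


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(100):
        with timer.phase("mainframe_wait"):   # hypothetical: remote round trip
            time.sleep(0.001)
        with timer.phase("local_processing"):  # hypothetical: local CPU work
            sum(range(1000))
    for name, share in timer.shares().items():
        print("%s: %.0f%%" % (name, share * 100))
```

If most of the time lands in the remote phase, tuning the Linux side,
NIC offloads included, won't buy you much.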