
improving ssh2 performance on SPARCv8/SPARCv9 systems



About a week ago, Bill Sommerfeld posted "improving ssh performance on
sun4m systems" to the port-sparc@netbsd mailing list.  The problem he
pointed out there isn't NetBSD-specific, though.  Bill more or less
laid out what's in the rest of this email, so thanks to him, and thanks
to Eric Brower for forwarding me his email.

For those of you upgrading to Woody (testing) or Sid (unstable), you
might have noticed that ssh2 performance is really slow.  Basically, the
problem is that sshd uses libcrypto, which was built for the lowest
common denominator -- SPARC v7, i.e. the sun4/sun4c machines.  SPARC v8
added integer multiply and divide instructions to the architecture
(among other things), which greatly speeds up the bignum arithmetic
behind RSA and DSA.

I rebuilt libssl0.9.6, adding a "-mv8" flag to gcc as detailed in the
gcc manpage:

              -mv8 will give you SPARC v8 code.  The only
              difference from v7 code is that the compiler emits
              the integer multiply and integer divide
              instructions which exist in SPARC v8 but not in
              SPARC v7.

Below are the results of an "openssl speed rsa" and an "openssl speed
dsa" on my U1/170E ... and a "time ssh grape /bin/true" (using ssh-agent
for the authentication) from my laptop to it:

*** BEFORE ***

grape:~$ openssl speed rsa
Doing 512 bit private rsa's for 10s: 72 512 bit private RSA's in 10.05s
Doing 512 bit public rsa's for 10s: 707 512 bit public RSA's in 10.00s
Doing 1024 bit private rsa's for 10s: 12 1024 bit private RSA's in 10.87s
Doing 1024 bit public rsa's for 10s: 196 1024 bit public RSA's in 10.03s
Doing 2048 bit private rsa's for 10s: 2 2048 bit private RSA's in 12.33s
Doing 2048 bit public rsa's for 10s: 53 2048 bit public RSA's in 10.05s
Doing 4096 bit private rsa's for 10s: 1 4096 bit private RSA's in 44.32s
Doing 4096 bit public rsa's for 10s: 14 4096 bit public RSA's in 10.08s
OpenSSL 0.9.6c 21 dec 2001
built on: Mon Jan  7 00:43:23 UTC 2002
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_IDEA -DNO_MDC2 -DNO_RC5 -DB_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall
                  sign    verify    sign/s verify/s
rsa  512 bits   0.1396s   0.0141s      7.2     70.7
rsa 1024 bits   0.9058s   0.0512s      1.1     19.5
rsa 2048 bits   6.1650s   0.1896s      0.2      5.3
rsa 4096 bits  44.3200s   0.7200s      0.0      1.4

grape:~$ openssl speed dsa
Doing 512 bit sign dsa's for 10s: 72 512 bit DSA signs in 10.10s
Doing 512 bit verify dsa's for 10s: 58 512 bit DSA verify in 10.11s
Doing 1024 bit sign dsa's for 10s: 20 1024 bit DSA signs in 10.03s
Doing 1024 bit verify dsa's for 10s: 17 1024 bit DSA verify in 10.47s
OpenSSL 0.9.6c 21 dec 2001
built on: Mon Jan  7 00:43:23 UTC 2002
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_IDEA -DNO_MDC2 -DNO_RC5 -DB_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall
                  sign    verify    sign/s verify/s
dsa  512 bits   0.1403s   0.1743s      7.1      5.7
dsa 1024 bits   0.5015s   0.6159s      2.0      1.6

kermit:~$ time ssh grape /bin/true

real	0m17.197s
user	0m0.250s
sys	0m0.000s

*** AFTER ***

grape:~$ openssl speed rsa
Doing 512 bit private rsa's for 10s: 467 512 bit private RSA's in 10.01s
Doing 512 bit public rsa's for 10s: 4919 512 bit public RSA's in 10.00s
Doing 1024 bit private rsa's for 10s: 80 1024 bit private RSA's in 10.00s
Doing 1024 bit public rsa's for 10s: 1490 1024 bit public RSA's in 10.00s
Doing 2048 bit private rsa's for 10s: 13 2048 bit private RSA's in 10.47s
Doing 2048 bit public rsa's for 10s: 420 2048 bit public RSA's in 10.01s
Doing 4096 bit private rsa's for 10s: 2 4096 bit private RSA's in 11.17s
Doing 4096 bit public rsa's for 10s: 114 4096 bit public RSA's in 10.00s
OpenSSL 0.9.6c 21 dec 2001
built on: Tue Mar 19 03:42:21 PST 2002
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_IDEA -DNO_MDC2 -DNO_RC5 -mv8 -DB_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0214s   0.0020s     46.7    491.9
rsa 1024 bits   0.1250s   0.0067s      8.0    149.0
rsa 2048 bits   0.8054s   0.0238s      1.2     42.0
rsa 4096 bits   5.5850s   0.0877s      0.2     11.4

grape:~$ openssl speed dsa
Doing 512 bit sign dsa's for 10s: 468 512 bit DSA signs in 10.01s
Doing 512 bit verify dsa's for 10s: 374 512 bit DSA verify in 10.02s
Doing 1024 bit sign dsa's for 10s: 111 1024 bit DSA signs in 10.04s
Doing 1024 bit verify dsa's for 10s: 119 1024 bit DSA verify in 10.04s
OpenSSL 0.9.6c 21 dec 2001
built on: Tue Mar 19 03:42:21 PST 2002
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_IDEA -DNO_MDC2 -DNO_RC5 -mv8 -DB_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0214s   0.0268s     46.8     37.3
dsa 1024 bits   0.0905s   0.0844s     11.1     11.9

kermit:~$ time ssh grape /bin/true

real	0m2.754s
user	0m0.230s
sys	0m0.000s

*** END ***
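
For anyone who wants to check the ratios rather than eyeball them, here
is a minimal sketch.  The speedup function is a made-up helper, not part
of openssl or ssh; it just divides a "before" time by an "after" time
taken from the output above.

```shell
# speedup BEFORE AFTER: print BEFORE/AFTER to one decimal place.
# Both arguments are times in seconds, so a bigger result is a bigger win.
# (Hypothetical helper, written only for this comparison.)
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

speedup 0.9058 0.1250    # rsa 1024 sign, seconds per operation
speedup 0.5015 0.0905    # dsa 1024 sign, seconds per operation
speedup 17.197 2.754     # "time ssh grape /bin/true", real seconds
```

Any other pair of rows from the two tables can be fed in the same way.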

Going by the numbers above, that's a speed improvement of somewhere
between a factor of 5.5 and 8 in the openssl benchmarks, and about a
factor of 6 in ssh connection time.  So, how can you get this wonderful
benefit on your machine, you ask?  You can either build it yourself:

1) apt-get source openssl0.9.6
2) edit openssl-0.9.6c/Configure: search down to the "debian-sparc"
   entry and add a "-mv8" right after the "gcc:"
3) run debuild from inside the openssl-0.9.6c directory
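
If you'd rather script step 2 than edit by hand, something like the
following might do it.  This is only a sketch: add_mv8_flag is a made-up
helper, and it assumes the entry contains the literal text
"debian-sparc","gcc: on one line, which I haven't checked against every
version of Configure.  Look at the result before you run debuild.

```shell
# Hypothetical helper: insert "-mv8 " right after "gcc:" in the
# debian-sparc entry of an OpenSSL Configure file.
# Assumption: the entry contains the literal text  "debian-sparc","gcc:
add_mv8_flag() {
    conf="$1"
    # Refuse to touch the file if the entry is not where we expect it.
    grep -q '"debian-sparc","gcc:' "$conf" || {
        echo "debian-sparc entry not found in $conf" >&2
        return 1
    }
    # Already patched?  Then leave the file alone.
    if grep -q '"debian-sparc","gcc:-mv8 ' "$conf"; then
        return 0
    fi
    # Write to a scratch file, then copy back so the file mode is kept.
    sed 's/"debian-sparc","gcc:/&-mv8 /' "$conf" > "$conf.new" &&
        cat "$conf.new" > "$conf" &&
        rm -f "$conf.new"
}
```

Usage would be "add_mv8_flag openssl-0.9.6c/Configure" between steps 1
and 3.  Running it twice is harmless, since it skips a file that already
has the flag.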

Or you can snarf it from me:
http://www.sunsparc.org/linux/libssl/libssl0.9.6_0.9.6c-1_sparc.deb

*** You will need to restart sshd after upgrading that package. ***

If you build it yourself, you only need to install the libssl0.9.6
package to get this speed gain in sshd.  The build produces a bunch of
other packages, but they don't affect sshd's performance.

Finally, this only applies to sun4m, sun4d and sun4u machines.  If
you've got a sun4/sun4c, its SPARC v7 CPU lacks these instructions, so
you're just plain stuck in slowville, sorry.
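
If you're not sure which camp a box falls in, a check along these lines
may help.  It is a sketch only: has_hw_multiply is a made-up helper, and
the commented-out lines assume that /proc/cpuinfo on your kernel has a
"type" line naming the machine type, which you should confirm by looking
at the file.

```shell
# Hypothetical helper: succeed for the machine types listed above as
# benefiting from -mv8, fail for anything else (sun4, sun4c, or a type
# this sketch does not recognize).
has_hw_multiply() {
    case "$1" in
        sun4m|sun4d|sun4u) return 0 ;;
        *)                 return 1 ;;
    esac
}

# Assumption: /proc/cpuinfo carries a line like "type : sun4u".
# machine=$(awk -F': *' '/^type/ { print $2 }' /proc/cpuinfo)
# has_hw_multiply "$machine" && echo "worth installing the -mv8 build"
```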


