
Re: [RFD] optimized versions of openssl



On Fri, Sep 06, 2002 at 04:53:56PM +0200, Christoph Martin wrote:
> The speedup between 386 and 486 code is a factor of 2 !!
> 
> See: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139783&repeatmerged=yes

This is misleading: the factor of 2 does not come from 386 versus 486
code generation.

I think the reporter doesn't realize that "./Configure linux-elf" WILL
use pentium assembly optimizations, at least on my computer:

(cd asm; /usr/bin/perl bn-586.pl cpp >bn86unix.cpp )
gcc -E -DELF -x c asm/bn86unix.cpp | as -o asm/bn86-elf.o
(cd asm; /usr/bin/perl co-586.pl cpp >co86unix.cpp )
gcc -E -DELF -x c asm/co86unix.cpp | as -o asm/co86-elf.o

I presume that the Debian package explicitly disables the use of these
586 routines.

I don't find it surprising that a version with critical routines
optimized in pentium assembler is 2x faster. In fact, I also measured a
factor of 2 in rsa1024 speed between /usr/bin/openssl and a
self-compiled "./Configure linux-elf ; make" build (and the latter
appears to have used i586 assembly code).

Of course I think these optimizations should be made available to
Debian users. Also, openssl's build system is awful. In my build
(standard "linux-elf") it chose the hand-coded Pentium-optimized routines
but still ran gcc with the -m486 option instead of something like
-march=pentium (or, even better in my case, -march=pentiumpro). These
things need to be fixed. If we make packages that include the pentium
assembler optimizations, the C code should also be targeted to
something better than a 486.
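
One way to check which subarch the C code was actually targeted at is to
look at the macros gcc predefines for each -march setting. The following
is only a rough sketch: the function name compiled_for() is made up for
illustration, and it assumes gcc's usual predefined macros (__i486__,
__i586__, __i686__), which other compilers may not define.

```c
#include <stdio.h>

/* Report the x86 subarch the compiler was told to generate code for.
 * compiled_for() is a hypothetical helper, not part of openssl.
 * Assumes gcc-style predefined macros; anything else is "unknown". */
const char *compiled_for(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i686__)
    return "i686";      /* e.g. -march=pentiumpro */
#elif defined(__i586__)
    return "i586";      /* e.g. -march=pentium */
#elif defined(__i486__)
    return "i486";      /* e.g. -march=i486 */
#elif defined(__i386__)
    return "i386";      /* plain i386 code generation */
#else
    return "unknown";
#endif
}

/* Print the result; a build script could call this from a tiny test
 * program compiled with the same CFLAGS as the library. */
void print_compiled_for(void)
{
    printf("C code compiled for: %s\n", compiled_for());
}
```

Compiling a small program that calls print_compiled_for() with the same
flags as the library would show whether the C code and the assembly
routines target the same subarch.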

I think the ideal compromise would be to have openssl compile all the
different assembly variants and select a compatible version at
runtime. Preferably a separate lib would be made for each subarch that
could be dynamically loaded, but it doesn't have to be implemented
this way. Runtime cpu detection has already been discussed in this
thread. The ideal way to compile the subarch-neutral C code for such
an approach would probably be with -mcpu=pentiumpro (to retain
compatibility with i386 but optimize instruction scheduling for the
ppro), and similar options on other platforms like sparc.
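
To make the runtime-selection idea concrete, here is a minimal sketch in
C. It is not how openssl is structured: the names (struct variant,
select_variant, detect_cpu_family, mul_add_*) are all hypothetical, and
the "i586" and "i686" entries simply reuse the generic C routine where a
real build would link the corresponding assembly. The CPU family is read
with __get_cpuid from gcc's <cpuid.h>, and the code falls back to the
generic variant when that is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* Hypothetical bignum primitive: r[i] += a[i] * w, returns final carry. */
typedef uint32_t (*mul_add_fn)(uint32_t *r, const uint32_t *a,
                               size_t n, uint32_t w);

static uint32_t mul_add_generic(uint32_t *r, const uint32_t *a,
                                size_t n, uint32_t w)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] * w + r[i] + carry;
        r[i] = (uint32_t)t;       /* low word */
        carry = t >> 32;          /* high word carries into next limb */
    }
    return (uint32_t)carry;
}

/* Stand-ins: a real build would point these at the asm routines. */
static uint32_t mul_add_i586(uint32_t *r, const uint32_t *a,
                             size_t n, uint32_t w)
{ return mul_add_generic(r, a, n, w); }

static uint32_t mul_add_i686(uint32_t *r, const uint32_t *a,
                             size_t n, uint32_t w)
{ return mul_add_generic(r, a, n, w); }

struct variant {
    int min_family;               /* lowest CPU family that can run it */
    const char *name;
    mul_add_fn mul_add;
};

/* Ordered from most to least demanding; last entry always matches. */
static const struct variant variants[] = {
    { 6, "i686",    mul_add_i686 },
    { 5, "i586",    mul_add_i586 },
    { 0, "generic", mul_add_generic },
};

/* Pick the best variant that the given CPU family can execute. */
const struct variant *select_variant(int family)
{
    size_t n = sizeof(variants) / sizeof(variants[0]);
    for (size_t i = 0; i < n; i++)
        if (family >= variants[i].min_family)
            return &variants[i];
    return &variants[n - 1];
}

/* Return the base CPU family from CPUID leaf 1, or 0 if unavailable. */
int detect_cpu_family(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (int)((eax >> 8) & 0xf);
#endif
    return 0;
}
```

A library would call select_variant(detect_cpu_family()) once at startup
and keep the returned pointer. The same table could just as well hold
names of per-subarch shared objects to load dynamically instead of
function pointers.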


