
Re: i386 compatibility & libstdc++


On Tuesday 29 April 2003 07:50, you wrote:
> Arnd Bergmann <arnd@arndb.de> writes:
> > No, look at my patch again. If you build without i486 optimization,
> > the compiler will see only the extern declaration for
> > __exchange_and_add().
> I see. What sonames do you suggest to give to the two copies of
> libstdc++? You once said you'd call them libstdc++-i386.so.5,
Yes: either use libstdc++-i386 for i386-optimized binaries plus
libstdc++ for the others, or use libstdc++ everywhere. Both should
work in principle. Using libstdc++-i386 will explicitly break Debian
binaries on other platforms, which may or may not be considered
a good idea.

> 2. Running Debian binaries on foreign systems won't be easy.
>    In particular, they all link to libstdc++-i386.so.5, so
>    such a library needs to be provided for other systems.
>    Mixing that library with that native libstdc++.so.5 might
>    cause problems, so anybody running a Debian binary on
>    a foreign system would need the binary and all shared libraries
>    it links with, even though those libraries have the same
>    sonames as the libraries available on the foreign system.
There are two ways out of this:
 a) The patch gets merged upstream. It won't hurt anyone who is
    building i486+ optimized binaries and fixes a real bug.
    This would mean we should not have libstdc++-i386.so.5.
 b) We provide a libstdc++-i386.so.$(version) file that contains
    only the __exchange_and_add function and is linked to 

> 3. Debian i486 binaries take a significant performance hit.
>    The attached program demonstrates that the cost of
>    __atomic_add is roughly twice as much if done out-of-line,
>    compared to the inline version. On my system, I get
> inline: 2.4061
> out-of-line: 4.60658

We can shave a bit off by making the function __attribute__((regparm(2)))
and perhaps by using a trivial non-locking variant when compiling
without threads, as the i386 version needs the mutex only when threads
are in use and AFAICS it is compatible with the i486 version otherwise.
The numbers I get on my P3 now are (in average CPU cycles):

                   non-locked   locked
i486 inline:          6.5        24.2
i486 out-of-line:     7.3        35.8
i386 inline:          4.5       189.9
i386 out-of-line:     9.9       196.4
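
For anyone who wants to repeat this kind of comparison, a measurement
could be sketched roughly as follows. This is not the program attached
earlier in the thread. All names here (atomic_word, add_out_of_line,
inline_add, out_of_line_add, time_per_call, SKETCH_REGPARM) are made up
for illustration, the GCC/Clang builtin __atomic_fetch_add stands in for
the "lock; xaddl" sequence, and the result is in nanoseconds rather than
CPU cycles.

```cpp
#include <cassert>
#include <chrono>

// regparm exists only on 32-bit x86; elsewhere the macro expands to nothing.
#if defined(__GNUC__) && defined(__i386__)
#  define SKETCH_REGPARM __attribute__((regparm(2)))
#else
#  define SKETCH_REGPARM
#endif

typedef int atomic_word;  // hypothetical stand-in for the library's word type

// Out-of-line variant: noinline forces a real call, much as an extern
// declaration in a header would. Assumes GCC or Clang.
__attribute__((noinline)) SKETCH_REGPARM
atomic_word add_out_of_line(volatile atomic_word* mem, int val)
{
    return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
}

// Inline variant: the body is visible at the call site.
struct inline_add {
    atomic_word operator()(volatile atomic_word* mem, int val) const
    {
        return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
    }
};

// Thin wrapper so both variants go through the same timing loop.
struct out_of_line_add {
    atomic_word operator()(volatile atomic_word* mem, int val) const
    {
        return add_out_of_line(mem, val);
    }
};

// Returns the average wall-clock time per call in nanoseconds.
template <typename Fn>
double time_per_call(Fn fn, volatile atomic_word* mem, long iterations)
{
    assert(iterations > 0);
    typedef std::chrono::steady_clock clock;
    clock::time_point start = clock::now();
    for (long i = 0; i < iterations; ++i)
        fn(mem, 1);
    clock::time_point stop = clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling time_per_call(inline_add(), &counter, n) and
time_per_call(out_of_line_add(), &counter, n) with a large n and
optimization enabled gives two figures to compare. The absolute values
depend heavily on the CPU and compiler flags.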

If we know at compile time that no locking is ever needed (neither the
'lock;' prefix nor the mutex call), we can even be much faster than the
current i486 code.
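
A minimal sketch of that compile-time choice, assuming a hypothetical
macro SKETCH_SINGLE_THREADED that the build would define for
non-threaded code. The names exchange_and_add_impl,
sketch_needs_locking and exchange_and_add are illustrative and are not
the libstdc++ ones, and the GCC/Clang builtin __atomic_fetch_add again
stands in for the locked instruction sequence.

```cpp
#include <cassert>

typedef int atomic_word;  // hypothetical stand-in for the library's word type

// Primary template; only the two specializations below are defined.
template <bool NeedsLocking>
struct exchange_and_add_impl;

// Locked variant: an atomic read-modify-write (assumes GCC or Clang).
template <>
struct exchange_and_add_impl<true> {
    static atomic_word apply(volatile atomic_word* mem, int val)
    {
        return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
    }
};

// Non-locked variant: plain load, add, store. Only safe when no other
// thread can touch *mem.
template <>
struct exchange_and_add_impl<false> {
    static atomic_word apply(volatile atomic_word* mem, int val)
    {
        assert(mem != 0);
        atomic_word old = *mem;
        *mem = old + val;
        return old;
    }
};

// SKETCH_SINGLE_THREADED is a made-up macro for this example.
#ifdef SKETCH_SINGLE_THREADED
const bool sketch_needs_locking = false;
#else
const bool sketch_needs_locking = true;
#endif

// The variant is picked at compile time, so there is no runtime test.
inline atomic_word exchange_and_add(volatile atomic_word* mem, int val)
{
    return exchange_and_add_impl<sketch_needs_locking>::apply(mem, val);
}
```

Both variants return the old value and leave *mem incremented by val,
so callers cannot tell them apart in a single-threaded program.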

Also, if an application or library cares about this sort of 
micro-optimization, it probably should be provided in an optimized
version anyway.

	Arnd <><