Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

To: Gabriel Paubert <paubert@iram.es>
Cc: "debian-powerpc@lists.debian.org" <debian-powerpc@lists.debian.org>
Subject: Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)
From: Jeffrey Walton <noloader@gmail.com>
Date: Mon, 1 Mar 2021 20:32:52 -0500
Message-id: <CAH8yC8mYkaiFCCmv4LhNP69zkZmZ4T5C2O-OQcpwnb7zPr0SCA@mail.gmail.com>
Reply-to: noloader@gmail.com
In-reply-to: <20210301082621.GB27548@lt-gp.iram.es>
References: <b11df5306c6f434e2662f7f0b484fe6d@T510i> <CAPweEDzMKB52HaLNme=6CDh+P=-8f40SR4r=FfURovWFxWdasA@mail.gmail.com> <20210301082621.GB27548@lt-gp.iram.es>

On Mon, Mar 1, 2021 at 3:39 AM Gabriel Paubert <paubert@iram.es> wrote:
>
> On Sun, Feb 28, 2021 at 11:52:12PM +0000, Luke Kenneth Casson Leighton wrote:
> > On Monday, March 1, 2021, Riccardo Mottola <riccardo.mottola@libero.it>
> > wrote:
> > ...
> > Tulio Magno Quites Machado Filho is currently working on glibc6 patches
> > which reverse these erroneous assumptions, replacing them with "#ifdef VSX"
> > thus allowing people to compile code that does not rely on SIMD.
>
> Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> VSX is a superset of Altivec (IIRC).

Based on my experience with Botan and Crypto++: VSX is available on
POWER7 with the -mvsx compiler option. VSX is part of the POWER8 core
and does not need a separate compiler option there.
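
A minimal sketch of telling these levels apart at compile time,
assuming the predefined macros GCC and Clang set for PowerPC targets
(__ALTIVEC__, __VSX__, __POWER8_VECTOR__). The helper name
vector_isa_level is hypothetical and is not taken from Botan or
Crypto++:

```c
#include <stddef.h>

/* Hypothetical helper: report the vector level the compiler was told
   to target. Assumes GCC/Clang predefined macros for PowerPC; on any
   other target none of them is defined and "scalar" is returned. */
static const char *vector_isa_level(void)
{
#if defined(__POWER8_VECTOR__)
    return "power8-vector";  /* POWER8 vector instructions enabled */
#elif defined(__VSX__)
    return "vsx";            /* e.g. POWER7 built with -mvsx */
#elif defined(__ALTIVEC__)
    return "altivec";        /* classic AltiVec/VMX only */
#else
    return "scalar";         /* no vector unit targeted */
#endif
}
```

The checks run from the newest level down, so a POWER8 build reports
"power8-vector" even though the VSX and AltiVec macros are also set.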

VSX is a lot like Intel's tick/tock features. VSX allows loads and
stores of vectors with 64-bit elements, but it does not provide
integer arithmetic on 64-bit elements. You have to use POWER8 to get
the 64-bit add (vaddudm), subtract (vsubudm), etc.

So a POWER7+VSX 64-bit add might look like:

typedef __vector unsigned int    uint32x4_p;
typedef __vector unsigned long long uint64x2_p;

// Load 64-bit vector from uint64_t[2]
uint64x2_p a = vec_ld(...);
uint64x2_p b = vec_ld(...);

// But still perform the add on 32-bit elements
uint64x2_p c = (uint64x2_p)VecAdd64((uint32x4_p)a, (uint32x4_p)b);

And:

uint32x4_p
VecAdd64(const uint32x4_p vec1, const uint32x4_p vec2)
{
    // The carry mask selects carries for elements 1 and 3 and sets
    // remaining elements to 0. The result is then shifted so the
    // carried values are added to elements 0 and 2.
#if defined(MYLIB_BIG_ENDIAN)
    const uint32x4_p zero = {0, 0, 0, 0};
    const uint32x4_p mask = {0, 1, 0, 1};
#else
    const uint32x4_p zero = {0, 0, 0, 0};
    const uint32x4_p mask = {1, 0, 1, 0};
#endif

    uint32x4_p cy = vec_addc(vec1, vec2);
    uint32x4_p res = vec_add(vec1, vec2);
    cy = vec_and(mask, cy);
    cy = vec_sld(cy, zero, 4);
    return vec_add(res, cy);
}
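
The carry handling above can be modeled in portable scalar C, which
makes it easier to check on a machine without AltiVec. This is only a
sketch of the idea for a single 64-bit lane; the name add64_via_32 is
hypothetical and the code is not part of Botan or Crypto++:

```c
#include <stdint.h>

/* Hypothetical scalar model: add two 64-bit values using only 32-bit
   adds, mirroring what the vector code does for each 64-bit lane. */
static uint64_t add64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;            /* like vec_add, low word   */
    uint32_t hi = a_hi + b_hi;            /* like vec_add, high word  */
    uint32_t cy = (lo < a_lo) ? 1u : 0u;  /* like vec_addc: carry out
                                             of the low word          */

    hi += cy;  /* like the masked, shifted carry added to the result */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, add64_via_32(0xFFFFFFFF, 1) carries out of the low word
and yields 0x100000000.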

A POWER8 add looks as expected:

uint64x2_p
VecAdd64(const uint64x2_p vec1, const uint64x2_p vec2)
{
    return vec_add(vec1, vec2);
}

Even with the crippled 64-bit add using 32-bit elements, some
algorithms, like Bernstein's ChaCha, run about 2.5x faster than on
the scalar unit.

Jeff
