
Re: 64-bit subtract from vector unsigned int



On Tue, Apr 7, 2020 at 5:51 AM Jeffrey Walton <noloader@gmail.com> wrote:
>
> Hi Everyone,
>
> I'm porting a 64-bit algorithm to 32-bit PowerPC (an old PowerMac).
> The algorithm is simple when 64-bit is available, but it gets a little
> ugly under 32-bit.
> ...
>
> Here's what an "add with carry" looks like. The addc simply adds the
> carry into the result after transposing the carry bits from columns 1
> and 3 to columns 0 and 2.
>
> typedef __vector unsigned char uint8x16_p;
> typedef __vector unsigned int uint32x4_p;
> ...
>
> inline uint32x4_p VecAdd64(const uint32x4_p& vec1, const uint32x4_p& vec2)
> {
>     // 64-bit elements available at POWER7 with VSX, but addudm requires POWER8
> #if defined(_ARCH_PWR8)
>     return (uint32x4_p)vec_add((uint64x2_p)vec1, (uint64x2_p)vec2);
> #else
>     const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
>     const uint32x4_p zero = {0, 0, 0, 0};
>
>     uint32x4_p cy = vec_addc(vec1, vec2);
>     cy = vec_perm(cy, zero, cmask);
>     return vec_add(vec_add(vec1, vec2), cy);
> #endif
> }

I think I found it... The complement of the carry was throwing me off.
vec_subc returns the carry-out, which is 1 when there is no borrow and
0 when there is one. Subtract with borrow needs an extra vec_andc to
un-complement the borrow:

    const uint8x16_p bmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
    const uint32x4_p amask = {1, 1, 1, 1};
    const uint32x4_p zero = {0, 0, 0, 0};

    uint32x4_p bw = vec_subc(vec1, vec2);
    bw = vec_andc(amask, bw);
    bw = vec_perm(bw, zero, bmask);
    return vec_sub(vec_sub(vec1, vec2), bw);
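
For anyone who wants to check the borrow logic without a PowerPC box, here
is a rough scalar model of the same steps. It is only a sketch, not AltiVec
code: Vec4, model_subc, model_sub64, pack, unpack and self_check are names
made up for illustration, and it assumes big-endian lane order (lanes 0 and
2 are the high words, lanes 1 and 3 the low words), which is what the
permute mask above implies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for a 4 x 32-bit vector (big-endian lane order).
struct Vec4 { uint32_t w[4]; };

// Models vec_subc: each lane is 1 when a - b does NOT borrow (a >= b), else 0.
static Vec4 model_subc(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) r.w[i] = (a.w[i] >= b.w[i]) ? 1u : 0u;
    return r;
}

// Models the non-POWER8 path: two independent 64-bit subtractions.
static Vec4 model_sub64(const Vec4& a, const Vec4& b) {
    Vec4 bw = model_subc(a, b);
    // vec_andc(amask, bw): 1 & ~carry turns the complemented borrow into 0/1.
    for (int i = 0; i < 4; ++i) bw.w[i] = 1u & ~bw.w[i];
    // vec_perm with bmask: low-word borrows (lanes 1, 3) move to the
    // high-word lanes (0, 2); lanes 1 and 3 are filled with zero.
    const Vec4 moved = {{ bw.w[1], 0u, bw.w[3], 0u }};
    Vec4 r;
    for (int i = 0; i < 4; ++i) r.w[i] = a.w[i] - b.w[i] - moved.w[i];
    return r;
}

// Packs two 64-bit values into four 32-bit lanes, high word first.
static Vec4 pack(uint64_t x, uint64_t y) {
    return Vec4{{ uint32_t(x >> 32), uint32_t(x), uint32_t(y >> 32), uint32_t(y) }};
}

// Reads back 64-bit element idx (0 or 1).
static uint64_t unpack(const Vec4& v, int idx) {
    return (uint64_t(v.w[2 * idx]) << 32) | v.w[2 * idx + 1];
}

// Compares the model against native 64-bit subtraction for one pair of inputs.
static void self_check(uint64_t a, uint64_t b) {
    const Vec4 r = model_sub64(pack(a, a), pack(b, b));
    assert(unpack(r, 0) == a - b);
    assert(unpack(r, 1) == a - b);
}
```

Calling self_check(0x100000000ULL, 1) exercises the case where the low word
borrows from the high word; self_check(5, 7) exercises wraparound of the
full 64-bit element.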

Jeff

