
64-bit subtract from vector unsigned int



Hi Everyone,

I'm porting a 64-bit algorithm to 32-bit PowerPC (an old PowerMac).
The algorithm is simple when 64-bit is available, but it gets a little
ugly under 32-bit.

PowerPC has a "Vector Subtract Carryout Unsigned Word" instruction
(vsubcuw); see https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf.
The AltiVec intrinsics for it are vec_vsubcuw and vec_subc.
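
For reference, here is a scalar sketch of what one 32-bit lane of vsubcuw
appears to compute, going by the manual's description: the lane receives
the carry-out of a + ~b + 1, which is the complement of the borrow. The
names subc_lane and borrow_lane are made up for illustration; they are not
AltiVec intrinsics, and this has not been checked on hardware.

```cpp
#include <cstdint>

// Scalar model of ONE 32-bit lane of vsubcuw / vec_subc (assumed semantics,
// per the AltiVec manual): the result is the carry-out of a + ~b + 1.
// That is 1 when a >= b (no borrow) and 0 when a < b (borrow).
constexpr uint32_t subc_lane(uint32_t a, uint32_t b)
{
    return (a >= b) ? 1u : 0u;
}

// The borrow itself is the inverted bit. On a vector this would be
// something like vec_andc(ones, vec_subc(a, b)) with ones = {1,1,1,1}.
constexpr uint32_t borrow_lane(uint32_t a, uint32_t b)
{
    return 1u & ~subc_lane(a, b);
}
```

So subc_lane(3, 5) is 0 (a borrow occurred) while borrow_lane(3, 5) is 1.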

The problem is, I don't know how to use it. I've been experimenting
with it, but I don't see how to apply it (yet).

How does one use vsubcuw to implement a subtract with borrow?

Thanks in advance.

==========================================

Here's what an "add with carry" looks like. vec_addc produces the carry
bits; they are moved from columns 1 and 3 to columns 0 and 2 and then
added into the result.

typedef __vector unsigned char uint8x16_p;
typedef __vector unsigned int uint32x4_p;
...

inline uint32x4_p VecAdd64(const uint32x4_p& vec1, const uint32x4_p& vec2)
{
    // 64-bit elements available at POWER7 with VSX, but addudm requires POWER8
#if defined(_ARCH_PWR8)
    return (uint32x4_p)vec_add((uint64x2_p)vec1, (uint64x2_p)vec2);
#else
    const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
    const uint32x4_p zero = {0, 0, 0, 0};

    uint32x4_p cy = vec_addc(vec1, vec2);
    cy = vec_perm(cy, zero, cmask);
    return vec_add(vec_add(vec1, vec2), cy);
#endif
}
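
The fallback path can be sanity-checked off-target with a scalar model of
a single 64-bit element. This is only a sketch: Pair32, addc_lane,
add64_model, join and split are hypothetical helper names, and the model
assumes big-endian lane order (high word in the even column), matching
the cmask above.

```cpp
#include <cstdint>

// One 64-bit element held as two 32-bit lanes, high word first
// (assumed big-endian column order: hi = column 0, lo = column 1).
struct Pair32 { uint32_t hi, lo; };

// Models one lane of vec_addc: 1 if the 32-bit add wraps, else 0.
constexpr uint32_t addc_lane(uint32_t a, uint32_t b)
{
    return (static_cast<uint32_t>(a + b) < a) ? 1u : 0u;
}

// Models the #else branch: add both lanes, move the low lane's carry
// into the high lane (the vec_perm step), then add it in.
constexpr Pair32 add64_model(Pair32 a, Pair32 b)
{
    return Pair32{ a.hi + b.hi + addc_lane(a.lo, b.lo), a.lo + b.lo };
}

// Helpers to compare against native 64-bit arithmetic.
constexpr uint64_t join(Pair32 p)
{
    return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

constexpr Pair32 split(uint64_t v)
{
    return Pair32{ static_cast<uint32_t>(v >> 32), static_cast<uint32_t>(v) };
}
```

For any x and y, join(add64_model(split(x), split(y))) should equal the
native 64-bit sum x + y.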

==========================================

Here's what I have for subtract with borrow, expressed in terms of
addition (two's complement of vec2, then add with carry). It takes
4 loads and then 9 instructions, which I know is inefficient.

    const uint32x4_p mask = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff};
    const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
    const uint32x4_p zero = {0, 0, 0, 0};
    const uint32x4_p  one = {0, 1, 0, 1};

    // one's complement, still need to add 1
    uint32x4_p comp = vec_andc(mask, vec2);

    uint32x4_p cy = vec_addc(one, comp);
    cy = vec_perm(cy, zero, cmask);
    comp = vec_add(vec_add(one, comp), cy);

    cy = vec_addc(vec1, comp);
    cy = vec_perm(cy, zero, cmask);
    return vec_add(vec_add(vec1, comp), cy);
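
One possible shorter route, sketched here as a scalar model rather than
tested AltiVec code: if vsubcuw yields the complement of the borrow (1 for
no borrow, 0 for borrow), the borrow can be recovered with an and-with-
complement, moved to the high column, and subtracted. Pair32, subc_lane,
sub64_model, join and split are hypothetical names, big-endian lane order
is assumed, and the intrinsic mapping in the comments is an assumption
that would need checking on real hardware.

```cpp
#include <cstdint>

// One 64-bit element as two 32-bit lanes, high word first (assumed
// big-endian column order).
struct Pair32 { uint32_t hi, lo; };

// Models one lane of vec_subc: 1 when a >= b (no borrow), else 0.
constexpr uint32_t subc_lane(uint32_t a, uint32_t b)
{
    return (a >= b) ? 1u : 0u;
}

// Assumed vector equivalent, with one = {0,1,0,1}:
//   bw  = vec_andc(one, vec_subc(vec1, vec2));  // borrow of low lanes
//   bw  = vec_perm(bw, zero, cmask);            // move to high lanes
//   res = vec_sub(vec_sub(vec1, vec2), bw);
constexpr Pair32 sub64_model(Pair32 a, Pair32 b)
{
    return Pair32{ a.hi - b.hi - (1u & ~subc_lane(a.lo, b.lo)), a.lo - b.lo };
}

// Helpers to compare against native 64-bit arithmetic.
constexpr uint64_t join(Pair32 p)
{
    return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

constexpr Pair32 split(uint64_t v)
{
    return Pair32{ static_cast<uint32_t>(v >> 32), static_cast<uint32_t>(v) };
}
```

If the mapping in the comments holds, that would be about five vector
operations (subc, andc, perm, and two subtracts) instead of nine.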

