64-bit subtract from vector unsigned int
Hi Everyone,
I'm porting a 64-bit algorithm to 32-bit PowerPC (an old PowerMac).
The algorithm is simple when 64-bit is available, but it gets a little
ugly under 32-bit.
PowerPC AltiVec has a "Vector Subtract Carryout Unsigned Word"
instruction (vsubcuw); see
https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf. The
AltiVec intrinsics are vec_vsubcuw and vec_subc.
The problem is, I don't know how to use it. I've been experimenting
with it but I don't see the use (yet).
How does one use vsubcuw to implement a subtract with borrow?
Thanks in advance.
==========================================
Here's what an "add with carry" looks like. vec_addc produces the carry
out of each 32-bit column. The code moves the carry bits from columns 1
and 3 to columns 0 and 2, and then adds them into the 32-bit sums.
typedef __vector unsigned char uint8x16_p;
typedef __vector unsigned int uint32x4_p;
...
inline uint32x4_p VecAdd64(const uint32x4_p& vec1, const uint32x4_p& vec2)
{
    // 64-bit elements available at POWER7 with VSX, but addudm requires POWER8
#if defined(_ARCH_PWR8)
    return (uint32x4_p)vec_add((uint64x2_p)vec1, (uint64x2_p)vec2);
#else
    const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
    const uint32x4_p zero = {0, 0, 0, 0};
    uint32x4_p cy = vec_addc(vec1, vec2);
    cy = vec_perm(cy, zero, cmask);
    return vec_add(vec_add(vec1, vec2), cy);
#endif
}
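
For anyone who wants to check the arithmetic without PowerPC hardware, here is a rough scalar model of the non-POWER8 path above. It is only a sketch: Lanes, ModelAddc, ModelAdd, ModelMoveCarry, ModelAdd64 and Element64 are made-up names, not AltiVec API, and it assumes big-endian lane order (lanes 0 and 2 hold the high words, lanes 1 and 3 the low words).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for uint32x4_p: four 32-bit lanes.
typedef std::array<uint32_t, 4> Lanes;

// Models vec_addc: the carry out (0 or 1) of each 32-bit lane add.
inline Lanes ModelAddc(const Lanes& a, const Lanes& b)
{
    Lanes r;
    for (size_t i = 0; i < 4; ++i)
        r[i] = static_cast<uint32_t>((static_cast<uint64_t>(a[i]) + b[i]) >> 32);
    return r;
}

// Models vec_add: each lane added modulo 2^32.
inline Lanes ModelAdd(const Lanes& a, const Lanes& b)
{
    Lanes r;
    for (size_t i = 0; i < 4; ++i)
        r[i] = a[i] + b[i];
    return r;
}

// Models vec_perm(cy, zero, cmask): lane 1 moves to lane 0,
// lane 3 moves to lane 2, and lanes 1 and 3 become zero.
inline Lanes ModelMoveCarry(const Lanes& cy)
{
    Lanes r = {{cy[1], 0u, cy[3], 0u}};
    return r;
}

// Same three steps as the #else branch of VecAdd64.
inline Lanes ModelAdd64(const Lanes& a, const Lanes& b)
{
    const Lanes cy = ModelMoveCarry(ModelAddc(a, b));
    return ModelAdd(ModelAdd(a, b), cy);
}

// Reads 64-bit element k (0 or 1): lane 2k is the high word.
inline uint64_t Element64(const Lanes& v, size_t k)
{
    return (static_cast<uint64_t>(v[2 * k]) << 32) | v[2 * k + 1];
}
```

Comparing Element64 of the result against a plain uint64_t addition of the two inputs is a quick way to confirm the carry lands in the right column.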
==========================================
Here's what I have for subtract with borrow in terms of addition.
It takes 4 loads followed by 9 instructions, which I know is
inefficient.
const uint32x4_p mask = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff};
const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
const uint32x4_p zero = {0, 0, 0, 0};
const uint32x4_p one = {0, 1, 0, 1};
// one's complement of vec2, still need to add 1
uint32x4_p comp = vec_andc(mask, vec2);
// 64-bit add of 1 to get the two's complement (-vec2)
uint32x4_p cy = vec_addc(one, comp);
cy = vec_perm(cy, zero, cmask);
comp = vec_add(vec_add(one, comp), cy);
// 64-bit add: vec1 + (-vec2)
cy = vec_addc(vec1, comp);
cy = vec_perm(cy, zero, cmask);
return vec_add(vec_add(vec1, comp), cy);
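
The listing above relies on the identity a - b = a + (~b + 1) with the carry moved from the low column to the high column in each add. A minimal scalar sketch of that identity for a single 64-bit element follows. Word64, AddPair, SubPair and ToU64 are hypothetical names used only for illustration, and the model says nothing about how vsubcuw itself behaves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one 64-bit element held as two 32-bit columns.
struct Word64 { uint32_t hi, lo; };

// 64-bit add built from 32-bit adds: the carry out of the low column
// is moved into the high column, as the addc/perm/add sequence does.
inline Word64 AddPair(Word64 a, Word64 b)
{
    const uint32_t lo = a.lo + b.lo;
    const uint32_t cy = (lo < a.lo) ? 1u : 0u;  // carry out of low column
    const Word64 r = { a.hi + b.hi + cy, lo };
    return r;
}

// a - b computed as a + (~b + 1), mirroring the two add steps above.
inline Word64 SubPair(Word64 a, Word64 b)
{
    const Word64 one  = { 0u, 1u };        // same layout as {0, 1, 0, 1}
    const Word64 comp = { ~b.hi, ~b.lo };  // one's complement (the vec_andc step)
    return AddPair(a, AddPair(one, comp));
}

// Joins the two columns back into a uint64_t for checking.
inline uint64_t ToU64(Word64 w)
{
    return (static_cast<uint64_t>(w.hi) << 32) | w.lo;
}
```

Checking ToU64(SubPair(a, b)) against ordinary uint64_t subtraction for a few inputs, including ones where the low column borrows, is enough to confirm the complement-and-add route gives the expected wraparound result.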