[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: gnupg



On Fri, Feb 23, 2001 at 10:28:58PM +0000, Philip Blundell wrote:
> This kind of thing (from longlong.h) is never good news.  Someone probably 
> needs to figure out what the `arm3' version is doing and why it doesn't work 
> on armv4.  (It ought not to be using explicit hard registers either.)

Do you have anything apart from its comment to suggest that it even
works on arm3?

It appears to be splitting the numbers a and b (which it is about to
multiply) into top and bottom 16 bits

ie 

0xAAAAaaaa * 0xBBBBbbbb

to

((0xAAAA << 16) + 0xaaaa) * ((0xBBBB << 16) + 0xbbbb)

to multiply and then sum:

 (0xAAAA * 0xBBBB)                    << 32
+         (0xAAAA * 0xbbbb)           << 16
+         (0xaaaa * 0xBBBB)           << 16
+                   (0xaaaa * 0xbbbb)

(the "addcs   %0, %0, #65536" is where it propagates the [carry from adding
 the central pair of partial products] into the result high word])

except that

> /* SAM This does not work on arm4 */
> #define umul_ppmm(xh, xl, a, b) \
>   __asm__ ("%@ Inlined umul_ppmm
>         mov     %|r0, %2, lsr #16
>         mov     %|r2, %3, lsr #16
                  ^^^^

>         bic     %|r1, %2, %|r0, lsl #16
>         bic     %|r2, %3, %|r2, lsl #16
                  ^^^^

I think the author has messed up by using the same register twice here.
I think it should be %0 (untested)

>         mul     %1, %|r1, %|r2
>         mul     %|r2, %|r0, %|r2
>         mul     %|r1, %0, %|r1
>         mul     %0, %|r0, %0
>         adds    %|r1, %|r2, %|r1
>         addcs   %0, %0, #65536
>         adds    %1, %1, %|r1, lsl #16
>         adc     %0, %0, %|r1, lsr #16"                                  \
>            : "=&r" ((USItype)(xh)),                                     \
>              "=r" ((USItype)(xl))                                       \
>            : "r" ((USItype)(a)),                                        \
>              "r" ((USItype)(b))                                         \
>            : "r0", "r1", "r2")
> #elif __ARM_ARCH_4__
> #define umul_ppmm(xh, xl, a, b) \
>   __asm__ ("%@ Inlined umul_ppmm
>         umull   %r1, %r0, %r2, %r3" \
>                    : "=&r" ((USItype)(xh)), \
>                      "=r" ((USItype)(xl)) \
>                    : "r" ((USItype)(a)), \
>                      "r" ((USItype)(b)) \
>                    : "r0", "r1")
> #else
> #error Untested architecture
> #endif

I don't believe that the non ARM3 code should actually work, and I see no
reason why (if I've found the bug) the corrected code should not work
equally well on ARM4
(but more slowly than an umull ?)

Nicholas Clark



Reply to: