On 23/11/14 22:30, Julian Taylor wrote:
> what works well is just replacing the offending memory loads with the
> memcpy call. As the size of the memcpy call is constant the compiler
> will take care of emitting code appropriate for the platform.

Ah, even better; the timing on x86-64 comes out the same. I'll cancel the NMU and retry.

S
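
For the archive, a minimal sketch of the memcpy approach described in the quoted text. It assumes the offending loads are reads through a cast pointer at an address that may not be suitably aligned; the helper name load_u32 is hypothetical and does not come from the package in question.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the actual patch): read a 32-bit value
 * from an address that may not be 4-byte aligned.
 *
 * Instead of dereferencing a cast pointer such as *(const uint32_t *)p,
 * copy the bytes into a properly aligned local. The size passed to
 * memcpy is a compile-time constant, so an optimizing compiler can
 * typically replace the call with whatever load sequence suits the
 * target platform. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A call such as load_u32(buf + 1) then stands in for the direct dereference. The value comes back in host byte order, exactly as the original load would have produced it.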