Re: Debian testing/unstable on RiscPC ?
On Thursday, 29 January 2004 00:37, Philip Blundell wrote:
> Yes, clearly it depends on your application mix. Providing an
> equivalent for "ldrsh" in ARMv3 requires something like seven
> instructions and an extra register, so code quality is clearly going to
> suffer if this happens frequently. Long multiplies are a similar story.
Only 4, and an extra register (timings are for SA-110):

        ldrsh   rd, [ra]                @ throughput 2 cycles
                                        @ result delay 3 cycles

can be replaced with

        ldrb    rs, [ra, #1]            @ throughput 4 cycles
        ldrb    rd, [ra]                @ result delay 4 cycles
        mov     rs, rs, lsl #24
        orr     rd, rd, rs, asr #16
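
As a sanity check on the four-instruction sequence, here is a minimal C model of it. This is only a sketch: it assumes little-endian byte order (low byte at [ra], high byte at [ra, #1]), and the names asr, ldrsh_emulated and check_all are made up for this illustration. Each statement mirrors one instruction, and the result is compared against a sign-extended halfword for all 65536 input values.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right of a 32-bit register image, for n in 1..31. */
static uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);      /* replicate the sign bit */
    return r;
}

/* Model of the replacement sequence; returns the final value of rd. */
static uint32_t ldrsh_emulated(const uint8_t *ra)
{
    uint32_t rs = ra[1];               /* ldrb rs, [ra, #1]        */
    uint32_t rd = ra[0];               /* ldrb rd, [ra]            */
    rs <<= 24;                         /* mov  rs, rs, lsl #24     */
    rd |= asr(rs, 16);                 /* orr  rd, rd, rs, asr #16 */
    return rd;
}

/* Compare against a sign-extended halfword for every possible value. */
static void check_all(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint8_t b[2] = { (uint8_t)(v & 0xFFu), (uint8_t)(v >> 8) };
        uint32_t expected = (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
        assert(ldrsh_emulated(b) == expected);
    }
}
```

Calling check_all() from a test program runs the exhaustive comparison; it says nothing about the cycle counts, only about the computed value.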
For the [rn, rm] addressing mode it's more complex:

        ldrsh   rd, [ra, ro]            @ throughput 2 cycles
                                        @ result delay 3 cycles

can be replaced with

        ldrb    rd, [ra, ro]!           @ throughput 5 cycles
        ldrb    rs, [ra, #1]            @ result delay 5 cycles
        sub     ra, ra, ro
        mov     rs, rs, lsl #24
        orr     rd, rd, rs, asr #16
(I hope I did not make mistakes!)
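
The register-offset variant can be checked the same way. The following C sketch models the five-instruction sequence, including the base-register writeback and the sub that undoes it. It assumes little-endian byte order and that ra, ro, rd and rs are four distinct registers; struct regs, asr and ldrsh_indexed_emulated are hypothetical names used only for this illustration, and memory is modelled as a plain byte array indexed by address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file holding only the registers involved. */
struct regs { uint32_t ra, ro, rd, rs; };

/* Arithmetic shift right of a 32-bit register image, for n in 1..31. */
static uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);      /* replicate the sign bit */
    return r;
}

/* Model of the replacement for "ldrsh rd, [ra, ro]". */
static void ldrsh_indexed_emulated(struct regs *r, const uint8_t *mem)
{
    r->ra += r->ro;                    /* ldrb rd, [ra, ro]!  (writeback) */
    r->rd = mem[r->ra];
    r->rs = mem[r->ra + 1];            /* ldrb rs, [ra, #1]               */
    r->ra -= r->ro;                    /* sub  ra, ra, ro                 */
    r->rs <<= 24;                      /* mov  rs, rs, lsl #24            */
    r->rd |= asr(r->rs, 16);           /* orr  rd, rd, rs, asr #16        */
}
```

After the call, rd holds the sign-extended halfword and ra is back at its original value, which is the point of the extra sub instruction.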
So a native LDRSH gives roughly a twofold speed-up only if the code consists
of nothing but LDRSH, which is the worst case. As for code quality and
readability: if this is really an issue, x86 should be dropped as soon as
possible ;-)
BTW, from an IC designer's point of view I cannot understand why LDRSH takes
longer than e.g. LDRB.
> There's one other fringe benefit to ARMv4, namely that it has more
> helpful semantics for unimplemented instructions in the extension
> space. Many of these opcodes will take the undefined instruction trap
> from v4 onwards, rather than just quietly performing some bogus
> operation as happened in v3. So, it becomes feasible to provide
> in-kernel emulation for, say, BX or the v5 instructions. I'm not sure
> if this is something that will really be interesting for Debian, but
> it's worth bearing in mind.
This is probably quite interesting for the kernel, but not really an option.
Just remember the trap and decoding overhead. From my experience with
programming FastFPE I know that an FP library is approximately 4 times faster
than emulation. The gap increases for less complex operations. It is not
useful to emulate e.g. the DSP-enhanced instruction set this way.
> This is true up to a point, but of course the RiscPC is an extreme case
> of this. A more typical system nowadays would have a 200MHz or 400MHz
> core with 100MHz SDRAM, so the imbalance between core performance and
> memory bandwidth is much less.
True, but that is only half the story. Consider not only bandwidth, but also
latency. The core still has to wait many (> 10) memory clock cycles before
the result is available.
Conclusion: I think it does not really hurt if Debian keeps armv3 support.
On the other hand, there are probably not many people using old machines. We
could also support only StrongARM RiscPCs and use armv3m. But I think this
should not be done, because it would leave out many ARM7-based devices. ARM6
is no longer supported by recent kernels, 2.4.1? onwards I think, due to its
different and impractical abort model.
At least the compiler should be changed back. We can wait and let the old
packages be flushed out over time, but some time before Sarge is released it
should be actively checked which packages have to be recompiled. I, of
course, would prefer it if at least the required, important and standard
packages could be recompiled more or less immediately ;-)
Have a nice evening,
Peter Teichmann