
Re: Debian testing/unstable on RiscPC ?

On Thursday, 29 January 2004 00:37, Philip Blundell wrote:
> Yes, clearly it depends on your application mix.  Providing an
> equivalent for "ldrsh" in ARMv3 requires something like seven
> instructions and an extra register, so code quality is clearly going to
> suffer if this happens frequently.  Long multiplies are a similar story.

Only 4, and an extra register (timings are for SA-110):

	ldrsh	rd, [ra]	@ throughput 2 cycles
				@ result delay 3 cycles

can be replaced with

	ldrb	rs, [ra, #1]	@ throughput 4 cycles
	ldrb	rd, [ra]	@ result delay 4 cycles
	mov	rs, rs, lsl#24
	orr	rd, rd, rs, asr#16

For the [rn, rm] addressing mode it's more complex:

	ldrsh	rd, [ra, ro]	@ throughput 2 cycles
				@ result delay 3 cycles

can be replaced with

	ldrb	rd, [ra, ro]!	@ throughput 5 cycles
	ldrb	rs, [ra, #1]	@ result delay 5 cycles
	sub	ra, ra, ro
	mov	rs, rs, lsl#24
	orr	rd, rd, rs, asr#16

(I hope I did not make mistakes!)
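
One way to check the two sequences above is to model them in C and compare
them against a plain sign-extended halfword load for all 65536 bit patterns.
This is only a sketch: it assumes little-endian byte order, and the helper
names (asr, ldrsh_emulated, ldrsh_indexed_emulated, count_mismatches) and the
small mem[] buffer are made up for illustration. It checks the result value
and that ra is restored, not the cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right on a 32-bit register value, written out by hand
   because >> on negative signed integers is implementation-defined in C. */
static uint32_t asr(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t shifted = x >> n;
    if (x & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> n);   /* fill vacated bits with the sign */
    return shifted;
}

/* Reference: what "ldrsh rd, [ra]" is expected to return (little-endian). */
static uint32_t ldrsh_reference(const uint8_t *mem, uint32_t ra)
{
    uint32_t v = (uint32_t)mem[ra] | ((uint32_t)mem[ra + 1] << 8);
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}

/* Model of the four-instruction replacement for "ldrsh rd, [ra]". */
static uint32_t ldrsh_emulated(const uint8_t *mem, uint32_t ra)
{
    uint32_t rs = mem[ra + 1];      /* ldrb rs, [ra, #1]       */
    uint32_t rd = mem[ra];          /* ldrb rd, [ra]           */
    rs = rs << 24;                  /* mov  rs, rs, lsl #24    */
    return rd | asr(rs, 16);        /* orr  rd, rd, rs, asr #16 */
}

/* Model of the five-instruction replacement for "ldrsh rd, [ra, ro]".
   ra is passed by pointer so the caller can see that it is restored. */
static uint32_t ldrsh_indexed_emulated(const uint8_t *mem, uint32_t *ra,
                                       uint32_t ro)
{
    *ra += ro;                      /* ldrb rd, [ra, ro]!  (writeback) */
    uint32_t rd = mem[*ra];
    uint32_t rs = mem[*ra + 1];     /* ldrb rs, [ra, #1]       */
    *ra -= ro;                      /* sub  ra, ra, ro         */
    rs = rs << 24;                  /* mov  rs, rs, lsl #24    */
    return rd | asr(rs, 16);        /* orr  rd, rd, rs, asr #16 */
}

/* Try every 16-bit pattern; returns the number of disagreements. */
static int count_mismatches(void)
{
    uint8_t mem[8] = {0};
    int bad = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        mem[4] = (uint8_t)(v & 0xFFu);      /* low byte  */
        mem[5] = (uint8_t)(v >> 8);         /* high byte */
        uint32_t want = ldrsh_reference(mem, 4);
        uint32_t ra = 0;
        if (ldrsh_emulated(mem, 4) != want)
            bad++;
        if (ldrsh_indexed_emulated(mem, &ra, 4) != want || ra != 0)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a small main() is enough to run the check.
The model does not cover the case where rd and ra are the same register in
the indexed form.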

This leads to an approximately 2-fold slowdown if the code contains nothing 
but LDRSH, which is the worst case. Considering code quality and readability, 
if this is really an issue, x86 should be dropped as soon as possible ;-)

BTW, from an IC designer's point of view I cannot understand why LDRSH takes 
longer than e.g. LDRB.

> There's one other fringe benefit to ARMv4, namely that it has more
> helpful semantics for unimplemented instructions in the extension
> space.  Many of these opcodes will take the undefined instruction trap
> from v4 onwards, rather than just quietly performing some bogus
> operation as happened in v3.  So, it becomes feasible to provide
> in-kernel emulation for, say, BX or the v5 instructions.  I'm not sure
> if this is something that will really be interesting for Debian, but
> it's worth bearing in mind.

This is probably quite interesting for the kernel, but not really an option. 
Just remember the trap and decoding overhead. From my experience with 
programming FastFPE I know that an FP library is approximately 4 times faster 
than emulation. The gap increases if less complex operations are considered. 
It is not useful to emulate e.g. the DSP-enhanced instruction set this way.

> This is true up to a point, but of course the RiscPC is an extreme case
> of this.  A more typical system nowadays would have a 200MHz or 400MHz
> core with 100MHz SDRAM, so the imbalance between core performance and
> memory bandwidth is much less.

True, but that is only half the story. Consider not only bandwidth but also 
latency. The core still has to wait many (> 10) memory clock cycles before 
the result is available.

Conclusion: I think it does not really hurt if Debian keeps armv3 support. 
On the other hand, there are probably not many people using old machines. 
We could also support only StrongARM RiscPCs and use armv3m. But I think 
this should not be done, because it would leave out many ARM7-based 
devices. ARM6 is no longer supported by recent kernels, from 2.4.1? onwards 
I think, because of its different and impractical abort model.

At least the compiler should be changed back. We can wait and let the old 
packages wash out, but some time before Sarge is released it should be 
actively checked which packages have to be recompiled. I, of course, would 
prefer it if at least the required, important and standard packages could 
be recompiled more or less immediately ;-)

Have a nice evening,
Peter Teichmann
