
Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM



On Wed, Feb 26, 2014 at 01:59:12AM +0900, Taihei Momma wrote:
> On 2014/02/26, at 1:44, Thomas Orgis wrote:
> 
> > That address didn't change.
> 
> 
> Well, the function itself is properly aligned (so my fix didn't take effect anyway).
> > 0xb6fb9330 <+0>:     vpush   {d8-d15}
> > 0xb6fb9334 <+4>:     sub     r3, pc, #140    ; 0x8c
> 
> But the processor decoded the first instruction as 2-byte (thumb?), then increased PC by 2. And it raised SIGILL at
> > 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
> 
> 
> So, I guess
>  - assembler emits a bad machine code for vpush
> or
>  - kernel is not configured properly to run vfp instructions

Is that a kernel option?  I wouldn't have thought armhf would run without
that (unless no floating point code is ever being run).

Well the kernel that is running has this:

CONFIG_VFP=y
CONFIG_VFPv3=y
CONFIG_NEON=y

> I'd like to look into objdump -d result to check the machine code. 

Remember that Debian armhf defaults to -mthumb.  Any assembly code that
is written in ARM rather than Thumb-2 assembly needs to be properly
flagged with .arm, or .syntax unified, or whatever is appropriate (I am
still trying to wrap my head around this myself).  At least that's my
understanding so far.  If I add .syntax unified and .fpu neon, then I no
longer have to pass -mfpu=neon in the CFLAGS to get it to compile, but
it still fails.  I am just about to test the new version to see if that
helps anything.

The disassembly in gdb shows 4-byte alignment, but the address of the
illegal instruction is 2 bytes past the vpush instruction's address.
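
To illustrate why a fault address of +2 points at Thumb decoding, here is
a rough Python sketch that walks the two instructions from the gdb listing
quoted above the way a core in Thumb state would fetch them.  The two
32-bit values are the usual ARM (A32) encodings for those instructions;
they are my assumption and were not taken from an objdump of this build.
The helper names are made up for the illustration.

```python
# Assumed A32 encodings for the two instructions in the gdb listing
# (not taken from the thread; check against objdump -d of the real object).
VPUSH_D8_D15 = 0xED2D8B10   # vpush {d8-d15}
SUB_R3_PC_140 = 0xE24F308C  # sub r3, pc, #140


def to_halfwords(arm_words):
    """Split little-endian 32-bit words into the 16-bit units Thumb fetches."""
    halfwords = []
    for word in arm_words:
        halfwords.append(word & 0xFFFF)          # low half is at the lower address
        halfwords.append((word >> 16) & 0xFFFF)  # high half follows 2 bytes later
    return halfwords


def thumb_length(halfword):
    """Return 4 if this halfword starts a 32-bit Thumb-2 encoding, else 2."""
    return 4 if (halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def walk_as_thumb(arm_words, base):
    """List (address, size, halfwords) as a Thumb decoder would see the bytes."""
    halfwords = to_halfwords(arm_words)
    steps = []
    i = 0
    while i < len(halfwords):
        size = thumb_length(halfwords[i])
        count = size // 2
        if i + count > len(halfwords):
            break  # encoding runs past the bytes we were given
        steps.append((base + 2 * i, size, tuple(halfwords[i:i + count])))
        i += count
    return steps


for address, size, raw in walk_as_thumb([VPUSH_D8_D15, SUB_R3_PC_140], 0xB6FB9330):
    print(hex(address), size, [hex(h) for h in raw])
```

Under those assumed encodings the first halfword looks like an ordinary
16-bit Thumb instruction, so the PC moves on by 2, and the next fetch
starts a 32-bit encoding that straddles the vpush and the sub.  That
second address is the one reported with the SIGILL, which fits the idea
that the function is entered in Thumb state rather than that the
assembler emitted a bad vpush.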

In fact, if I add -marm to the CFLAGS, then it seems to work, so either
the .S files are not being flagged correctly as being ARM code, or they
are missing Thumb interworking bits or something.
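
For reference, this is what I mean by interworking, as a small Python
sketch.  On ARM an interworking branch (BX, or BLX with a register) takes
the instruction set from bit 0 of the target address, and function
symbols for Thumb code carry that bit set.  The function name here is
made up, and whether this is the piece missing from the mpg123 .S files
is only a guess on my part.

```python
def branch_target(symbol_value):
    """Split an interworking branch target into (instruction set, fetch address)."""
    # Bit 0 set selects Thumb state; bit 0 clear selects ARM state.
    state = "thumb" if symbol_value & 1 else "arm"
    # The instruction fetch itself uses the address with bit 0 cleared.
    return state, symbol_value & ~1


print(branch_target(0xB6FB9330))
print(branch_target(0xB6FB9331))
```

So a caller built with -mthumb has to reach an ARM function through a
branch that actually switches state; a plain branch keeps the core in
Thumb state and it then decodes the ARM bytes as Thumb.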

root@rceng05:/mpg123-20140225173909# perl scripts/benchmark-cpu.pl src/mpg123 /convergence_-_points_of_view/*mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
NEON    7.52    7.65

That was with CFLAGS=-g -mcpu=cortex-a15 -mfpu=neon -marm

Without -marm, it crashes with an illegal instruction.  But since -mthumb
is the default on armhf, passing -marm seems wrong.

-- 
Len Sorensen

