Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

To: peter green <plugwash@p10link.net>, 738981@bugs.debian.org
Cc: Steve McIntyre <steve@einval.com>, Riku Voipio <riku.voipio@iki.fi>, Reinhard Tartler <siretart@gmail.com>, debian-arm@lists.debian.org, Taihei Momma <tmkk@mac.com>
Subject: Re: Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM
From: Thomas Orgis <thomas-forum@orgis.org>
Date: Fri, 21 Feb 2014 02:56:43 +0100
Message-id: <20140221025643.267db88a@orgis.org>
In-reply-to: <5306AC04.2000307@p10link.net>
References: <20140216104635.0ebd260b@orgis.org> <CAJ0cceYJumVF4z7Bthca9moo02qoM+1QrNSPRL1Um2srdhtfvg@mail.gmail.com> <53014BD1.9040702@p10link.net> <CAJ0ccebm8MaZNq1X7cJkryMoFkxd6kS7H36myDL2kCq9i=1UqQ@mail.gmail.com> <53015192.2070707@p10link.net> <CAJ0ccea4iL_Je2K+Kvf1rQxxefQvQwgyW7e09KsHy=MJ0a-TSQ@mail.gmail.com> <530157AE.1070202@p10link.net> <CAJ0ccea3Htyg6M480zAt-zqvB9SZpXKea==56eVF1oVCBvCYvQ@mail.gmail.com> <20140217080048.GA14396@afflict.kos.to> <20140217114316.006a4249@orgis.org> <20140217123430.GA12535@einval.com> <20140220134345.3792ad1a@orgis.org> <5306AC04.2000307@p10link.net>

I'm adding the mpg123 assembly guru to the CC list, as I imagine he
would be interested in why his ARM NEON code doesn't work on a Cortex
A8 chip here. Needless to say, it worked before (on other systems).
Also, the precision of the arm_nofpu code does not look right. This
topic is now shifting towards mpg123 development, but as long as it's
only on this debian platform that it's not working, I guess it is
on-topic for debian, too.

On Fri, 21 Feb 2014 01:29:40 +0000,
peter green <plugwash@p10link.net> wrote:

> Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat out 
> of date) debian sid armhf chroot

> Built with ./configure --with-cpu=arm_nofpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder        t_s16/s t_f32/s
> ARM     30.36   34.26
> 
> Built with ./configure --with-cpu=generic_fpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder        t_s16/s t_f32/s
> generic 148.66  138.49

That seems to prove a point about trying to use the nofpu build. How
does --with-cpu=generic_nofpu stack up for this machine? Also regarding
the compliance test later on ...

> Built with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder        t_s16/s t_f32/s
> NEON    0.03    0.04

Yeah, as we see

> Illegal instruction

this is most interesting. I refer to Taihei, as I don't have a NEON
setup at hand (need to get a debian chroot going on my phone).

> root@plugwash:/mpg123-test# 
> LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ 
> perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
> 
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit:        RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:        RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)

That doesn't look pretty to me. Does it _sound_ like (metal) music? If
there is no audio chip, decode to WAV with -w output.wav; I happily
accept snippets, and you can limit the number of frames via -n 500.

> root@plugwash:/mpg123-test# 
> LD_LIBRARY_PATH=/mpg123-20140220132548-generic_fpu/src/libmpg123/.libs/ 
> perl compliance.pl /mpg123-20140220132548-generic_fpu/src/mpg123
> 
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit:        RMS=8.683659e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl2.bit:        RMS=8.686681e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl3.bit:        RMS=8.737660e-06 (PASS) maxdiff=1.525879e-05 (PASS)

Yes, that is better. Can you compare --with-cpu=generic_nofpu to
isolate this to the assembly version for ARM? This is how it looks with
generic_nofpu on my box:

sh$ perl ../test/compliance.pl src/mpg123

==== Layer 1 ====
--> 16 bit signed integer output
fl1.bit:	RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:	RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:	RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:	RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:	RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:	RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:	RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:	RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 32 bit integer output
fl1.bit:	RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:	RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:	RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:	RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:	RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:	RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:	RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:	RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 24 bit integer output
fl1.bit:	RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:	RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:	RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:	RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:	RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:	RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:	RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:	RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 32 bit floating point output
fl1.bit:	RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:	RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:	RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:	RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:	RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:	RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:	RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:	RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)

==== Layer 2 ====
--> 16 bit signed integer output
fl10.bit:	RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit:	RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit:	RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit:	RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit:	RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit:	RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit:	RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)
--> 32 bit integer output
fl10.bit:	RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit:	RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit:	RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit:	RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit:	RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit:	RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit:	RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)
--> 24 bit integer output
fl10.bit:	RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit:	RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit:	RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit:	RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit:	RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit:	RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit:	RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)
--> 32 bit floating point output
fl10.bit:	RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit:	RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit:	RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit:	RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit:	RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit:	RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit:	RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)

==== Layer 3 ====
--> 16 bit signed integer output
compl.bit:	RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit integer output
compl.bit:	RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 24 bit integer output
compl.bit:	RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit floating point output
compl.bit:	RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
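
For reading the PASS/LIMITED/FAIL labels in these listings, here is a
minimal sketch of how such a verdict can be derived from the two
numbers. It assumes the accuracy limits commonly quoted from
ISO/IEC 11172-4 (RMS below 2^-15/sqrt(12) and maximum difference at most
2^-14 for full accuracy, RMS below 2^-11/sqrt(12) for limited accuracy).
The function names are made up for illustration, and this is not taken
from compliance.pl itself, which may well differ in detail.

```python
import math

# Assumed limits, relative to a full-scale amplitude of 1.0.
# These follow the ISO/IEC 11172-4 accuracy classes as I understand them;
# they are not copied from compliance.pl.
RMS_FULL = 2.0 ** -15 / math.sqrt(12.0)     # about 8.81e-06
RMS_LIMITED = 2.0 ** -11 / math.sqrt(12.0)  # about 1.41e-04
MAXDIFF_FULL = 2.0 ** -14                   # about 6.10e-05


def classify_rms(rms):
    """Return the verdict for the RMS error of one test stream."""
    if rms < RMS_FULL:
        return "PASS"
    if rms < RMS_LIMITED:
        return "LIMITED"
    return "FAIL"


def classify_maxdiff(maxdiff):
    """Return the verdict for the largest single-sample difference."""
    return "PASS" if maxdiff <= MAXDIFF_FULL else "FAIL"


if __name__ == "__main__":
    # Values copied from the listings in this mail.
    samples = [
        ("arm_nofpu fl1.bit", 3.486054e-02, 5.002832e-02),
        ("generic_fpu fl1.bit", 8.683659e-06, 1.525879e-05),
        ("generic_nofpu fl5.bit", 1.677634e-05, 6.681681e-05),
    ]
    for name, rms, maxdiff in samples:
        print(name, classify_rms(rms), classify_maxdiff(maxdiff))
```

Under these assumed limits the arm_nofpu numbers miss even the limited
class by more than two orders of magnitude, while generic_nofpu only
slips to LIMITED on a few streams.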

Thanks for the time you are taking (and thanks also to the folks being
spammed with this discussion ;-). I'm confident we'll get to a bright
future with mp3 decoding on debian/ARM soon.


Alrighty then,

Thomas
