I'm adding the mpg123 assembly guru to the CC list, as I imagine he
would be interested in why his ARM NEON code doesn't work on a Cortex
A8 chip here. Needless to say, it worked before (on other systems).
Also, the precision of the arm_nofpu code does not look right.

This topic is now shifting towards mpg123 development, but as long as
Debian is the only platform where it's not working, I guess it is
on-topic for Debian, too.

On Fri, 21 Feb 2014 01:29:40 +0000,
peter green <plugwash@p10link.net> wrote:

> Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat out
> of date) debian sid armhf chroot
> Built with ./configure --with-cpu=arm_nofpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder t_s16/s t_f32/s
> ARM 30.36 34.26
>
> Built with ./configure --with-cpu=generic_fpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder t_s16/s t_f32/s
> generic 148.66 138.49

That seems to prove a point about trying to use the nofpu build. How
does --with-cpu=generic_nofpu stack up for this machine? Also regarding
the compliance test later on ...

> Build with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder t_s16/s t_f32/s
> NEON 0.03 0.04

Yeah, as we see

> Illegal instruction

this is most interesting. I refer to Taihei, as I don't have a NEON
setup at hand (need to get a Debian chroot going on my phone).

> root@plugwash:/mpg123-test#
> LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/
> perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
>
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit: RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit: RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)

That doesn't look pretty to me. Does it _sound_ like (metal) music? If
there is no audio chip there, decode to WAV with -w output.wav. I
happily accept snippets; limit the number of frames via -n 500.
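For readers following along, here is a rough sketch of how RMS/maxdiff
figures like the ones quoted above could be computed and judged. It is
not taken from compliance.pl. The thresholds are an assumption: they
are the limits commonly cited for ISO/IEC 11172-4 (samples scaled to
[-1, 1)), and the function names compare() and verdict() are made up
for illustration.

```python
import math

# Assumed limits (commonly cited for ISO/IEC 11172-4), not read from compliance.pl.
FULL_RMS = 2.0 ** -15 / math.sqrt(12)     # about 8.81e-06
LIMITED_RMS = 2.0 ** -11 / math.sqrt(12)  # about 1.41e-04
FULL_MAXDIFF = 2.0 ** -14                 # about 6.10e-05


def compare(reference, decoded):
    """Return (rms, maxdiff) of the sample-wise difference of two signals."""
    if not reference or len(reference) != len(decoded):
        raise ValueError("need two non-empty sample lists of equal length")
    diffs = [d - r for r, d in zip(reference, decoded)]
    rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    maxdiff = max(abs(x) for x in diffs)
    return rms, maxdiff


def verdict(rms, maxdiff):
    """Classify the two error measures against the assumed limits."""
    if rms < FULL_RMS:
        rms_verdict = "PASS"
    elif rms < LIMITED_RMS:
        rms_verdict = "LIMITED"
    else:
        rms_verdict = "FAIL"
    max_verdict = "PASS" if maxdiff <= FULL_MAXDIFF else "FAIL"
    return rms_verdict, max_verdict


if __name__ == "__main__":
    # Figures quoted in this thread for fl1.bit, 16 bit output.
    print("arm_nofpu  :", verdict(3.486054e-02, 5.002832e-02))
    print("generic_fpu:", verdict(8.683659e-06, 1.525879e-05))
```

Under these assumed limits the arm_nofpu RMS is more than two orders of
magnitude above even the limited-accuracy bound, which is why the
result points to a real defect rather than rounding noise.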
> root@plugwash:/mpg123-test#
> LD_LIBRARY_PATH=/mpg123-20140220132548-generic_fpu/src/libmpg123/.libs/
> perl compliance.pl /mpg123-20140220132548-generic_fpu/src/mpg123
>
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit: RMS=8.683659e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl2.bit: RMS=8.686681e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl3.bit: RMS=8.737660e-06 (PASS) maxdiff=1.525879e-05 (PASS)

Yes, that is better. Can you compare --with-cpu=generic_nofpu to
isolate this to the assembly version for ARM? This is how it looks with
generic_nofpu on my box:

sh$ perl ../test/compliance.pl src/mpg123

==== Layer 1 ====
--> 16 bit signed integer output
fl1.bit: RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit: RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit: RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit: RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit: RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit: RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit: RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit: RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 32 bit integer output
fl1.bit: RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit: RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit: RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit: RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit: RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit: RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit: RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit: RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 24 bit integer output
fl1.bit: RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit: RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit: RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit: RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit: RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit: RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit: RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit: RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 32 bit floating point output
fl1.bit: RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit: RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit: RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit: RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit: RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit: RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit: RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit: RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)

==== Layer 2 ====
--> 16 bit signed integer output
fl10.bit: RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit: RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit: RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit: RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit: RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit: RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit: RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)
--> 32 bit integer output
fl10.bit: RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit: RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit: RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit: RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit: RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit: RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit: RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)
--> 24 bit integer output
fl10.bit: RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit: RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit: RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit: RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit: RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit: RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit: RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)
--> 32 bit floating point output
fl10.bit: RMS=7.983482e-06 (PASS) maxdiff=2.837181e-05 (PASS)
fl11.bit: RMS=7.971939e-06 (PASS) maxdiff=3.039837e-05 (PASS)
fl12.bit: RMS=7.947400e-06 (PASS) maxdiff=2.884865e-05 (PASS)
fl13.bit: RMS=7.871138e-06 (PASS) maxdiff=2.616644e-05 (PASS)
fl14.bit: RMS=1.845901e-05 (LIMITED) maxdiff=6.735325e-05 (FAIL)
fl15.bit: RMS=9.506695e-06 (LIMITED) maxdiff=3.713369e-05 (PASS)
fl16.bit: RMS=8.529689e-06 (PASS) maxdiff=4.535913e-05 (PASS)

==== Layer 3 ====
--> 16 bit signed integer output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit integer output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 24 bit integer output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit floating point output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)

Thanks for the time you take (also the folks being spammed with this
discussion;-). I'm confident we'll get to a bright future with mp3
decoding on debian/ARM soon.


Alrighty then,

Thomas