On Tue, 04 Mar 2014 02:59:44 +0000, peter green <plugwash@p10link.net> wrote:

> Is there any quality
> difference from using a fpu vs nonfpu decoder?

Technically, there is. See these numbers for generic fpu and non-fpu code, with and without --enable-int-quality given to configure (it enables better rounding for a small performance hit; you might want to activate that by default). In numbers, the difference is this:

==> src/mpg123.fpu_accurate.compliance.txt <==
==== Layer 3 ====
--> 16 bit signed integer output
compl.bit: RMS=4.300914e-06 (PASS) maxdiff=7.688999e-06 (PASS)
--> 32 bit integer output
compl.bit: RMS=2.152784e-08 (PASS) maxdiff=1.769513e-07 (PASS)
--> 24 bit integer output
compl.bit: RMS=4.206462e-08 (PASS) maxdiff=1.788139e-07 (PASS)
--> 32 bit floating point output
compl.bit: RMS=2.153045e-08 (PASS) maxdiff=1.769513e-07 (PASS)

==> src/mpg123.fpu.compliance.txt <==
==== Layer 3 ====
--> 16 bit signed integer output
compl.bit: RMS=8.907757e-06 (LIMITED) maxdiff=1.531839e-05 (PASS)
--> 32 bit integer output
compl.bit: RMS=2.152589e-08 (PASS) maxdiff=1.769513e-07 (PASS)
--> 24 bit integer output
compl.bit: RMS=4.205495e-08 (PASS) maxdiff=1.788139e-07 (PASS)
--> 32 bit floating point output
compl.bit: RMS=2.153045e-08 (PASS) maxdiff=1.769513e-07 (PASS)

==> src/mpg123.nofpu_accurate.compliance.txt <==
==== Layer 3 ====
--> 16 bit signed integer output
compl.bit: RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
--> 32 bit integer output
compl.bit: RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
--> 24 bit integer output
compl.bit: RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
--> 32 bit floating point output
compl.bit: RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)

==> src/mpg123.nofpu.compliance.txt <==
==== Layer 3 ====
--> 16 bit signed integer output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit integer output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 24 bit integer output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit floating point output
compl.bit: RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)

With a nofpu decoder, you always get the precision of 16 bit output, because every other output format is converted from the 16 bit samples (hence the identical numbers for all formats). But, especially with --enable-int-quality, this is a fully compliant MPEG audio decoder with all the precision that you need for "normal" playback situations.

MAD claims 24 bit precision with integer math (just about matching mpg123's 24 bit output with the FPU decoder; see http://www.underbit.com/resources/mpeg/audio/compliance, RMS=4.906e-08). I suspect, though, that MAD will be considerably slower than mpg123's arm_nofpu decoder. On my Core2Duo P8800, madplay with libmad 0.15.1 needs about 7.4 s to 8.5 s for decoding to null output (with either speed or accuracy optimization). The mpg123 numbers for the generic variants (accurate == --enable-int-quality):

==> src/mpg123.fpu_accurate.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder t_s16/s t_f32/s
generic 6.16 5.85

==> src/mpg123.fpu.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder t_s16/s t_f32/s
generic 6.05 5.83

==> src/mpg123.nofpu_accurate.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder t_s16/s t_f32/s
generic 6.67 6.81

==> src/mpg123.nofpu.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder t_s16/s t_f32/s
generic 6.01 6.16

You see, there is some hit from accurate rounding, but it is in a different league compared to the difference between fpu and nofpu on a NEON-less ARM device (and yes, on an x86 CPU, generic FPU code is faster when actually producing float output). Oh, and remember: this is for mpg123 with the handbrakes on; with Taihei's assembly optimizations, the decoding time is about halved on the Core2.
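To make the RMS and maxdiff figures in the compliance output more tangible, here is a minimal sketch of how such a check can be computed. It assumes full scale is [-1, 1) and the ISO/IEC 11172-4 limits (RMS below 2^-15/sqrt(12) for full accuracy, below 2^-11/sqrt(12) for limited accuracy, maximum absolute difference at most 2^-14). The helper names (classify_rms, to_s16, ...) and the 997 Hz test tone are hypothetical; this is not mpg123's actual compliance script. It also shows why rounding, rather than plain truncation, matters when squeezing samples into 16 bit.

```python
import math

# Assumed ISO/IEC 11172-4 limits, with full scale taken as [-1, 1).
FULL_RMS_LIMIT = 2.0**-15 / math.sqrt(12)     # "full accuracy" RMS limit
LIMITED_RMS_LIMIT = 2.0**-11 / math.sqrt(12)  # "limited accuracy" RMS limit
MAXDIFF_LIMIT = 2.0**-14                      # max absolute difference limit


def classify_rms(rms):
    """Label an RMS error like the compliance output does (hypothetical helper)."""
    if rms < FULL_RMS_LIMIT:
        return "PASS"
    if rms < LIMITED_RMS_LIMIT:
        return "LIMITED"
    return "FAIL"


def classify_maxdiff(maxdiff):
    """Label a maximum absolute difference against the assumed limit."""
    return "PASS" if maxdiff <= MAXDIFF_LIMIT else "FAIL"


def error_stats(decoded, reference):
    """Return (RMS, max absolute difference) of decoded vs reference samples."""
    diffs = [d - r for d, r in zip(decoded, reference)]
    rms = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    return rms, max(abs(e) for e in diffs)


def to_s16(x, rounding=True):
    """Quantize a float in [-1, 1) to 16 bit, returned as float again."""
    scaled = x * 32768.0
    q = math.floor(scaled + 0.5) if rounding else math.floor(scaled)
    q = max(-32768, min(32767, q))  # clip to the signed 16 bit range
    return q / 32768.0


# Hypothetical test tone: 997 Hz at 44.1 kHz, -20 dB (amplitude 0.1), one second.
reference = [0.1 * math.sin(2 * math.pi * 997 * n / 44100) for n in range(44100)]

rounded = [to_s16(x, rounding=True) for x in reference]
truncated = [to_s16(x, rounding=False) for x in reference]

rms_round, max_round = error_stats(rounded, reference)
rms_trunc, max_trunc = error_stats(truncated, reference)

print("rounded:   RMS=%e (%s) maxdiff=%e (%s)"
      % (rms_round, classify_rms(rms_round), max_round, classify_maxdiff(max_round)))
print("truncated: RMS=%e (%s) maxdiff=%e (%s)"
      % (rms_trunc, classify_rms(rms_trunc), max_trunc, classify_maxdiff(max_trunc)))
```

Ideal rounding to 16 bit lands right around the full-accuracy RMS limit itself, while truncation roughly doubles the RMS error and drops to LIMITED. Feeding the figures from the compliance runs into classify_rms reproduces their labels under these assumed limits: 8.907757e-06 comes out as LIMITED, 7.927192e-06 as PASS.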
Similarly, I'd like to see numbers for madplay on ARM (ideally on machines with and without FPU, to get a picture of what difference we are talking about):

sh$ time -d -o null convergence_-_points_of_view/*.mp3

I don't know offhand how mpg123 nofpu stacks up against that, but there should be a considerable difference in speed. My guess is that, on limited hardware without NEON, you'd prefer stutter-free playback with the least CPU power draw. When utmost theoretical quality really matters, or you intend extensive post-processing of the data (especially using an audio player that works with floating point math internally, like audacious), I'd expect you to employ a more capable CPU with NEON anyway. The mpg123 nofpu decoder, according to Riku's numbers, is still a good choice for systems with an FPU but no NEON, but the generic floating point decoder is not that far behind in speed (compared to softfloat) and offers proper floating point accuracy as a bonus.

Generally, it is a safe bet that any normal person is quite happy with 16 bit accuracy for decoded MP3s. Depending on the initial quality of the encoding, this might be everything that is sensible anyway (and it's a challenge to hear any difference to 24 bit on an arbitrarily expensive HiFi system). There are people who prefer their 16 bit output rounded using dithering, which mpg123 also offers (--with-cpu=generic_dither), but that excludes the optimizations for ARM.

We are talking about the default setup for the majority of Debian users here. Any quality choice should be fine for that; after all, we're talking compliant MPEG quality in any case (sometimes 'limited precision', but still). Audiophiles wanting the utmost quality from their setup (as funny as that is to many audiophiles when starting from lossy compression ;-) will love to tweak things anyway. They can always do their own build, or use an additional repository (think of Ubuntu PPAs for various such purposes) that provides a different taste.
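On the dithering option mentioned above: the general idea is to add a little random noise before rounding to 16 bit, so the rounding error stops correlating with the signal and details below one LSB survive on average. The sketch below uses generic TPDF (triangular) dither as an assumed, textbook example; it is not taken from mpg123's generic_dither code, and the function names are hypothetical.

```python
import math
import random


def quantize_s16(x):
    """Round a float sample in [-1, 1) to the nearest 16 bit integer."""
    return max(-32768, min(32767, math.floor(x * 32768.0 + 0.5)))


def quantize_s16_tpdf(x, rng):
    """Round to 16 bit after adding triangular (TPDF) dither, +-1 LSB peak."""
    # The difference of two uniform values in [0, 1) is triangularly
    # distributed on (-1, 1), measured in LSB.
    noise = rng.random() - rng.random()
    return max(-32768, min(32767, math.floor(x * 32768.0 + noise + 0.5)))


rng = random.Random(1234)  # fixed seed so the run is reproducible

# A constant signal of 0.3 LSB, too small for plain 16 bit rounding to keep.
tiny = 0.3 / 32768.0
n = 200000

plain = [quantize_s16(tiny) for _ in range(n)]
dithered = [quantize_s16_tpdf(tiny, rng) for _ in range(n)]

print("plain rounding, mean output in LSB:", sum(plain) / n)
print("TPDF dither,    mean output in LSB:", sum(dithered) / n)
```

With plain rounding the 0.3 LSB signal vanishes entirely (every sample becomes 0), while the dithered output averages out to about 0.3 LSB at the cost of a slightly higher noise floor. That trade-off is what people preferring dithered 16 bit output are after.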
The quality difference between 1 h and 10 h of battery time while playing music is very much noticeable to anyone, so the choice on armel should be settled. On armhf, there are cases where arm_nofpu would be the better choice (decoding to 16 bit without NEON), but the roughly 50 % increase in CPU demand is less dramatic, and it evens out when using floating point output.

In any case ... Riku: Care to run timings of MAD on your configurations? I'm interested in how fast it produces that 24 bit output on limited CPUs.

> Lennart Sorensen wrote:
> > I think so. armhf's current debian rules automatically picked arm_fpu
>
> IMO it's often better to be explicit about this sort of thing.

I agree. Never trust upstream's defaults in such sensitive matters ;-)

Alrighty then,

Thomas