Replacing memcpy with an unrolled loop that exploits the known data
alignment yields some speedup.
Before (gcc 4.6.1): ~400 cycles
After: ~370 cycles
Overall, this gives around a 2% speed increase when decoding a 2400s mp3 to f32le.
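As a rough illustration of the idea (not the code from this commit), a copy
routine that is allowed to assume alignment and a fixed block size can drop
the run-time checks a general-purpose memcpy has to make. The helper name
copy_aligned_floats, the 16-byte alignment and the multiple-of-8 length are
assumptions made for this sketch only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: copy n floats, assuming the caller
 * guarantees that both pointers are 16-byte aligned and that n is a
 * multiple of 8. These preconditions are assumptions of this sketch. */
static void copy_aligned_floats(float *dst, const float *src, size_t n)
{
    size_t i;

    /* Make the assumed preconditions explicit in debug builds. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((n & 7) == 0);

    /* Unrolled by 8: no alignment probing and no tail handling is needed,
     * which is the work a generic memcpy cannot skip. */
    for (i = 0; i < n; i += 8) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
    }
}
```

Whether such a loop actually beats memcpy depends on the compiler and the
target, so it should be benchmarked as done above rather than assumed.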
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
At the very least this should fix warnings about unused static
functions if one or more of these is not defined.
However, compilation itself might break if the compiler does
not optimize the function away completely.
This actually happens with the AVX function: its function pointer
is used in an assignment that is not under an #if, so it is
probably only optimized away after the function has already been
marked as used.
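A minimal sketch of the pattern, with made-up names: HAVE_AVX_SKETCH stands
in for a build-time feature macro, and scale_c, scale_avx and select_scale
are hypothetical functions, not the ones touched by this commit. Both the
optional function and the assignment that takes its address sit under the
same #if, so nothing references it when the feature is disabled:

```c
/* Hypothetical feature macro for this sketch; set to 1 to enable the
 * optional path. */
#ifndef HAVE_AVX_SKETCH
#define HAVE_AVX_SKETCH 0
#endif

typedef void (*scale_fn)(float *buf, int n);

/* Portable fallback, always compiled and always used as the default. */
static void scale_c(float *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] *= 2.0f;
}

#if HAVE_AVX_SKETCH
/* Stand-in for an optimized version (plain C here). It is only compiled
 * when the feature is enabled, so no unused static function is left over. */
static void scale_avx(float *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] *= 2.0f;
}
#endif

static scale_fn select_scale(int cpu_has_avx)
{
    scale_fn fn = scale_c;

#if HAVE_AVX_SKETCH
    /* The assignment is under the same #if as the definition, so the
     * pointer is never referenced when the function does not exist. */
    if (cpu_has_avx)
        fn = scale_avx;
#else
    (void)cpu_has_avx; /* avoid an unused-parameter warning */
#endif

    return fn;
}
```

With the macro set to 0 the optimized function is neither defined nor
referenced, so the build does not depend on the compiler eliminating dead
code.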
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.
Signed-off-by: Mans Rullgard <mans@mansr.com>