The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
When downmixing 2.1 to 2-channel, if the 2.0 portion is Lt/Rt, sum-difference or dual mono, the actual output will be the same (with the LFE either mixed-in or discarded).
Also, when downmixing an arbitrary layout to 2-channel, if the bitstream contains custom downmix coefficients targeting Lt/Rt, then the output will be Lt/Rt rather than regular Stereo.
The check for (prim_channels > 2) before calling dca_downmix made these
cases unreachable, but now 2.1 layouts will go through the downmix code.
Having dual mono, Lt/Rt and sum-difference layouts print errors when
regular Stereo doesn't seems pointless.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It was based on an old, seemingly incorrect specification, so default
coefficients were always used anyway.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
When the extra rear channel is present but unused, the
s->channel_order_tab[] value for that channel is -1. The QMF can be
skipped for the extra channel, and doing so avoids an out-of-array read
on s->samples_chanptr[].
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
It makes more sense for a bit mask to use an unsigned type.
The change should be source and binary compatible on all
supported systems, hence micro version bump.
Fixes a few invalid shifts.
Signed-off-by: Mans Rullgard <mans@mansr.com>