This function is always called with a non-negative argument, so
those special cases are not needed. In the places the argument
might be zero, the return value for a zero argument does not matter
since it would then be used to scale an array full of zeros.
Signed-off-by: Mans Rullgard <mans@mansr.com>
many branches and cases of scale_vector are irrelevant for the case here
and by inlining they can be reliably removed.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It appears someone thinks this special case can be reached
Well, it cannot, thus not only do we not need to optimize it
we dont need it at all
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Although a reasonable compiler will probably optimise out the
actual store and load, this operation still implies a truncation
to 16 bits which the compiler will probably not realise is not
necessary here.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Writing the scaled excitation to a scratch buffer (borrowing the
'audio' array) instead of modifying it in place avoids the need
to save and restore the unscaled values.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Use saturating addition functions instead of 64-bit intermediates
and separate clipping. This is much faster when dedicated
instructions are available.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Firstly, nothing in this function can overflow 32 bits so the use
of a 64-bit type is completely unnecessary. Secondly, the scale
is either a power of two or 0x7fff. Doing separate loops for these
cases avoids using multiplications. Finally, since only the number
of bits, not the actual value, of the maximum value is needed, the
bitwise or of all the values serves the purpose while being faster.
It is worth noting that even if overflow could happen, it was not
handled correctly anyway.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The operands in both cases are 16-bit so cannot overflow a 32-bit
destination. In gain_scale() the inputs are reduced to 14-bit,
so even the shift cannot overflow.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In 16-bit arithmetic, x * 0xffffc is simply x * -4 with extra overflows,
(and the constant was probably meant to be 0xfffc). Combined with the
shift, this simplifies to -x >> 1. Finally, clearing the low two bits
with a 32-bit mask and switching to a 32-bit type allows more efficient
code on 32-bit machines.
Signed-off-by: Mans Rullgard <mans@mansr.com>
It expects maximum value to be 32767 but calculations in scale_vector()
which uses this function can give it ABS(-32768) which leads to wrong
result and thus clipping is needed.
Fixed codebook mode in 5300 rate may write up to SUBFRAME_LEN + 4 and
that is considered normal by the reference decoder. Without that additional
padding it might overwrite first elements of LPC history.