The operands in both cases are 16-bit so cannot overflow a 32-bit
destination. In gain_scale() the inputs are reduced to 14-bit,
so even the shift cannot overflow.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Adding instead of subtracting the products in the loop allows the
compiler to generate more efficient multiply-accumulate instructions
when 16-bit multiply-subtract is not available. ARM has only
multiply-accumulate for 16-bit operands. In general, if only one
variant exists, it is usually accumulate rather than subtract.
In the same spirit, using the dedicated saturation function enables
use of any special optimised versions of this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
C++ does not allow to mix different enums, so e.g. code comparing
ACodecID with CodecID would fail to compile with gcc.
This very evil hack should fix this problem.
In 16-bit arithmetic, x * 0xffffc is simply x * -4 with extra overflows,
(and the constant was probably meant to be 0xfffc). Combined with the
shift, this simplifies to -x >> 1. Finally, clearing the low two bits
with a 32-bit mask and switching to a 32-bit type allows more efficient
code on 32-bit machines.
Signed-off-by: Mans Rullgard <mans@mansr.com>
At both places this function is called, mb_[xy] == s->mb_[xy]
making the call together with following code equivalent to
simply assigning zeros.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The main benefit of inlining this function is from constant
propagation for the 'field_based' argument. Instead of inlining
all calls, create two versions of the function for field_based
values of 0 and 1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This file defines a single, huge function, MPV_motion(), which
although being declared inline is not actually inlined by the
compiler (for good reason). There is thus no sense in defining
this function in a header file, resulting in multiple copies of
it in the final library.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a hidden config variable for the mpegvideo.o dependency
and selects from the codecs which require it.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This macro is only used in two places, both in libavcodec, so this
is a more sensible place for it.
Two small tweaks to the macro are made:
- removing the trailing semicolon
- dropping unnecessary 'volatile' from the x86 asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
It expects maximum value to be 32767 but calculations in scale_vector()
which uses this function can give it ABS(-32768) which leads to wrong
result and thus clipping is needed.
yasm tolerates mismatch between movd/movq and source register size,
adjusting the instruction according to the register. nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The check is bogus since the nuv frameheader is already skipped
and the (decompressed) RTjpeg header is checked.
This reverts commit f6afacdb3b.
CC: libav-stable@libav.org
If there was a failure inflating, or reinitializing
the zstream, the current frame's buffer would be lost.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>