The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Also adjust header #include order and some comments.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Prevents using GetBitContexts with data from previous calls.
Fixes access to freed memory.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
With cli usage the decoder might have not set the colorspace during
encoder init, manual colorspace override might be needed in such
cases.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
A dependent slice cannot have address 0.
Prevent an out of array bound load in ff_hevc_cabac_init().
Sample-Id: 00001406-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
According to my understanding of T-REC-H.265-2013044 chapter 8.6.1.
Sample-Id: 00001438-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Makes it easier to recreate an AVCodecContext for ATRAC3+ decoding,
which is needed in multimedia frameworks, as well as in general cases
where demuxing and decoding are separate entities.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
- The memcpy was completely wrong because
s->prob_ctx[s->framectxid].coef is a [4][2][2][6][6][3] array, whereas
s->prob.coef is a [4][2][2][6][6][11] array.
- The additional check was committed to ffmpeg by Ronald S. Bultje.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This handles macroblock edges for the chroma components in the same way
as for the luma compoment for 4:4:4 streams. The Spec explicitly states
that the deblocking filter is not applied to edges at the boundary of
the picture.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The default get_buffer2() implementation (and possibly some
user ones) does not allocate edges when this flag is set, which may
expose bugs in some decoders. Until the 10 release is out, it is safer
to remove this part.
The T-REC-H.265-2013044 page 79 states it has to be in the range
[-s->sps->qp_bd_offset, 51].
Sample-Id: 00001386-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
The src buffer should only contain values in the interval
[0, (1 << BIT_DEPTH) - 1].
Since shift = (BIT_DEPTH - 5), src[x] >> shift must be in
the interval [0, 31], so no clip is needed.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The reconstructed picture should always be clipped (see section 8.6.5),
previously we did not clip coding units where
cu_transquant_bypass_flag == 1.
Sample-Id: 00001325-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
And use unsigned datatypes.
Otherwise it would overflow.
Sample-Id: 00001315-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org