AVCodecContext.bits_per_raw_sample is updated from the previous thread
in the generic update function before the codec specific update_thread
function is called. The check for reinitialization of dsp functions uses
bits_per_raw_sample. When called from update_thread_context it will be
already at the current value and the dsp functions aren't updated if
only the bit depth changes.
This ensures the hwaccel privdata does not leak when a frame buffer could
not be allocated (and toggle the assert when the frame is re-used).
Having no frame buffer available is quite common when using the DXVA2
hwaccel in situations where the DXVA2 renderer is being re-allocated, for
example when moving between displays.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
dct_bits is never set except in h264, where it is never used, thus
remove it. Then, remove all functions that were set based on non-zero
(32) values for dct_bits. Lastly, merge 9-14 bpp functions for get_pixels
and draw_edge, which only care about pixel storage unit size, not actual
bits used (i.e. they don't clip).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This makes the decoder independent of mpegvideo.
This copy of the draw_horiz_band code is simplified compared to
the "generic" mpegvideo one which still has a number of special
cases for different codecs.
Signed-off-by: Martin Storsjö <martin@martin.st>
Not all hwaccels implement all codecs, so using one single list for
multiple such codecs means some codecs will be represented in the list,
even though they don't actually handle that codec. Copying specific
lists in each codec fixes that.
Signed-off-by: Martin Storsjö <martin@martin.st>
The decoder assumes a single bit depth for all the planes
while the specification allows different bit depths for luma
and chroma.
Avoid the possible problems described in CVE-2013-2277
CC: libav-stable@libav.org
The code is located in mpegvideo, and it's likely that in a minimal
config, we don't want to include debug info anyway.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is a regression introduced from the h264/mpegvideo split
Fixes out of array reads
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Not all hwaccels implement all codecs, so using one single list for
multiple such codecs means some codecs will be represented in the list,
even though they don't actually handle that codec. Copying specific
lists in each codec fixes that.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, keep them in the bitstream buffer until we read them verbatim,
this saves a memcpy() and a subsequent clearing of the target buffer.
decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from
6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>