Init code in that if statement goes down from 26716 cycles to 26047
cycles, i.e. the removal of the clear_blocks and smaller memcpy()
together save around 670 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The existing checks are insufficient to detect a pixel format
changes in case of some damaged streams.
Fixes inconsistency and later out of array accesses
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Without any correctly decoded slices, there can be no frame.
Fixes out of array reads
Found-by: Rafaël Carré
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
ref_list is constructed from other fields per slice when needed, so do
not copy it for both frame and slice threading.
default_ref_list is constructed per frame and still needs to be copied
to per-slice contexts for slice threading, but a copy is not needed for
frame threading.
Fixes out of array reads
Regression probably since allowing pixel format changes or a related commit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
If the motion vector is at a subpixel position, we need 3 pixels below
the motion vector's wholepel position available, not 2, since the MC
filter is a sixtap filter for the hpel position, and then a bilin filter
for the qpel position.
This patch fixes highly irreproducible (0.1%) fate failures in frame 2
and 4 of h264-conformance-cama2_vtc_b (e.g. first P-frame, first field,
last line of MB x=40,y=2 and second field and last lines of MBs x=39-40,
y=3). These used pre-loopfilter instead of post-loopfilter data because
the await_progress() waited for one line too little in that field, and
the motion vector of these particular MBs happened to align exactly to a
position where that demonstrates the bug.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
If the motion vector is at a subpixel position, we need 3 pixels below
the motion vector's wholepel position available, not 2, since the MC
filter is a sixtap filter for the hpel position, and then a bilin filter
for the qpel position.
This patch fixes highly irreproducible (0.1%) fate failures in frame 2
and 4 of h264-conformance-cama2_vtc_b (e.g. first P-frame, first field,
last line of MB x=40,y=2 and second field and last lines of MBs x=39-40,
y=3). These used pre-loopfilter instead of post-loopfilter data because
the await_progress() waited for one line too little in that field, and
the motion vector of these particular MBs happened to align exactly to a
position where that demonstrates the bug.
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Clobbering these tables will temporarily clobber the template used
as a basis for other threads to start decoding from. If the other
decoding thread updates from the template right at that moment,
subsequent threads will get invalid (or, usually, none at all) mmco
tables. This leads to invalid reference lists and subsequent decode
failures.
Therefore, instead, decode the mmco tables only for the first slice in
a field or frame. For other slices, decode the bits and ensure they
are identical to the mmco tables in the first slice, but don't ever
clobber the context state. This prevents other threads from using a
clobbered/invalid template as starting point for decoding, and thus
fixes decoding in these cases.
This fixes occasional (~1%) failures of h264-conformance-mr1_bt_a with
frame-multithreading enabled.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Clobbering these tables will temporarily clobber the template used
as a basis for other threads to start decoding from. If the other
decoding thread updates from the template right at that moment,
subsequent threads will get invalid (or, usually, none at all) mmco
tables. This leads to invalid reference lists and subsequent decode
failures.
Therefore, instead, decode the mmco tables only for the first slice in
a field or frame. For other slices, decode the bits and ensure they
are identical to the mmco tables in the first slice, but don't ever
clobber the context state. This prevents other threads from using a
clobbered/invalid template as starting point for decoding, and thus
fixes decoding in these cases.
This fixes occasional (~1%) failures of h264-conformance-mr1_bt_a with
frame-multithreading enabled.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Fixes null pointer dereference later, since if this function failed,
a positive return value was returned to the caller.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Martin Storsjö <martin@martin.st>
Comparing AVCodecContext.pix_fmt against the get_pixel_format() return
value has the side effect of calling the get_format() callback on each
slice. Users of the callback will probably handle hardware accelerator
initialization in the callback.
Comparing AVCodecContext.pix_fmt against the get_pixel_format() return
value has the side effect of calling the get_format() callback on each
slice. Users of the callback will probably handle hardware accelerator
initialization in the callback.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>