Avoid searching for the lowest bulk cost for each pixel that isn't a repeat/skip. Instead store the lowest cost as we go along each pixel, and use it as needed.
Signed-off-by: Malcolm Bechard <malcolm.bechard@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also read data size for raw compressions too and
make sure its value is sane.
Remove code that fills missing blocks with zeroes.
It is marginally useful and make implementation
of actually useful features harder.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
This is a regression introduced from the h264/mpegvideo split
Fixes out of array reads
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes crashes in chromium on win64 on machines with AVX
(crashes that apparently aren't triggered by fate).
Signed-off-by: Martin Storsjö <martin@martin.st>
change the treatment of the strip y coordinates which previously did
not follow the description (nor did it behave like the binary decoder
on files with absolute strip offsets).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The new code is also faster and more robust.
As for the performance:
old decoder + conversion to rgb: fps = 2618
old decoder, without converting to rgb: fps = 4012
new decoder, producing rgb: fps = 4502
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This gets rid of a number of warnings about casts discarding
qualifiers from the pointer target, present since 7ebfb466a.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, keep them in the bitstream buffer until we read them verbatim,
this saves a memcpy() and a subsequent clearing of the target buffer.
decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from
6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows more transparent mixing of get_bits and whole-byte access
without having to touch get_bits internals.
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-alpha and alpha-Y planes are cleared in the idct_put/add()
calls. For the alpha U/V planes, we only care about the DC for entropy
context prediction purposes, the rest of the data is unused.
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The residual block data of 16x16 blocks was ignored for b-frames, which
leads to easy-to-identify artifacts. After this patch, the artifacts are
gone. Sample video: svq3_watermark.mov. (Fate results unaffected.)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Reference:
commit 3615e2be84
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 22:02:57 2003 +0000
h263_h_loop_filter_mmx
Originally committed as revision 2553 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 359f98ded9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 20:28:10 2003 +0000
h263_v_loop_filter_mmx
Originally committed as revision 2552 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>