Avoid searching for the lowest bulk cost for each pixel that isn't a repeat/skip. Instead store the lowest cost as we go along each pixel, and use it as needed.
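As a rough illustration of the idea (not the encoder's actual code), a running minimum can be updated once per pixel instead of rescanning earlier candidates. The names MinCostTracker, min_cost_init and min_cost_update are hypothetical, and the sketch assumes costs are plain ints:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical tracker: remembers the cheapest cost seen so far and
 * the index at which it was seen. */
typedef struct MinCostTracker {
    int    best_cost;
    size_t best_index;
} MinCostTracker;

static void min_cost_init(MinCostTracker *t)
{
    t->best_cost  = INT_MAX;
    t->best_index = 0;
}

/* Called once per pixel: O(1) work instead of a search over all
 * previously computed costs. Ties keep the earliest index. */
static void min_cost_update(MinCostTracker *t, int cost, size_t index)
{
    if (cost < t->best_cost) {
        t->best_cost  = cost;
        t->best_index = index;
    }
}
```

The stored minimum is then read directly wherever the lowest cost is needed.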
Signed-off-by: Malcolm Bechard <malcolm.bechard@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also read the data size for raw compressions and
make sure its value is sane.
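A minimal sketch of such a sanity check, assuming (purely for illustration) a 32-bit little-endian size field followed by the payload; read_data_size is a hypothetical helper, not the decoder's real function:

```c
#include <stddef.h>
#include <stdint.h>

/* Reads an assumed 32-bit little-endian size field at buf[pos] and
 * accepts it only if that many payload bytes remain after the field.
 * Returns 0 on success, -1 if the field is truncated or too large. */
static int read_data_size(const uint8_t *buf, size_t buf_len,
                          size_t pos, uint32_t *out_size)
{
    uint32_t size;

    if (pos > buf_len || buf_len - pos < 4)
        return -1;

    size = (uint32_t)buf[pos]            |
           (uint32_t)buf[pos + 1] << 8   |
           (uint32_t)buf[pos + 2] << 16  |
           (uint32_t)buf[pos + 3] << 24;

    if (size > buf_len - pos - 4)
        return -1;

    *out_size = size;
    return 0;
}
```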
Remove code that fills missing blocks with zeroes.
It is only marginally useful and makes the implementation
of actually useful features harder.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
This is a regression introduced by the h264/mpegvideo split.
Fixes out-of-array reads.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes crashes in chromium on win64 on machines with AVX
(crashes that apparently aren't triggered by fate).
Signed-off-by: Martin Storsjö <martin@martin.st>
Change the treatment of the strip y coordinates, which previously did
not follow the description (nor did it behave like the binary decoder
on files with absolute strip offsets).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The new code is also faster and more robust.
As for the performance:
old decoder + conversion to rgb: fps = 2618
old decoder, without converting to rgb: fps = 4012
new decoder, producing rgb: fps = 4502
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This gets rid of a number of warnings about casts discarding
qualifiers from the pointer target, present since 7ebfb466a.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722 sec to 8.706 sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
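The on-demand decision can be sketched as a simple bounds test. block_needs_edge_emulation is a hypothetical helper; a real decoder would also have to widen the block by the extra pixels its subpel interpolation filter reads, which this sketch ignores:

```c
#include <stdbool.h>

/* Returns true if the w x h reference block whose top-left corner is
 * at (x, y) reaches outside the decoded pic_w x pic_h area, i.e. the
 * motion vector crosses the visible picture and an emulated edge is
 * needed. Blocks fully inside the picture can be read directly. */
static bool block_needs_edge_emulation(int x, int y, int w, int h,
                                       int pic_w, int pic_h)
{
    return x < 0 || y < 0 || x + w > pic_w || y + h > pic_h;
}
```

Only when this test is true would the slower edge emulation path be taken, which is consistent with low-motion content benefiting the most.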
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, keep them in the bitstream buffer until we read them verbatim;
this saves a memcpy() and the subsequent clearing of the target buffer.
decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from
6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows more transparent mixing of get_bits and whole-byte access
without having to touch get_bits internals.
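To illustrate the kind of mixing meant here, the following is a self-contained toy bit reader, not the get_bits API itself. BitReader, br_get_bits, br_align_bytes and br_skip_bytes are hypothetical names, and MSB-first bit order is an assumption of the sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader (hypothetical, for illustration only). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits; /* total number of readable bits */
    size_t pos_bits;  /* current read position in bits  */
} BitReader;

/* Reads up to n bits (n <= 24); stops early at the end of the buffer,
 * in which case fewer bits are returned. */
static unsigned br_get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0 && br->pos_bits < br->size_bits) {
        unsigned byte = br->buf[br->pos_bits >> 3];
        unsigned bit  = (byte >> (7 - (br->pos_bits & 7))) & 1;
        v = (v << 1) | bit;
        br->pos_bits++;
    }
    return v;
}

/* Rounds the position up to the next byte boundary and returns a
 * pointer to that byte, so the caller can do whole-byte access. */
static const uint8_t *br_align_bytes(BitReader *br)
{
    br->pos_bits = (br->pos_bits + 7) & ~(size_t)7;
    return br->buf + (br->pos_bits >> 3);
}

/* Advances past n whole bytes that the caller consumed directly,
 * clamping at the end of the buffer. */
static void br_skip_bytes(BitReader *br, size_t n)
{
    br->pos_bits += n * 8;
    if (br->pos_bits > br->size_bits)
        br->pos_bits = br->size_bits;
}
```

With helpers of this shape, a caller can read a few bits, align, consume bytes through the returned pointer, skip them, and resume bit reads without ever looking at the reader's fields.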
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-alpha and alpha-Y planes are cleared in the idct_put/add()
calls. For the alpha U/V planes, we only care about the DC for entropy
context prediction purposes; the rest of the data is unused.
Signed-off-by: Martin Storsjö <martin@martin.st>