Change the size specifiers to match the actual element sizes
of the data. This makes no practical difference with strict
alignment checking disabled (the default) other than somewhat
documenting the code. With strict alignment checking on, it
avoids trapping the unaligned loads.
Signed-off-by: Mans Rullgard <mans@mansr.com>
It's useful to support the official decoder for comparison and debugging.
This reverts commit f9def9ccc6.
Conflicts:
Changelog
configure
libavcodec/allcodecs.c
libavcodec/libvorbis.c
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
Benchmark (rounded to tens of units):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C        445   358    985    1785    1559     3280
MMX*     219   271    478     714     929     1443
SSE2     131   158    294     425     515      892
SSSE3    120   122    248     387     390      763
End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The vertically interpolating variants of these functions read
ahead one line to optimise the loop. On the last line processed,
this might be outside the buffer. Fix these invalid reads by
processing the last line outside the loop.
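
A minimal C sketch of the idea, not the actual patched routine: the
function name avg_rows_v8, the fixed 8-pixel width and the rounding
average are assumptions made for illustration. The loop reads one row
ahead, so it only runs for h - 1 lines, and the last line is handled
afterwards without a further read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_W 8 /* assumed block width, for illustration only */

/* Hypothetical helper: dst row y = rounded average of src rows y and y + 1.
 * src must provide h + 1 rows, dst must provide h rows, and h must be >= 1. */
static void avg_rows_v8(uint8_t *dst, const uint8_t *src,
                        ptrdiff_t stride, int h)
{
    uint8_t cur[ROW_W], next[ROW_W];
    int x, y;

    memcpy(cur, src, ROW_W);      /* row 0 */
    src += stride;
    memcpy(next, src, ROW_W);     /* row 1 */

    /* Main loop: each iteration reads ahead one row (row y + 2).
     * Stopping at h - 1 keeps the highest row read at h. */
    for (y = 0; y < h - 1; y++) {
        for (x = 0; x < ROW_W; x++)
            dst[x] = (uint8_t)((cur[x] + next[x] + 1) >> 1);
        memcpy(cur, next, ROW_W);
        src += stride;
        memcpy(next, src, ROW_W); /* read-ahead, still inside the buffer */
        dst += stride;
    }

    /* Last line: processed outside the loop, with no read-ahead. */
    for (x = 0; x < ROW_W; x++)
        dst[x] = (uint8_t)((cur[x] + next[x] + 1) >> 1);
}
```

Had the loop run for all h lines, its final iteration would have read
row h + 1, one row past the h + 1 rows the interpolation needs.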
Signed-off-by: Mans Rullgard <mans@mansr.com>
Calling release_buffer followed by get_buffer, as is currently done,
might not preserve the previous frame contents, which the
decoder relies on.
This leads to horrible playback in players using direct
rendering like MPlayer.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Unlike other variants, for YUY2 we need to use different prediction:
* on line 0 for luma we should left predict starting from the second pixel
* on line 1 we should left predict first 4 pixels for luma and 2 for chroma
* median prediction employed here is taken directly from HuffYUV
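
For reference, a minimal C sketch of HuffYUV-style median prediction
for one row, as I understand the scheme: each sample is predicted as
the median of left, top and (left + top - topleft), and the stored
residual is added modulo 256. The names median3 and restore_median_row
are hypothetical, and the special handling of lines 0 and 1 described
above is not covered here.

```c
#include <stdint.h>

/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Hypothetical helper: reconstruct one row of w samples from residuals.
 * top is the already decoded row above; left and topleft are the
 * starting neighbours for the first sample of the row. */
static void restore_median_row(uint8_t *row, const uint8_t *top,
                               const uint8_t *residual, int w,
                               int left, int topleft)
{
    int x;

    for (x = 0; x < w; x++) {
        int grad = (left + top[x] - topleft) & 0xFF;
        int pred = median3(left, top[x], grad);

        row[x]  = (uint8_t)(pred + residual[x]); /* wraps modulo 256 */
        left    = row[x];
        topleft = top[x];
    }
}
```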
Also update libav->ffmpeg as there's pretty much no code left from libav.
The new code is faster, requires fewer mallocs and less memory. It's
also half the number of lines of code.
This code is not 100% identical in behavior to the previous code, but the
differences appear to be limitations of the previous design rather
than intended behavior, though I could be wrong of course.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Prevents subsequent overreads when these numbers are used as indices
in arrays.
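
A minimal sketch of this kind of check, assuming the numbers come from
untrusted input and are later used as table indices. The helper
lookup_checked and its signature are hypothetical and not taken from
the patched code.

```c
#include <stddef.h>

/* Hypothetical helper: store table[idx] in *out and return 0 if idx is
 * a valid index into table, otherwise return -1 and leave *out alone. */
static int lookup_checked(const int *table, size_t table_len,
                          long idx, int *out)
{
    /* Reject negative and too-large values before any array access. */
    if (idx < 0 || (size_t)idx >= table_len)
        return -1;

    *out = table[idx];
    return 0;
}
```

Validating the value once, where it is parsed, means every later use
as an index is safe without further checks.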
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>