future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...
Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.
Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
Reduce scalefactors in non-zero bands that underflow by twice as much as those
in bands that just fail to hit psy targets.
Originally committed as revision 24482 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is a lot more reliable to get cmov rather than trying to trick gcc into
generating it, useful since it's 2% faster overall.
Patch by Eli Friedman <eli.friedman at gmail>
Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
on the huffman tree, instead of traversing the tree in a while loop.
Based on the similar optimization in libvpx's detokenize.c
10% faster at normal bitrates, and 30% faster for high-bitrate intra-only
Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
No difference at the moment, but allows a future branchy variant
of vp56_rac_get_prob to be significantly faster
Originally committed as revision 24467 to svn://svn.ffmpeg.org/ffmpeg/trunk
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.
Originally committed as revision 24456 to svn://svn.ffmpeg.org/ffmpeg/trunk
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
Don't prefetch reference frames that were used less than 1/32th of the time so
far in the frame.
This helps speed up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-refs.
This is a very common property of videos encoded by libvpx.
Originally committed as revision 24451 to svn://svn.ffmpeg.org/ffmpeg/trunk
Prefetch all refs (including altref), but only if they've been used so far this
frame.
~2.5% faster overall.
TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple blocks of a rarely-used ref don't force us to prefetch
it.
Originally committed as revision 24444 to svn://svn.ffmpeg.org/ffmpeg/trunk
Uses a slightly nonintuitive ring buffer size of (width+height*2) to simplify
addressing logic.
Also split out the segmentation map to a separate structure, necessary to
implement the ring buffer.
Originally committed as revision 24426 to svn://svn.ffmpeg.org/ffmpeg/trunk