Jason Garrett-Glaser
1e73967950
VP8: partially inline decode_block_coeffs
Avoids a function call in the case of empty DCT blocks (most of the time).
Originally committed as revision 24691 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
ffbf0794f9
Fix 100L in r24689
Accidentally committed some timing code.
Originally committed as revision 24690 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
afb54a85c3
VP8: simplify decode_block_coeffs to avoid having to track nonzero coeffs
Slightly faster.
Originally committed as revision 24689 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b0d5879513
VP8: slightly faster DCT coefficient probability update
Originally committed as revision 24687 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
476be414a4
VP8: make another RAC call branchy
1-2 clocks faster.
Originally committed as revision 24683 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
0908f1b945
VP8: unroll partition type decoding tree
~34% faster partition type decoding.
Originally committed as revision 24681 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c5dec7f137
VP8: unroll splitmv decoding tree
Much faster splitmv mode decoding.
Originally committed as revision 24680 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
23117d69c1
VP8: unroll MB mode decoding tree
~50% faster MB mode decoding, plus eliminate a costly switch.
Originally committed as revision 24679 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
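The three "unroll ... decoding tree" commits above replace a generic table-driven tree walk with hard-coded branches. Below is a minimal C sketch of the idea, not FFmpeg's code. ScriptedBits and read_bool are a hypothetical stand-in for the boolean decoder that simply replays bits. The tree is the split-MV partitioning tree as laid out in RFC 6386, as we recall it, so treat it as illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the boolean decoder: it ignores the
 * probability and replays scripted bits, which is enough to compare
 * two tree readers bit for bit. */
typedef struct {
    const int *bits;
    size_t pos;
} ScriptedBits;

int read_bool(ScriptedBits *s, uint8_t prob)
{
    (void)prob;
    return s->bits[s->pos++];
}

/* Tree layout in the RFC 6386 style: positive entries index the next node
 * pair, entries <= 0 are negated leaf values. */
enum { PART_TOP_BOTTOM, PART_LEFT_RIGHT, PART_QUARTERS, PART_16 };
static const int8_t part_tree[6] = {
    -PART_16, 2, -PART_QUARTERS, 4, -PART_TOP_BOTTOM, -PART_LEFT_RIGHT
};

/* Generic walk: a table load and a data-dependent loop per bit. */
int read_tree(ScriptedBits *s, const int8_t *tree, const uint8_t *probs)
{
    int i = 0;
    while ((i = tree[i + read_bool(s, probs[i >> 1])]) > 0)
        ;
    return -i;
}

/* The same tree unrolled: no table, no loop, constant probability indices. */
int read_partition_unrolled(ScriptedBits *s, const uint8_t *probs)
{
    if (!read_bool(s, probs[0]))
        return PART_16;
    if (!read_bool(s, probs[1]))
        return PART_QUARTERS;
    return read_bool(s, probs[2]) ? PART_LEFT_RIGHT : PART_TOP_BOTTOM;
}
```

Both readers return the same leaf after consuming the same bits. The unrolled one saves the table lookup and the loop-carried dependency, and the compiler sees constant probability indices.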
Jason Garrett-Glaser
370b622a45
VP8: eliminate a dereference in coefficient decoding
Originally committed as revision 24671 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
f311208cf1
VP8: much faster DC transform handling
A lot of the time the DC block is empty: don't do the WHT in this case.
A lot of the rest of the time, there's only one coefficient: make a special
DC-only transform for that case.
When the block is empty, don't incorrectly mark luma DCT blocks as having DC
coefficients.
Originally committed as revision 24670 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
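Below is a sketch of the three-way dispatch this commit describes: skip an empty DC block, use a DC-only path for a lone coefficient, otherwise run the full WHT. It assumes the inverse WHT from RFC 6386 / libvpx as we recall it. The function names and the 0/1/2 return convention are hypothetical. With only in[0] nonzero, both butterfly passes just copy it to every position, so each output reduces to (in[0] + 3) >> 3.

```c
#include <assert.h>
#include <stdint.h>

/* Full 4x4 inverse Walsh-Hadamard transform of the luma DC block.
 * Output i becomes the DC coefficient of luma sub-block i. */
void inv_wht4x4(const int16_t in[16], int16_t out[16])
{
    int tmp[16];
    for (int i = 0; i < 4; i++) {            /* vertical pass */
        int a1 = in[i] + in[12 + i];
        int b1 = in[4 + i] + in[8 + i];
        int c1 = in[4 + i] - in[8 + i];
        int d1 = in[i] - in[12 + i];
        tmp[i]      = a1 + b1;
        tmp[4 + i]  = c1 + d1;
        tmp[8 + i]  = a1 - b1;
        tmp[12 + i] = d1 - c1;
    }
    for (int i = 0; i < 4; i++) {            /* horizontal pass + rounding */
        const int *r = tmp + 4 * i;
        int a1 = r[0] + r[3];
        int b1 = r[1] + r[2];
        int c1 = r[1] - r[2];
        int d1 = r[0] - r[3];
        out[4 * i + 0] = (int16_t)((a1 + b1 + 3) >> 3);
        out[4 * i + 1] = (int16_t)((c1 + d1 + 3) >> 3);
        out[4 * i + 2] = (int16_t)((a1 - b1 + 3) >> 3);
        out[4 * i + 3] = (int16_t)((d1 - c1 + 3) >> 3);
    }
}

/* DC-only shortcut: no butterflies, one value broadcast to all 16 DCs. */
void inv_wht4x4_dc(const int16_t in[16], int16_t out[16])
{
    int16_t v = (int16_t)((in[0] + 3) >> 3);
    for (int i = 0; i < 16; i++)
        out[i] = v;
}

/* Returns 0 (empty: out left untouched, so no luma block gets a DC),
 * 1 (DC-only path) or 2 (full transform). */
int luma_dc_transform(const int16_t in[16], int16_t out[16])
{
    int ac = 0;
    for (int i = 1; i < 16; i++)
        ac |= in[i];
    if (!ac && !in[0])
        return 0;
    if (!ac) {
        inv_wht4x4_dc(in, out);
        return 1;
    }
    inv_wht4x4(in, out);
    return 2;
}
```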
Jason Garrett-Glaser
827d43bb9d
VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.
Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Pascal Massimino
d2840fa49c
only store intra prediction modes on the boundary for keyframes, not as a plane.
inter-frame behaviour unchanged.
Originally committed as revision 24664 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
10bf2eebbe
VP8: simplify token_prob handling
~1.5% faster decode_block_coeffs
Originally committed as revision 24659 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Pascal Massimino
c22b4468a6
prevent access to vp8_coeff_band[16]
Originally committed as revision 24656 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Pascal Massimino
a8ab0cccf7
b0rk3d FATE + black helicopters hissing -> rolling back to r24556 and sleeping
Originally committed as revision 24559 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Pascal Massimino
62d1f7864e
perform the clipping on luma_dc_qmul[1] and chroma_qmul[0] earlier
Originally committed as revision 24558 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Pascal Massimino
e7e81959d6
save some copies by moving some fields out of proba[2]
Originally committed as revision 24557 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
fca05ea8a0
VP8: add missing free
Fixes a tiny memory leak.
Originally committed as revision 24504 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Carl Eugen Hoyos
28e241de5d
Fix r24445: Instead of needlessly initialising a variable, silence the warning.
Originally committed as revision 24498 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
ca18a478e3
VP8: Inline traversing vp8_small_mvtree
Much faster read_mv_component, slightly faster overall
Originally committed as revision 24470 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
7697cdcf95
VP8: Use vp56_rac_get_prob_branchy when the bit is only used by an if()
Originally committed as revision 24469 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
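This commit, and "make another RAC call branchy", choose between two ways of pulling a bit out of VP8's boolean arithmetic decoder. Below is a sketch of both forms, modeled on the bool_decoder in RFC 6386 as we recall it. The struct and function names are hypothetical, and this is not FFmpeg's vp56 range coder. Both forms produce identical bits and decoder state; the choice only changes the code the compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit value window; range is renormalized back into [128, 255]. */
typedef struct {
    const uint8_t *buf;
    size_t len, pos;
    uint32_t value, range;
    int bit_count;
} BoolDecoder;

static uint8_t next_byte(BoolDecoder *d)
{
    return d->pos < d->len ? d->buf[d->pos++] : 0;  /* zero-pad past the end */
}

void bool_init(BoolDecoder *d, const uint8_t *buf, size_t len)
{
    d->buf = buf;
    d->len = len;
    d->pos = 0;
    d->value = (uint32_t)next_byte(d) << 8;
    d->value |= next_byte(d);
    d->range = 255;
    d->bit_count = 0;
}

static void renormalize(BoolDecoder *d)
{
    while (d->range < 128) {
        d->value <<= 1;
        d->range <<= 1;
        if (++d->bit_count == 8) {
            d->bit_count = 0;
            d->value |= next_byte(d);
        }
    }
}

/* Branchy form: the comparison drives an if(). */
int bool_get_branchy(BoolDecoder *d, uint8_t prob)
{
    uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
    uint32_t bigsplit = split << 8;
    int bit;
    if (d->value >= bigsplit) {
        bit = 1;
        d->range -= split;
        d->value -= bigsplit;
    } else {
        bit = 0;
        d->range = split;
    }
    renormalize(d);
    return bit;
}

/* Branch-free form: the bit becomes an all-ones/all-zeros mask. */
int bool_get_branchless(BoolDecoder *d, uint8_t prob)
{
    uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
    uint32_t bigsplit = split << 8;
    uint32_t bit = d->value >= bigsplit;
    uint32_t mask = 0u - bit;
    d->range = split + ((d->range - 2 * split) & mask);  /* range-split or split */
    d->value -= bigsplit & mask;
    renormalize(d);
    return (int)bit;
}
```

Presumably the gain comes from the decoder's comparison and the caller's if() collapsing into one branch. The mask form pays off when the bit is consumed as data, for example as a table index.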
David Conrad
fe1b5d974a
Decode DCT tokens by branching to a different code path for each branch on the token tree, instead of traversing the tree in a while loop.
Based on the similar optimization in libvpx's detokenize.c.
10% faster at normal bitrates, and 30% faster for high-bitrate intra-only streams.
Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
13a1304bb3
Add myself to VP8 copyright and maintainers.
Also add Ronald to maintainers.
Originally committed as revision 24464 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
414ac27d8f
VP8: always_inline some things to force gcc to do the right thing
Mostly seems to help in the MC code, which gets a hundred cycles faster.
Originally committed as revision 24463 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
06d50ca804
VP8: use AV_RL24 instead of defining a new RL24.
Originally committed as revision 24462 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
9fddd14a8e
VP8: Slightly faster MV selection
Don't clamp best mv unless it's actually used.
Originally committed as revision 24461 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
14767f35ed
VP8: use AV_ZERO32 instead of AV_WN32A where relevant
Originally committed as revision 24460 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
09959ec46e
VP8: eliminate redundant code in r24458
Originally committed as revision 24459 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
a71abb714e
VP8: shave a few clocks off check_intra_pred_mode
Originally committed as revision 24458 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
0087aa47d0
VP8: fix broken sign bias code in MV pred
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.
Originally committed as revision 24456 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
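In the VP8 near-MV search, a neighbouring macroblock's motion vector is negated when the sign bias of the frame it references differs from that of the reference being predicted for. The golden and altref sign-bias flags come from the frame header; the last frame's bias is 0. Below is a minimal C sketch of that rule. The type, the enum ordering and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t x, y; } MV;

/* Hypothetical reference-frame ids used to index sign_bias[]. */
enum { REF_INTRA, REF_LAST, REF_GOLDEN, REF_ALTREF };

/* Flip a neighbour's MV if its reference has a different sign bias
 * than the reference the current macroblock is predicting from. */
MV neighbor_mv_with_bias(MV mv, int neighbor_ref, int cur_ref,
                         const uint8_t sign_bias[4])
{
    if (sign_bias[neighbor_ref] != sign_bias[cur_ref]) {
        mv.x = (int16_t)-mv.x;
        mv.y = (int16_t)-mv.y;
    }
    return mv;
}
```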
Jason Garrett-Glaser
3ae079a3c8
VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.
Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
3df56f4118
VP8: Clean up some variable shadowing.
Originally committed as revision 24454 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
8a467b2d44
VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (MMX and SSE2), since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
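These DC-only optimizations rely on one property of the VP8 IDCT. When a 4x4 block's only nonzero coefficient is the DC, the transform adds the same value, (dc + 4) >> 3, to all 16 pixels. Below is a scalar C sketch, assuming the reference IDCT from RFC 6386 / libvpx as we recall it. The function names are hypothetical; FFmpeg's DC helpers are written in MMX/SSE2 assembly. Each function also clears the coefficients it consumed, in the spirit of "clear DCT blocks in iDCT".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Full 4x4 inverse DCT added to dst; Q16 constants are
 * sqrt(2)*cos(pi/8) - 1 and sqrt(2)*sin(pi/8). */
void idct_add(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    enum { COS_M1 = 20091, SIN_V = 35468 };
    int16_t tmp[16];
    for (int i = 0; i < 4; i++) {            /* vertical pass */
        int a = block[i] + block[8 + i];
        int b = block[i] - block[8 + i];
        int c = ((block[4 + i] * SIN_V) >> 16)
              - (block[12 + i] + ((block[12 + i] * COS_M1) >> 16));
        int d = (block[4 + i] + ((block[4 + i] * COS_M1) >> 16))
              + ((block[12 + i] * SIN_V) >> 16);
        tmp[i] = (int16_t)(a + d);
        tmp[12 + i] = (int16_t)(a - d);
        tmp[4 + i] = (int16_t)(b + c);
        tmp[8 + i] = (int16_t)(b - c);
    }
    for (int i = 0; i < 4; i++) {            /* horizontal pass, round, add */
        const int16_t *r = tmp + 4 * i;
        int a = r[0] + r[2];
        int b = r[0] - r[2];
        int c = ((r[1] * SIN_V) >> 16) - (r[3] + ((r[3] * COS_M1) >> 16));
        int d = (r[1] + ((r[1] * COS_M1) >> 16)) + ((r[3] * SIN_V) >> 16);
        uint8_t *p = dst + i * stride;
        p[0] = clip_u8(p[0] + ((a + d + 4) >> 3));
        p[3] = clip_u8(p[3] + ((a - d + 4) >> 3));
        p[1] = clip_u8(p[1] + ((b + c + 4) >> 3));
        p[2] = clip_u8(p[2] + ((b - c + 4) >> 3));
    }
    memset(block, 0, 16 * sizeof(*block));
}

/* DC-only shortcut: one rounded value added to all 16 pixels. */
void idct_dc_add(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;
    block[0] = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + dc);
}

/* Four DC-only blocks side by side (a 16-pixel-wide row of luma
 * sub-blocks); SIMD versions handle all four in one pass. */
void idct_dc_add4_row(uint8_t *dst, int16_t blocks[4][16], ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add(dst + 4 * i, blocks[i], stride);
}
```

idct_dc_add4_row covers the luma case. The chroma idct_dc_add4uv variant presumably applies the same per-block add to the 2x2 arrangement of 4x4 blocks in each 8x8 chroma plane.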
Jason Garrett-Glaser
ef38842f0b
VP8: smarter prefetching
Don't prefetch reference frames that were used less than 1/32 of the time so
far in the frame.
This gives a speedup of up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-refs.
This is a very common property of videos encoded by libvpx.
Originally committed as revision 24451 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
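Below is a sketch of the threshold this commit describes. The function name is hypothetical, and whether FFmpeg compares with > or >= is an assumption. A shift keeps the check cheap enough to run per macroblock. Because mbs_decoded >> 5 is 0 for the first 32 macroblocks, any use at all triggers a prefetch early in the frame, much like the older "only if they've been used so far" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Prefetch a reference only if it has served more than roughly 1/32 of
 * the macroblocks decoded so far in this frame. */
bool should_prefetch_ref(unsigned ref_uses, unsigned mbs_decoded)
{
    return ref_uses > (mbs_decoded >> 5);
}
```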
Jason Garrett-Glaser
c25c776708
VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.
Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b74f70d646
VP8: avoid a memset for non-i4x4 blocks with no coefficients
Originally committed as revision 24447 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
145d31865d
Get rid of more unnecessary dereferences in VP8 deblocking
Originally committed as revision 24446 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
867215336d
Shut up an uninitialized variable GCC warning in VP8.
Originally committed as revision 24445 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c4211046d2
Smarter VP8 prefetching
Prefetch all refs (including altref), but only if they've been used so far this
frame.
~2.5% faster overall.
TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple of blocks of a rarely used ref don't force us to prefetch
it.
Originally committed as revision 24444 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
8cfae560ad
Fix stupid bug in VP8 prefetching code
Originally committed as revision 24443 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
2a38c2e99a
Eliminate a LUT in escape decoding in VP8 decode_block_coeffs
Originally committed as revision 24441 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
d292c3455e
Eliminate some repeated dereferences in VP8 inter_predict
Originally committed as revision 24438 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b946111fde
Eliminate a pointless memset for intra blocks in P-frames in VP8
Originally committed as revision 24429 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b9a7186bf4
VP8: Don't store segment in macroblock struct anymore.
Not necessary with the previous patch.
Originally committed as revision 24427 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c55e0d34ba
Convert VP8 macroblock structures to a ring buffer.
Uses a slightly nonintuitive ring buffer size of (width+height*2) to simplify
addressing logic.
Also split out the segmentation map to a separate structure, necessary to
implement the ring buffer.
Originally committed as revision 24426 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
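The commit does not spell out the addressing, so the layout below is an assumption. It is one arrangement that fits a ring of width + height*2 entries and keeps the left, top-left, top and top-right neighbours of every macroblock at constant offsets. ring_index and ring_selftest are hypothetical, not FFmpeg's code. The self-test replays raster-order decoding and counts neighbour mismatches.

```c
#include <assert.h>
#include <stdlib.h>

/* Row y starts two slots before row y-1, so for MB (x, y) at slot i:
 * left = i-1, top-left = i+1, top = i+2, top-right = i+3.
 * Slot 0 is spare so that "left" of column 0 stays in bounds. */
static size_t ring_index(int x, int y, int mb_height)
{
    return (size_t)(2 * (mb_height - 1 - y) + x + 1);
}

/* Returns the number of neighbour mismatches (0 if the layout works),
 * or -1 on allocation failure. */
int ring_selftest(int mb_width, int mb_height)
{
    size_t size = (size_t)mb_width + 2 * (size_t)mb_height;
    int *ring = calloc(size, sizeof(*ring));
    int errors = 0;
    if (!ring)
        return -1;
    for (int y = 0; y < mb_height; y++) {
        for (int x = 0; x < mb_width; x++) {
            size_t i = ring_index(x, y, mb_height);
            int id = y * mb_width + x + 1;           /* 0 means "never written" */
            assert(i < size);
            if (x > 0 && ring[i - 1] != id - 1)
                errors++;                            /* left */
            if (y > 0) {
                int top = id - mb_width;
                if (ring[i + 2] != top)
                    errors++;                        /* top */
                if (x > 0 && ring[i + 1] != top - 1)
                    errors++;                        /* top-left */
                if (x + 1 < mb_width && ring[i + 3] != top + 1)
                    errors++;                        /* top-right */
            }
            ring[i] = id;    /* overwrites (x-2, y-1), which is no longer needed */
        }
    }
    free(ring);
    return errors;
}
```

Each newly decoded macroblock overwrites the entry two columns back in the previous row, which no later macroblock reads. That is why roughly two extra slots per row suffice.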
Jason Garrett-Glaser
968570d65f
Calculate deblock strength per-MB instead of per-row
Gives better cache locality, since the VP8Macroblock structs are still in cache.
Inspired by the way x264 does it.
Originally committed as revision 24417 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
d1c58fce20
Avoid tracking i4x4 modes in P-frames in VP8
As in the previous commit: in P-frames they aren't used for context selection, so
not storing them saves memory.
Originally committed as revision 24416 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
158e062c95
Avoid useless fill_rectangle in P-frames in VP8
In VP8, i4x4 only uses contexts based on neighbors in I-frames.
Originally committed as revision 24415 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
7bf254c41d
Optimize partition mv decoding in VP8
Originally committed as revision 24414 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c0498b3031
Take shortcuts for mv0 case in VP8 MC
Avoid edge emulation -- it isn't needed if there isn't any subpel.
Originally committed as revision 24413 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
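The sketch below shows why the mv0 shortcut can skip edge emulation. A helper computes the reference area that a block's motion compensation reads. It assumes quarter-pel luma MVs and a six-tap filter footprint of up to 2 pixels before and 3 after the block along any axis with a fractional offset. It ignores the padded borders that real reference frames have, and its name is hypothetical. With a zero MV the area is the block itself, so it never leaves the frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Does motion compensation of a w x h block at (x, y) with quarter-pel
 * MV (mvx, mvy) read outside a frame_w x frame_h reference? */
bool needs_edge_emulation(int x, int y, int w, int h, int mvx, int mvy,
                          int frame_w, int frame_h)
{
    int sx = x + (mvx >> 2), sy = y + (mvy >> 2);   /* full-pel part */
    int left = sx, right = sx + w - 1, top = sy, bottom = sy + h - 1;
    if (mvx & 3) { left -= 2; right += 3; }          /* subpel x: six-tap */
    if (mvy & 3) { top -= 2; bottom += 3; }          /* subpel y: six-tap */
    return left < 0 || top < 0 || right >= frame_w || bottom >= frame_h;
}
```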