These macros are redundant; all uses have been replaced with the generic
DECLARE_ALIGNED macro.
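FFmpeg's DECLARE_ALIGNED takes an alignment, a type and a variable name, so a single macro can serve every alignment. The sketch below shows that usage pattern with a hypothetical DEMO_DECLARE_ALIGNED stand-in built on the C11 _Alignas specifier. It is not libavutil's own definition.

```c
#include <stdint.h>

/* Hypothetical stand-in with DECLARE_ALIGNED's (alignment, type, name)
 * argument order, implemented with the C11 _Alignas specifier. */
#define DEMO_DECLARE_ALIGNED(n, t, v) _Alignas(n) t v

/* The same macro handles any power-of-two alignment. */
static DEMO_DECLARE_ALIGNED(16, int16_t, block)[64];
static DEMO_DECLARE_ALIGNED(8, uint8_t, edge)[16];

/* Returns nonzero if a stack array declared through the macro is
 * 16-byte aligned. */
static int stack_block_is_aligned(void)
{
    DEMO_DECLARE_ALIGNED(16, int16_t, tmp)[8] = { 0 };
    return ((uintptr_t)tmp % 16) == 0;
}
```

Because the alignment is a macro argument, there is no need for separate macros for each alignment value.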
Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is typically about 12% faster than the previous method of building a
linked list for each block as tokens are read, with gains ranging from 8%
to 28% depending on the file and CPU.
Originally committed as revision 22190 to svn://svn.ffmpeg.org/ffmpeg/trunk
This increases the slice height to 64 pixels, because an entire chroma
superblock row must be decoded per slice.
This can be up to 6% slower depending on the clip and CPU, but it is
necessary for future optimizations that gain significantly more than is
lost.
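The 64-pixel figure follows from superblock geometry. This is a worked check, assuming VP3/Theora's 32x32-pixel superblocks and a vertical chroma subsampling shift of 1 for 4:2:0. One chroma superblock row covers 32 chroma rows, which corresponds to 64 luma rows. The helper name is hypothetical.

```c
/* VP3/Theora superblocks cover 32x32 pixels of whichever plane they
 * belong to. */
enum { SUPERBLOCK_SIZE = 32 };

/* Hypothetical helper: the minimum slice height, in luma rows, that
 * contains a whole chroma superblock row. chroma_shift_y is log2 of the
 * vertical chroma subsampling factor (1 for 4:2:0, 0 for 4:4:4). */
static int min_slice_height(int chroma_shift_y)
{
    return SUPERBLOCK_SIZE << chroma_shift_y;
}
```

For 4:2:0 this returns 64. For a format without vertical chroma subsampling it returns 32.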
Originally committed as revision 22189 to svn://svn.ffmpeg.org/ffmpeg/trunk
This doesn't really matter yet since 4:2:0 1080p has only 3060 superblocks,
but larger resolutions or 4:4:4 1080p could hit this case.
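The 3060 figure can be reproduced by counting 32x32 superblocks per plane and rounding partial superblocks up. This sketch assumes that subsampled chroma dimensions are also rounded up. The function names are hypothetical.

```c
/* Number of 32x32 superblocks needed to cover a w x h plane;
 * partial superblocks at the right and bottom edges count as whole ones. */
static int plane_superblocks(int w, int h)
{
    return ((w + 31) / 32) * ((h + 31) / 32);
}

/* Total superblocks across Y, Cb and Cr. shift_x/shift_y are the chroma
 * subsampling shifts: 1,1 for 4:2:0 and 0,0 for 4:4:4. */
static int frame_superblocks(int width, int height, int shift_x, int shift_y)
{
    /* Assumption: chroma dimensions are rounded up when subsampled. */
    int cw = (width  + (1 << shift_x) - 1) >> shift_x;
    int ch = (height + (1 << shift_y) - 1) >> shift_y;

    return plane_superblocks(width, height) + 2 * plane_superblocks(cw, ch);
}
```

For 1920x1080 this gives 2040 luma plus 2 x 510 chroma superblocks (3060) at 4:2:0, and 3 x 2040 = 6120 at 4:4:4.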
Originally committed as revision 21930 to svn://svn.ffmpeg.org/ffmpeg/trunk
Much faster for long runs (e.g. nearly uncoded frames), slightly faster
for the general case.
Originally committed as revision 21927 to svn://svn.ffmpeg.org/ffmpeg/trunk
Inspired by guidance from Dark Shikari. On a Core 2 Duo 2.0 GHz, this
change decodes the 10-minute Big Buck Bunny 1080p short about 2 seconds
faster.
Originally committed as revision 20895 to svn://svn.ffmpeg.org/ffmpeg/trunk
Faster checks in reverse_dc_prediction.
Simplified deblocking checks.
Check transform==15 first, since it's more common than 13.
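This sketch shows the branch ordering. It assumes a neighbour bitmask in which 1 = left, 2 = up-right, 4 = up and 8 = up-left. Under that assumption, 15 means all four neighbours are available and 13 means all but the up-right one. The enum names, the weights (29, -26 and 29 out of 32) and the averaging fallback are assumptions for illustration, not the decoder's actual table.

```c
/* Assumed neighbour-availability bits for this sketch. */
enum { PL = 1, PUR = 2, PU = 4, PUL = 8 };

/* Sketch of DC prediction from the left (l), up-left (ul), up (u) and
 * up-right (ur) neighbours' DC values. transform is the availability mask. */
static int predict_dc(int transform, int l, int ul, int u, int ur)
{
    /* Most common case first: all four neighbours (15), then all but
     * up-right (13). In this sketch both use the same assumed weights,
     * with up-right ignored whenever left, up-left and up are present. */
    if (transform == (PL | PUR | PU | PUL) || transform == (PL | PU | PUL))
        return (29 * (l + u) - 26 * ul) / 32;

    /* Hypothetical fallback for the rarer cases: average the available
     * neighbours, or 0 if none are available. */
    int sum = 0, n = 0;
    if (transform & PL)  { sum += l;  n++; }
    if (transform & PUL) { sum += ul; n++; }
    if (transform & PU)  { sum += u;  n++; }
    if (transform & PUR) { sum += ur; n++; }
    return n ? sum / n : 0;
}
```

Because the short-circuiting `||` tests 15 before 13, the most frequent mask exits after a single comparison.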
Originally committed as revision 20747 to svn://svn.ffmpeg.org/ffmpeg/trunk
on their grouping, create one loop that indexes into a table of AC VLC
tables.
There is also a small optimization here: do not call unpack_vlcs()
if no fragment in the list has outstanding coefficients.
My profiling indicates that this can save upwards of 1 million
dezicycles per frame over the course of unpack_dct_coeffs().
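The single loop can be sketched as a per-position array of table pointers, which replaces one hand-written loop per coefficient group. The sketch also includes the skip for positions with nothing outstanding. It assumes the AC groups 1-5, 6-14, 15-27 and 28-63. vlc_table, decode_coeff_stub() and the pending[] counts are hypothetical stand-ins, not FFmpeg's VLC API or unpack_vlcs().

```c
/* Hypothetical stand-in for a decoded Huffman/VLC table. */
typedef struct {
    int id;
} vlc_table;

/* Assumed AC grouping for this sketch: 1-5, 6-14, 15-27, 28-63. */
static int ac_group(int coeff)
{
    if (coeff <= 5)
        return 0;
    if (coeff <= 14)
        return 1;
    if (coeff <= 27)
        return 2;
    return 3;
}

/* Hypothetical per-coefficient decoder: records which table it would use. */
static void decode_coeff_stub(int coeff, const vlc_table *table, int chosen[64])
{
    chosen[coeff] = table->id;
}

/* One loop over all AC positions. tables[] maps each position to its
 * group's table. pending[c] is a hypothetical count of fragments still
 * awaiting coefficient c; positions with none skip the decoder call. */
static void unpack_ac(const vlc_table groups[4], const int pending[64], int chosen[64])
{
    const vlc_table *tables[64] = { 0 };

    for (int c = 1; c < 64; c++)
        tables[c] = &groups[ac_group(c)];

    for (int c = 1; c < 64; c++) {
        if (pending[c] == 0) {
            chosen[c] = -1; /* nothing outstanding: decoder not called */
            continue;
        }
        decode_coeff_stub(c, tables[c], chosen);
    }
}
```

In a real decoder, the table array would be built once per frame from the selected Huffman tables rather than on every call.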
Originally committed as revision 20699 to svn://svn.ffmpeg.org/ffmpeg/trunk
outstanding coefficients yet to be decoded from the bitstream. Once a
fragment reaches end-of-block, remove it from this new list. This change
makes VP3/Theora entropy decoding dramatically faster, because the
decoder no longer iterates repeatedly over fragments that have already
been fully decoded.
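The list can be sketched as an array of fragment indices that is compacted in place after each pass. Fragments that reach end-of-block drop out and are never revisited. The fragment struct, its remaining counter and decode_pass() are hypothetical simplifications. In particular, a real token can cover several coefficients or an end-of-block run.

```c
#include <stddef.h>

/* Hypothetical fragment state for this sketch. */
typedef struct {
    int remaining; /* tokens left before end-of-block (simplification) */
} fragment;

/* Decode one token for each listed fragment, then keep only the fragments
 * that still have coefficients outstanding. list[] holds indices into
 * frags[]. Returns the new list length. */
static size_t decode_pass(fragment *frags, int *list, size_t len)
{
    size_t kept = 0;

    for (size_t i = 0; i < len; i++) {
        fragment *f = &frags[list[i]];

        f->remaining--;                 /* stand-in for decoding one token */
        if (f->remaining > 0)
            list[kept++] = list[i];     /* still outstanding: keep it */
    }
    return kept;
}
```

Compacting in place keeps the surviving fragments in their original order and needs no extra memory.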
Originally committed as revision 20698 to svn://svn.ffmpeg.org/ffmpeg/trunk
the DC coefficients. This makes it more likely that the coefficients
are still in the cache when they are used.
When testing with the Big Buck Bunny 1080p video, I consistently saw
improvements of 500k to 600k dezicycles per run (measured through
reverse_dc_prediction()) thanks to this move.
Originally committed as revision 19966 to svn://svn.ffmpeg.org/ffmpeg/trunk