loop filter. This removes one obstacle to making ff_h264_filter_mb_fast()
bitexact. Code is maybe 0.1% faster.
Originally committed as revision 21280 to svn://svn.ffmpeg.org/ffmpeg/trunk
Run the loop filter per row instead of per MB. This should also make it
much easier to switch to per-frame filtering, and to do so in a
separate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on Pentium Dual / cathedral sample).
This change also allows some optimizations to be tried that would not have
been possible before.
Originally committed as revision 21270 to svn://svn.ffmpeg.org/ffmpeg/trunk
Fixes build with --disable-encoders --enable-encoder=snow.
This fixes MPlayer build with --disable-mencoder.
Originally committed as revision 21259 to svn://svn.ffmpeg.org/ffmpeg/trunk
~200 bytes smaller ff_h264_filter_mb()
Please, everyone, NEVER add code on the assumption that gcc will remove it
without checking that gcc actually does. Chances are it does not.
Originally committed as revision 21251 to svn://svn.ffmpeg.org/ffmpeg/trunk
and 5% faster.
ff_h264_filter_mb_fast() stays the same size, as gcc decided not to inline these
functions there in the first place.
Originally committed as revision 21250 to svn://svn.ffmpeg.org/ffmpeg/trunk
No benchmark because it's just replacing variables with literal constants
(so no risk of slowdown outside gcc silliness) and I need sleep.
Originally committed as revision 21237 to svn://svn.ffmpeg.org/ffmpeg/trunk
Using the low-level macros directly avoids redundant open/update/close
cycles.
2-3% faster on ARM, PPC, and Core i7.
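
A minimal sketch of the idea, assuming a hypothetical MSB-first bit reader (the
BitReader type and the read_* functions below are illustrative names, not
FFmpeg's actual macros or API). The slow path reloads and stores the reader
state on every bit; the fast path loads it once, keeps it in a local for the
whole loop, and stores it once.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical minimal MSB-first bit reader; not FFmpeg's GetBitContext. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t index;               /* current position, in bits */
} BitReader;

/* High-level call: load the index, read one bit, store the index back. */
static unsigned read_bit(BitReader *br)
{
    size_t idx = br->index;                                     /* "open"   */
    unsigned bit = (br->buf[idx >> 3] >> (7 - (idx & 7))) & 1;  /* "update" */
    br->index = idx + 1;                                        /* "close"  */
    return bit;
}

/* n calls, so n open/update/close cycles. */
static unsigned read_bits_slow(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 1) | read_bit(br);
    return v;
}

/* Same result with the state held in a local across the whole loop. */
static unsigned read_bits_fast(BitReader *br, int n)
{
    size_t idx = br->index;     /* open once */
    unsigned v = 0;
    for (int i = 0; i < n; i++, idx++)
        v = (v << 1) | ((br->buf[idx >> 3] >> (7 - (idx & 7))) & 1);
    br->index = idx;            /* close once */
    return v;
}
```

A compiler may or may not merge the repeated loads and stores in the slow path
on its own; writing the fast path by hand removes the dependence on that.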
Originally committed as revision 21224 to svn://svn.ffmpeg.org/ffmpeg/trunk
This could have caused a link failure, with pred_pskip_motion() missing, if
a compiler emitted static functions that are never used.
Originally committed as revision 21221 to svn://svn.ffmpeg.org/ffmpeg/trunk
ff_ac3_bit_alloc_calc_psd() is about 1% faster on Intel Atom; the overall
speedup is not measurable, though.
Should have a bigger effect on systems without cmov or with very slow cmov.
Originally committed as revision 21214 to svn://svn.ffmpeg.org/ffmpeg/trunk
Since BGR24 is decoded as BGR32, fill its alpha channel with 255
using the appropriate predictors.
Originally committed as revision 21211 to svn://svn.ffmpeg.org/ffmpeg/trunk
Simplify cur_band_type, group_len, and coef/offset calculations. This
makes the code easier to read and slightly faster.
Originally committed as revision 21189 to svn://svn.ffmpeg.org/ffmpeg/trunk
The codebooks each consist of a small number of values repeated in
groups of 2 or 4. Storing the codebooks as a packed list of 2- or
4-bit indexes into a table reduces their size substantially (from 7.5k
to 1.5k), resulting in less cache pressure.
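
The packing can be sketched as follows. This is an illustration only: the
values[] table, the three packed entries and unpack_entry() are made up for
the example and are not the decoder's real tables or function names. Each
byte stores four 2-bit indexes into a small table of distinct values.

```c
#include <stdint.h>

/* Hypothetical table of the few distinct values the codebook uses. */
static const float values[4] = { 0.0f, 1.0f, -1.0f, 2.0f };

/* Each byte holds four 2-bit indexes into values[], lowest bits first.
 * Three illustrative entries; stored as floats they would take 48 bytes. */
static const uint8_t packed_codebook[3] = {
    0x00,   /* indexes 0,0,0,0 -> { 0,  0,  0, 0 } */
    0x09,   /* indexes 1,2,0,0 -> { 1, -1,  0, 0 } */
    0xE4,   /* indexes 0,1,2,3 -> { 0,  1, -1, 2 } */
};

/* Expand one packed entry into four floats. */
static void unpack_entry(unsigned entry, float out[4])
{
    unsigned bits = packed_codebook[entry];
    for (int i = 0; i < 4; i++) {
        out[i] = values[bits & 3];  /* low 2 bits select the value */
        bits >>= 2;
    }
}
```

The lookup costs a shift and a mask per value, in exchange for a table small
enough to stay in cache.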
For the band types with sign bits in the bitstream, storing the number
and position of non-zero codebook values using a few bits avoids
multiple get_bits() calls and floating-point comparisons which gcc
handles miserably.
Some float/int type punning also avoids gcc brain damage.
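
One way such punning can look, as a sketch rather than the committed code:
apply_sign() and union float_bits are hypothetical names, and the example
assumes 32-bit IEEE 754 floats. A sign bit read from the bitstream is XORed
straight into the float's bit pattern, so no floating-point comparison or
branch is needed.

```c
#include <stdint.h>

/* Assumes float is a 32-bit IEEE 754 value with the sign in bit 31. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Negate magnitude when sign is 1, leave it unchanged when sign is 0. */
static float apply_sign(float magnitude, uint32_t sign)
{
    union float_bits v;
    v.f  = magnitude;
    v.u ^= sign << 31;      /* flip the sign bit without any float math */
    return v.f;
}
```

For example, apply_sign(2.5f, 1) gives -2.5f and apply_sign(2.5f, 0) gives 2.5f.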
Overall speedup 20-35% on Cortex-A8, 20% on Core i7.
Originally committed as revision 21188 to svn://svn.ffmpeg.org/ffmpeg/trunk