~200 bytes smaller ff_h264_filter_mb()
please everyone, NEVER add code with the assumtation that gcc will remove it
without checking gcc actually does. Chances are it does not.
Originally committed as revision 21251 to svn://svn.ffmpeg.org/ffmpeg/trunk
and 5% faster.
ff_h264_filter_mb_fast() stay the same size as gcc decided not to inline these
functions there in the first place.
Originally committed as revision 21250 to svn://svn.ffmpeg.org/ffmpeg/trunk
No benchmark because its just replacing variables with litteral constants
(so no risk for slowdown outside gcc silliness) and i need sleep.
Originally committed as revision 21237 to svn://svn.ffmpeg.org/ffmpeg/trunk
Using the low-level macros directly avoids redundant open/update/close
cycles.
2-3% faster on ARM, PPC, and Core i7.
Originally committed as revision 21224 to svn://svn.ffmpeg.org/ffmpeg/trunk
This could have caused the linking failure of pred_pskip_motion() missing if
a compiler included never used static functions.
Originally committed as revision 21221 to svn://svn.ffmpeg.org/ffmpeg/trunk
About 1% faster ff_ac3_bit_alloc_calc_psd on Intel Atom, overall speedup
not measurable though.
Should have a bigger effect on systems without cmov or with very slow cmov.
Originally committed as revision 21214 to svn://svn.ffmpeg.org/ffmpeg/trunk
Since BGR24 is decoded as BGR32, fill its alpha channel with 255
using the appropriate predictors.
Originally committed as revision 21211 to svn://svn.ffmpeg.org/ffmpeg/trunk
Simplify cur_band_type, group_len, and coef/offset calculations. This
makes the code easier to read and slightly faster.
Originally committed as revision 21189 to svn://svn.ffmpeg.org/ffmpeg/trunk
The codebooks each consist of small number of values repeated in
groups of 2 or 4. Storing the codebooks as a packed list of 2- or
4-bit indexes into a table reduces their size substantially (from 7.5k
to 1.5k), resulting in less cache pressure.
For the band types with sign bits in the bitstream, storing the number
and position of non-zero codebook values using a few bits avoids
multiple get_bits() calls and floating-point comparisons which gcc
handles miserably.
Some float/int type punning also avoids gcc brain damage.
Overall speedup 20-35% on Cortex-A8, 20% on Core i7.
Originally committed as revision 21188 to svn://svn.ffmpeg.org/ffmpeg/trunk
Two of these are in fact constant size, so use the constant instead of
a variable in the declarations. The remaining one is small enough
that always using the maximum size is acceptable.
Originally committed as revision 21183 to svn://svn.ffmpeg.org/ffmpeg/trunk
Seems to speed the code up a little...
The placement of many generic functions between h264.c and h264.h is still open
Currently they are a little randomly placed between them.
Originally committed as revision 21178 to svn://svn.ffmpeg.org/ffmpeg/trunk
called once per MB in worst case and doesnt seem to benefit from static inline.
Actually the code might be a hair faster now (0.1% according to my benchmark but
this could be random noise)
Originally committed as revision 21173 to svn://svn.ffmpeg.org/ffmpeg/trunk
no speedloss meassured, also its really not touching anything that is speed relevant.
Originally committed as revision 21169 to svn://svn.ffmpeg.org/ffmpeg/trunk
No speedloss meassured (its slightly faster here but that may be random fluctuations)
Originally committed as revision 21165 to svn://svn.ffmpeg.org/ffmpeg/trunk