With these changes, gcc 4.5 and later recognise it as a bswap
and use the proper instructions on ARM and x86. On x86, the
16-bit bswap is recognised from gcc 4.1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The volatile qualifiers are not needed on these statements as
their effects are fully specified by constraints.
Signed-off-by: Mans Rullgard <mans@mansr.com>
On some versions of gcc, these weren't always getting inlined due to hitting
the inline cap limit in some files. This is generally bad, as most of these
functions are smaller inlined than not.
(cherry picked from commit eb3755a5aa)
On some versions of gcc, these weren't always getting inlined due to hitting
the inline cap limit in some files. This is generally bad, as most of these
functions are smaller inlined than not.
This prevents gcc inserting useless UXTH instructions, at least
in some cases.
Originally committed as revision 25212 to svn://svn.ffmpeg.org/ffmpeg/trunk
Instead of defining functions in per-arch header files included
by the main cpu.c, define them normally and call them from the
generic one.
Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk
This reduces the number of false dependencies on header files and
speeds up compilation.
Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk
ARMv6 and later support unaligned loads and stores for single
word/halfword but not double/multiple. GCC is ignorant of this and
will always use bytewise accesses for unaligned data. Casting to an
int32_t pointer is dangerous since a load/store double or multiple
instruction might be used (this happens with some code in FFmpeg).
Implementing the AV_[RW]* macros with inline asm using only supported
instructions gives fast and safe unaligned accesses. ARM RVCT does
the right thing with generic code.
This gives an overall speedup of up to 10%.
Originally committed as revision 18601 to svn://svn.ffmpeg.org/ffmpeg/trunk