This patch lets e.g. dsputil_init choose dsp functions according to
the bit depth to decode. The naming scheme of bit-depth-dependent
functions is <base name>_<bit depth>[_<suffix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth do not depend on the
bit depth, only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
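A minimal C sketch of the dispatch described above. The `copy_block8` operation, the `DSPSketch` context and `dsp_sketch_init` are hypothetical stand-ins, not the real dsputil API. The sketch follows the <base name>_<bit depth>_<suffix> naming scheme and illustrates the note: the 9- and 10-bit versions both store pixels in 16 bits, so their bodies are identical and can share one implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context with one bit-depth-dependent function pointer. */
typedef struct DSPSketch {
    void (*copy_block8)(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
} DSPSketch;

/* 8-bit pixels take one byte each; stride is given in bytes. */
static void copy_block8_8_c(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    assert(stride >= 8 * (ptrdiff_t)sizeof(uint8_t)); /* rows must not overlap */
    for (int y = 0; y < 8; y++)
        memcpy(dst + y * stride, src + y * stride, 8 * sizeof(uint8_t));
}

/* Pixels stored in 16 bits: this body depends only on the pixel size. */
static void copy_block8_16bit_pixels(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    assert(stride >= 8 * (ptrdiff_t)sizeof(uint16_t));
    for (int y = 0; y < 8; y++)
        memcpy(dst + y * stride, src + y * stride, 8 * sizeof(uint16_t));
}

/* The 9- and 10-bit variants are identical, so both forward to the shared body. */
static void copy_block8_9_c(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    copy_block8_16bit_pixels(dst, src, stride);
}

static void copy_block8_10_c(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    copy_block8_16bit_pixels(dst, src, stride);
}

/* Picks implementations by bit depth; returns -1 for unsupported depths. */
static int dsp_sketch_init(DSPSketch *c, int bit_depth)
{
    switch (bit_depth) {
    case 8:  c->copy_block8 = copy_block8_8_c;  break;
    case 9:  c->copy_block8 = copy_block8_9_c;  break;
    case 10: c->copy_block8 = copy_block8_10_c; break;
    default: return -1;
    }
    return 0;
}
```

Pointing cases 9 and 10 directly at the shared body would drop the two wrappers, which is the kind of binary-size saving the note alludes to.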
This patch lets e.g. dsputil_init choose dsp functions according to
the bit depth to decode. The naming scheme of bit-depth-dependent
functions is <base name>_<bit depth>[_<suffix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth do not depend on the
bit depth, only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
AS libavcodec/arm/ac3dsp_armv6.o
ffmpeg-src/libavcodec/arm/ac3dsp_armv6.S: Assembler messages:
ffmpeg-src/libavcodec/arm/ac3dsp_armv6.S:40: Error: selected processor
does not support `movw r8,#0x1fe0'
make[1]: *** [libavcodec/arm/ac3dsp_armv6.o] Error 1
MOVW is the ARMv7 way to load a constant (it is also available in
ARMv6T2, but not in plain ARMv6):
* movw, or move wide, will move a 16-bit constant into a register,
implicitly zeroing the top 16 bits of the target register.
* movt, or move top, will move a 16-bit constant into the top half
of a given register without altering the bottom 16 bits.
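To make the movw/movt semantics concrete, here is a small C model of their effect on a 32-bit register value. The names `movw`, `movt` and `load_const32` are hypothetical, chosen for this illustration; the comments show the GNU as `#:lower16:`/`#:upper16:` operand syntax that a real instruction pair would typically use.

```c
#include <assert.h>
#include <stdint.h>

/* movw: writes a 16-bit immediate and implicitly zeroes the top 16 bits. */
static uint32_t movw(uint32_t reg, uint16_t imm16)
{
    (void)reg;               /* previous register contents are discarded */
    return (uint32_t)imm16;
}

/* movt: writes the top 16 bits and keeps the bottom 16 bits unchanged. */
static uint32_t movt(uint32_t reg, uint16_t imm16)
{
    return ((uint32_t)imm16 << 16) | (reg & 0xffffu);
}

/* Builds a 32-bit constant the way a movw/movt pair does. */
static uint32_t load_const32(uint32_t value)
{
    uint32_t r = 0xdeadbeefu;                   /* arbitrary stale contents */
    r = movw(r, (uint16_t)(value & 0xffffu));   /* movw rX, #:lower16:value */
    r = movt(r, (uint16_t)(value >> 16));       /* movt rX, #:upper16:value */
    assert(r == value);                         /* the pair rebuilds the constant */
    return r;
}
```

For the constant 0x1fe0 from the build error above, the upper half is zero, so a single movw is enough; plain ARMv6 has no such instruction, hence the switch to ldr.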
To load a 32-bit constant, the pair movw lower16; movt upper16 is
better than ldr when available, because:
While this approach takes two instructions, it does not require any
extra space to store the constant. In ARM state, movw plus movt are
two 4-byte instructions, while ldr is one 4-byte instruction plus a
4-byte literal-pool entry, so both methods end up using the same
amount of memory. Memory bandwidth is precious, and the movw/movt
approach avoids an extra read on the data side, not to mention that
the read could miss the cache.
But this code is an ARMv6 optimization, so we have to use ldr.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The assembler emits literal pools too far from the load instructions,
so we must do it explicitly at a suitable location.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 8b454c352f)
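Why the pool placement matters can be sketched with the reach of an ARM-state LDR (literal): the PC reads as the instruction address plus 8, and the instruction encodes a 12-bit offset that is either added or subtracted, so the literal must lie within 4095 bytes of that point. The helper `ldr_literal_reachable` below is a hypothetical illustration of that rule only (Thumb encodings have different limits). With GNU as, a pool is typically placed explicitly with the `.ltorg` directive; whether this commit uses it is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM (A32) state only: PC = instruction address + 8, offset in [-4095, 4095]. */
static bool ldr_literal_reachable(uint32_t insn_addr, uint32_t literal_addr)
{
    assert((insn_addr & 3u) == 0);  /* ARM instructions are word-aligned */
    int64_t offset = (int64_t)literal_addr - ((int64_t)insn_addr + 8);
    return offset >= -4095 && offset <= 4095;
}
```

A pool emitted only at the end of a long section can fall outside this window for loads near the start of the section, which is why it has to be placed explicitly.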
The assembler emits literal pools too far from the load instructions,
so we must do it explicitly at a suitable location.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003)
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit c73d99e672)
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This moves the fields needed by asm near the top, before any
structs or other members which complicate the offset calculation.
Modifying other structs will no longer require updating the offsets,
and the asm code is slightly simpler due to the smaller offsets.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit d461a47317)
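A minimal sketch of the layout idea, with hypothetical struct and field names: the fields read by asm come first, so their offsets are small and stay fixed no matter what follows. C11 `static_assert` combined with `offsetof` can catch any drift between the C layout and offsets hard-coded in an assembly file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nested struct whose size may change over time. */
typedef struct SomeSubState {
    int16_t coeffs[64];
    int     flags;
} SomeSubState;

/* Hypothetical context: asm-visible fields first, everything else after. */
typedef struct DecoderCtxSketch {
    int32_t      linesize;    /* offset 0 */
    int32_t      uvlinesize;  /* offset 4 */
    uint8_t     *dst;
    SomeSubState sub;         /* growing this does not move the fields above */
    int          other_field;
} DecoderCtxSketch;

/* Offsets an assembly file would hard-code (hypothetical names). */
enum { ASM_OFF_LINESIZE = 0, ASM_OFF_UVLINESIZE = 4 };

static_assert(offsetof(DecoderCtxSketch, linesize) == ASM_OFF_LINESIZE,
              "asm offset of linesize is stale");
static_assert(offsetof(DecoderCtxSketch, uvlinesize) == ASM_OFF_UVLINESIZE,
              "asm offset of uvlinesize is stale");
```

Enlarging `SomeSubState` moves only the members after it, so the asserts keep holding. Small offsets also tend to fit directly in load/store immediate fields (ARM-state LDR, for example, accepts a 12-bit immediate offset), which is one plausible reading of why the asm gets simpler.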
This moves the fields needed by asm near the top, before any
structs or other members which complicate the offset calculation.
Modifying other structs will no longer require updating the offsets,
and the asm code is slightly simpler due to the smaller offsets.
Signed-off-by: Mans Rullgard <mans@mansr.com>