The shift parameter was removed from this interface in 7e1ce6a.
This updates the Altivec implementation to match.
Signed-off-by: Mans Rullgard <mans@mansr.com>
To load unaligned vector data in the usual way, explicit vec_ld()
should be used rather than dereferencing a pointer to a vector type.
When the VSX extension is enabled, gcc may compile vector pointer
dereferences using the VSX lxvw4x instruction instead of the lvx
instruction typically used with Altivec/VMX. As the behaviour of
these instructions with unaligned addresses differs, it is important
that only lvx is used here.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The new lowres support is limited to decoders where lowres decoding
is possible in high quality.
I was not able to measure any speed difference, but if one is found
the 2-3 lines that might affect speed can be made compile time conditional
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
On 32-bit ppc, the GOT pointer must be loaded manually.
This adds a "get_got" assembler macro to compute the
GOT address. The "movrel" macro is updated to take an
additional parameter containing the GOT address since
no register is reserved for this purpose on ppc32.
These changes have no effect on ppc64 builds.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This reverts commit f8bed30d8b. The reason
for this is that the overlap filter, which runs after IDCT, should run
on unclamped values, and thus IDCT and put_pixels() cannot be merged if
we want to attempt to be bitexact.
According to ISO 9899:1999 S 6.5.7/4:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are filled with zeros. If E1 has an unsigned type, the value of the
result is E1× 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1× 2^E2 is representable in the result type, then
that is the resulting value; otherwise, the behavior is undefined.
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The 4-tap filters should only access one row/column before the
reference block.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit e0e46cae37)
GCC 4.3 and later are more particular about signedness matching
in vector operations. The operations under if(rangered) were
missing assignments and thus had no effect.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 381efba0ec)