Diego Biurrun
b89804da9b
x86: videodsp: Add parentheses to expression to work around warning
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
8 years ago
Rostislav Pehlivanov
d2ae5f77c6
aacenc: add SIMD optimizations for abs_pow34 and quantization
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% fewer cycles for the function
Twoloop coder:
abs_pow34:
with/without: 7.82s/8.17s
Around 4% less time for the entire encoder
Both:
with/without: 7.15s/8.17s
Around 12% less time for the entire encoder
Fast coder:
abs_pow34:
with/without: 3.40s/3.77s
Around 10% less time for the entire encoder
Both:
with/without: 3.02s/3.77s
Around 20% less time for the entire encoder
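The percentages above are reductions in run time (or cycle count) relative to the build without the SIMD code. A small C sketch that recomputes them from the figures quoted in this commit message; the helper names are illustrative only and not part of FFmpeg.

```c
#include <stdio.h>

/* Relative reduction in cost, in percent: how much less time (or how many
 * fewer cycles) the optimized build needs compared to the baseline. */
static double percent_reduction(double without, double with)
{
    return (without - with) / without * 100.0;
}

/* Figures copied from the commit message. */
static const struct {
    const char *label;
    double without, with;
} measurements[] = {
    { "quant_bands (decicycles)", 1190.0, 681.0 },
    { "twoloop, abs_pow34 (s)",   8.17,   7.82  },
    { "twoloop, both (s)",        8.17,   7.15  },
    { "fast, abs_pow34 (s)",      3.77,   3.40  },
    { "fast, both (s)",           3.77,   3.02  },
};

/* Prints the reduction and the equivalent speedup factor (without / with). */
static void print_report(void)
{
    for (size_t i = 0; i < sizeof(measurements) / sizeof(measurements[0]); i++)
        printf("%-26s %5.1f%% less, %.2fx speedup\n", measurements[i].label,
               percent_reduction(measurements[i].without, measurements[i].with),
               measurements[i].without / measurements[i].with);
}
```

Computed this way the five figures come out at roughly 42.8%, 4.3%, 12.5%, 9.8% and 19.9%, matching the rounded values in the message. Expressed as a speedup factor instead, the fast coder's "Both" case is about 1.25x.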
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
8 years ago
Diego Biurrun
6be7944ee2
x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
8 years ago
Alexandra Hájková
112cee0241
hevc: Add SSE2 and AVX IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Anton Khirnov
e4128c08d7
Revert "hevc: x86: Refactor IDCT macro declarations"
This reverts commit d9dccc0389. There were
outstanding objections to this commit.
8 years ago
Diego Biurrun
5801f9ed24
h264_intrapred: x86: Update comments left behind in 95c89da36e
8 years ago
Diego Biurrun
d9dccc0389
hevc: x86: Refactor IDCT macro declarations
8 years ago
Ronald S. Bultje
715f139c9b
vp9lpf/x86: make filter_16_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
8915320db9
vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
725a216481
vp9lpf/x86: make filter_44_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
5bfa96c4b3
vp9lpf/x86: make filter_16_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
b905e8d2fe
vp9lpf/x86: make filter_48/84_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
37637e6590
vp9lpf/x86: make filter_88_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
be10834bd9
vp9lpf/x86: make filter_44_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
7c62891efe
vp9lpf/x86: save one register in SIGN_ADD/SUB.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
c6375a83d1
vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
4ce8ba72f9
vp9lpf/x86: move variable assigned inside macro branch.
The value is not used outside the branch.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
e4961035b2
vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
683da2788e
vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
6e74e9636b
vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
6411c328a2
vp9lpf/x86: make cglobal statement more conservative in register allocation.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
a6e288d624
vp9lpf/x86: save one register in loopfilter surface coverage.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
0ed21bdc9e
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
f2e3d706a1
vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
James Almer
92d47550ea
vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
Similar gains to the SSSE3 version once again.
Additional improvements by Clément Bœsch <u@pkh.me>.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
6bea478158
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
James Almer
1f451eed60
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar performance gains to the SSSE3 version.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
a692724c58
vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
James Almer
42111e8543
avcodec: fix arguments on xmm/neon clobber test wrappers
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
449f263f9f
avcodec: add missing xmm/neon clobber test wrappers for the new encode API
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Justin Ruggles
b57e38f52c
ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.
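A minimal C sketch of the wrapper idea, assuming a kernel that takes separate input and output buffers: the selected function pointer is cached together with the channel counts it was chosen for, and is re-selected only when those counts change. All names here (`DownmixState`, `downmix_generic`, `downmix_2_to_1`, `downmix`) are hypothetical and not the ac3dsp API; the "optimized" kernel is plain C standing in for SIMD code.

```c
#include <stddef.h>

/* Hypothetical kernel signature: out[o][n] = sum over i of matrix[o][i] * in[i][n]. */
typedef void (*downmix_fn)(float **out, float **in, float **matrix,
                           int out_ch, int in_ch, int len);

/* Generic C kernel that handles any channel configuration. */
static void downmix_generic(float **out, float **in, float **matrix,
                            int out_ch, int in_ch, int len)
{
    for (int o = 0; o < out_ch; o++)
        for (int n = 0; n < len; n++) {
            float acc = 0.0f;
            for (int i = 0; i < in_ch; i++)
                acc += matrix[o][i] * in[i][n];
            out[o][n] = acc;
        }
}

/* Stand-in for an optimized kernel that only supports stereo -> mono. */
static void downmix_2_to_1(float **out, float **in, float **matrix,
                           int out_ch, int in_ch, int len)
{
    (void)out_ch;
    (void)in_ch;
    for (int n = 0; n < len; n++)
        out[0][n] = matrix[0][0] * in[0][n] + matrix[0][1] * in[1][n];
}

/* Caches the kernel together with the channel counts it was chosen for. */
typedef struct DownmixState {
    downmix_fn fn;      /* NULL until the first call */
    int in_ch, out_ch;
} DownmixState;

/* Wrapper: re-selects the kernel only when the channel counts change. */
static void downmix(DownmixState *s, float **out, float **in, float **matrix,
                    int out_ch, int in_ch, int len)
{
    if (!s->fn || s->in_ch != in_ch || s->out_ch != out_ch) {
        s->fn     = (in_ch == 2 && out_ch == 1) ? downmix_2_to_1 : downmix_generic;
        s->in_ch  = in_ch;
        s->out_ch = out_ch;
    }
    s->fn(out, in, matrix, out_ch, in_ch, len);
}
```

The check costs two integer comparisons per call, so a stream whose layout never changes pays almost nothing, while a mid-stream layout change still ends up with a kernel that supports the new channel counts.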
Simplification and porting to current x86inc infrastructure by Diego Biurrun.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
8 years ago
Justin Ruggles
43717469f9
ac3dsp: Reverse matrix in/out order in downmix()
Also use (float **) instead of (float (*)[2]). This matches the matrix
layout in libavresample so we can reuse assembly code between the two.
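A small C sketch contrasting the two indexing schemes. The pre-change layout is inferred from the `(float (*)[2])` type and the phrase "in/out order": one row per input channel with two output columns, indexed `[in][out]`. The post-change layout uses `(float **)` with one row per output channel, indexed `[out][in]`. The function names and `MAX_IN_CH` are hypothetical.

```c
#define MAX_IN_CH 6 /* hypothetical upper bound for this sketch */

/* Assumed old layout: float (*)[2], indexed [in][out]. */
static float mix_old(float (*matrix)[2], const float *in, int in_ch, int o)
{
    float acc = 0.0f;
    for (int i = 0; i < in_ch; i++)
        acc += matrix[i][o] * in[i];
    return acc;
}

/* New layout: float **, one row per output channel, indexed [out][in]. */
static float mix_new(float **matrix, const float *in, int in_ch, int o)
{
    float acc = 0.0f;
    for (int i = 0; i < in_ch; i++)
        acc += matrix[o][i] * in[i];
    return acc;
}

/* Transposes an old-layout matrix into row storage for the new layout and
 * fills in the row pointers. */
static void old_to_new(float (*old)[2], float rows[2][MAX_IN_CH],
                       float *ptrs[2], int in_ch)
{
    for (int o = 0; o < 2; o++) {
        for (int i = 0; i < in_ch; i++)
            rows[o][i] = old[i][o];
        ptrs[o] = rows[o];
    }
}
```

With `[out][in]` rows, the coefficients for one output channel are contiguous, and the number of output channels is no longer fixed at two by the type itself.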
Signed-off-by: Diego Biurrun <diego@biurrun.de>
8 years ago
Hendrik Leppkes
8d1267932c
x86/h264_weight: use appropriate register size for weight parameters
This fixes decoding corruption on 64-bit Windows.
Signed-off-by: Martin Storsjö <martin@martin.st>
8 years ago
Diego Biurrun
2caa93b813
mpegaudiodsp: Change type of array stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
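On x86-64, a 32-bit `int` argument arrives in a 64-bit register whose upper half the calling convention leaves unspecified, so hand-written assembly that uses it as an address offset must first sign-extend it (for example with `movsxd`). A `ptrdiff_t` argument already has pointer width. A minimal C sketch of a DSP-style routine taking `ptrdiff_t` strides; `copy_block` is a hypothetical name, not an FFmpeg function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block copy with ptrdiff_t strides. Because ptrdiff_t has
 * pointer width, y * stride can be added to a pointer directly, and a
 * negative stride walks rows upwards (e.g. for bottom-up images). */
static void copy_block(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dst_stride + x] = src[y * src_stride + x];
}
```

For example, passing a pointer to the last row of a 4x3 image together with a source stride of -4 produces a vertically flipped copy.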
8 years ago
Diego Biurrun
e4a94d8b36
h264chroma: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
8 years ago
Diego Biurrun
2ec9fa5ec6
idct: Change type of array stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
8 years ago
Diego Biurrun
009adfd4fb
x86: fpel: Remove unnecessary sign extend
8 years ago
Anton Khirnov
de2ae3c1fa
lavc: add clobber tests for the new encoding/decoding API
9 years ago
Hendrik Leppkes
5ae0ad001a
x86/h264_weight: use appropriate register size for weight parameters
Fixes trac 5579
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Acked-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Michael Niedermayer
bc26fe8927
avcodec/h264: Use ptrdiff_t for (bi)weight functions
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Anton Khirnov
12004a9a7f
audiodsp/x86: yasmify vector_clipf_sse
9 years ago
Anton Khirnov
eea9857bfd
blockdsp: drop the high_bit_depth parameter
It has no effect, since the code is supposed to operate the same way for
any bit depth.
9 years ago
Anton Khirnov
683da86aab
audiodsp: reorder arguments for vector_clipf
This will make the x86 asm simpler.
ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
<janne-libav@jannau.net>
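A minimal C reference for the reordered function, assuming the new prototype places `len` ahead of the float bounds, i.e. `(float *dst, const float *src, int len, float min, float max)`. The name `vector_clipf_ref` is hypothetical; the commit itself only states that the new order makes the x86 asm simpler.

```c
/* Clamps each src[i] to [min, max] and stores it in dst[i].
 * Assumed argument order: pointers, then length, then the float bounds. */
static void vector_clipf_ref(float *dst, const float *src, int len,
                             float min, float max)
{
    for (int i = 0; i < len; i++) {
        float v = src[i];
        dst[i] = v < min ? min : (v > max ? max : v);
    }
}
```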
9 years ago
Anton Khirnov
75d98e30af
audiodsp/x86: clear the high bits of the order parameter on 64bit
Also change shl to add, since it can be faster on some CPUs.
CC: libav-stable@libav.org
9 years ago
Anton Khirnov
1d6c76e11f
audiodsp/x86: fix ff_vector_clip_int32_sse2
This version, which is the only one doing two processing cycles per loop
iteration, computes the load/store indices incorrectly for the second
cycle.
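The failure mode described above, where the second half of an unrolled iteration addresses the wrong elements, is easiest to see in a scalar model. The sketch assumes a `(dst, src, min, max, len)` argument order and chunks of 8 elements split into two 4-element cycles. It is not the SSE2 code, and the names are hypothetical. Comparing its output element by element with a plain per-element clip is a simple way to catch wrong offsets.

```c
#include <stdint.h>

/* Clamps one value to [min, max]. */
static int32_t clip_int32(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : (v > max ? max : v);
}

/* Scalar model of an unrolled loop: each iteration handles 8 elements as
 * two 4-element "cycles". Assumes len is a multiple of 8. The second cycle
 * must offset both its loads and its stores by 4 relative to the first. */
static void vector_clip_int32_unrolled(int32_t *dst, const int32_t *src,
                                       int32_t min, int32_t max,
                                       unsigned int len)
{
    for (unsigned int i = 0; i < len; i += 8) {
        for (unsigned int k = 0; k < 4; k++)    /* first cycle: i .. i+3 */
            dst[i + k] = clip_int32(src[i + k], min, max);
        for (unsigned int k = 0; k < 4; k++)    /* second cycle: i+4 .. i+7 */
            dst[i + 4 + k] = clip_int32(src[i + 4 + k], min, max);
    }
}
```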
CC: libav-stable@libav.org
9 years ago
Diego Biurrun
de452e5037
pixblockdsp: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "stride" everywhere.
9 years ago
Diego Biurrun
721d57e608
vp56: Separate VP5 and VP6 dsp initialization
VP5 has no arch-specific optimizations (nor will it get any in the
future), so it makes no sense to try to share dsp init code with VP6.
9 years ago
Diego Biurrun
3fd22538bc
prores: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "linesize" everywhere.
9 years ago
Diego Biurrun
f81be06cf6
cavs: Change type of stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
9 years ago
Diego Biurrun
802727b538
vp8: Update some assembly comments left unchanged in bd66f073fe
9 years ago