Anton Khirnov
683da86aab
audiodsp: reorder arguments for vector_clipf
...
This will make the x86 asm simpler.
ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
<janne-libav@jannau.net>
9 years ago
Anton Khirnov
75d98e30af
audiodsp/x86: clear the high bits of the order parameter on 64bit
...
Also change shl to add, since it can be faster on some CPUs.
CC: libav-stable@libav.org
9 years ago
Anton Khirnov
1d6c76e11f
audiodsp/x86: fix ff_vector_clip_int32_sse2
...
This version, which is the only one doing two processing cycles per loop
iteration, computes the load/store indices incorrectly for the second
cycle.
CC: libav-stable@libav.org
9 years ago
Diego Biurrun
de452e5037
pixblockdsp: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "stride" everywhere.
9 years ago
Diego Biurrun
721d57e608
vp56: Separate VP5 and VP6 dsp initialization
...
VP5 has no arch-specific optimizations (nor will it get some in the
future), so it makes no sense to try to share dsp init code with VP6.
9 years ago
Diego Biurrun
3fd22538bc
prores: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "linesize" everywhere.
9 years ago
Diego Biurrun
f81be06cf6
cavs: Change type of stride parameters to ptrdiff_t
...
ptrdiff_t is the correct type for array strides and similar.
9 years ago
Diego Biurrun
802727b538
vp8: Update some assembly comments left unchanged in bd66f073fe
9 years ago
Diego Biurrun
d9d26a3674
vp56: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
9 years ago
Diego Biurrun
6892df9294
vp3: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "stride" everywhere.
9 years ago
Diego Biurrun
e2b9993558
simple_idct: x86: Drop disabled IDCT implementation
...
This gem has been disabled since 2001.
9 years ago
James Almer
d950279cbf
avcodec/ttadsp: cosmetics
...
Clean some header includes and use the same naming scheme as
in ttaencdsp
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Ronald S. Bultje
9790b44a89
vp9mc/x86: sse2 MC assembly.
...
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
James Almer
67922b4ee4
vp9mc/x86: add AVX and AVX2 MC
...
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Clément Bœsch
3cda179f18
vp9mc/x86: rename ff_* to ff_vp9_*
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
James Almer
8be8444d01
vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
...
pavgb is an sse integer instruction, so the mmxext flag is enough
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Clément Bœsch
6ab642d69d
vp9mc/x86: simplify a few inits.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Ronald S. Bultje
3a09494939
vp9mc/x86: add 16px functions (64bit only).
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Anton Khirnov
89466de4ae
vp9/x86: rename vp9dsp to vp9mc
...
It only contains the MC SIMD, other SIMD will go into different files.
9 years ago
Christophe Gisquet
3c504bc359
x86: deduplicate some constants
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
James Almer
efc9d5c4bc
x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}
...
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Ronald S. Bultje
a4edaa0270
vp9: add mxext versions of the single-block (w=8,npx=8) h/v loopfilters.
...
Each takes about 0.1% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
9 years ago
Ronald S. Bultje
7ca422bb1b
vp9: add mxext versions of the single-block (w=4,npx=8) h/v loopfilters.
...
Each takes about 0.5% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
9 years ago
Ronald S. Bultje
726501a34e
vp9: add 32x32 idct AVX2 implementation.
...
About 1.8x speedup compared to AVX version for full IDCT. Other
sub-IDCT scenarios also see speedups. Full --bench output for
idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):
nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
9 years ago
James Almer
7a15cf42ee
x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32
...
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Diego Biurrun
d06dfaa5cb
x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate
9 years ago
Diego Biurrun
4efab89332
x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate
9 years ago
Diego Biurrun
0a39c9ac0b
x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code
...
That code is only ever initialized with that flag set.
9 years ago
Diego Biurrun
95c1df929b
x86: hpeldsp: Drop unused function parameters
9 years ago
Diego Biurrun
c3e83ad3b7
x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate
9 years ago
Diego Biurrun
1dfc3cf89d
x86: hpeldsp: Split off VP3-specific bits into a separate file
9 years ago
James Almer
fca3c3b619
hevc: Add AVX2 DC IDCT
...
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
9 years ago
Rostislav Pehlivanov
df1dc52195
diracdsp_init: add missing ARCH_X86_64 check
...
That SIMD is still x86_64 only for now.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
9 years ago
Rostislav Pehlivanov
bd61f3c6bf
diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped
...
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
9 years ago
Rostislav Pehlivanov
80721cc1ff
diracdsp: add dequantization SIMD
...
Currently unused, to be used in the following commits.
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
9 years ago
Ronald S. Bultje
f0a2b6249b
vp9: add 16x16 idct avx2 (8-bit).
...
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
9 years ago
James Almer
645489cf90
x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32
...
About 10% faster.
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
293484fa5e
avcodec: add missing xmm/neon clobber test wrappers for the new decode API
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Matthieu Bouron
9eb3da2f99
asm: FF_-prefix internal macros used in inline assembly
...
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
9 years ago
Clément Bœsch
4a081f224e
libavcodec: fix constness in clobber test avcodec_open2() wrappers
...
Signed-off-by: Martin Storsjö <martin@martin.st>
9 years ago
Anton Khirnov
9df889a5f1
h264: rename h264.[ch] to h264dec.[ch]
...
This is more consistent with the naming of other decoders.
9 years ago
Martin Storsjö
f1a9eee41c
x86: Add missing movsxd for the int stride parameter
...
Signed-off-by: Martin Storsjö <martin@martin.st>
9 years ago
James Almer
ede4ec1f8f
x86/aacpsdsp: optimize add_squares loop
...
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
82dbfccaf0
x86/aacdec: use HADDPS macro
...
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Diego Biurrun
1e9c5bf4c1
asm: FF_-prefix internal macros used in inline assembly
...
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
9 years ago
Diego Biurrun
dc40a70c57
Drop unnecessary libavutil/x86/asm.h #includes
9 years ago
Diego Biurrun
a6a750c7ef
tests: Move all test programs to a subdirectory
9 years ago
Christophe Gisquet
9630b3fc06
x86: lossless audio: SSE4 madd 32bits
...
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.
Timings: 68 -> 49 cycles
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Vittorio Giovara
41ed7ab45f
cosmetics: Fix spelling mistakes
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
9 years ago
Diego Biurrun
01621202aa
build: miscellaneous cosmetics
...
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
9 years ago