James Almer
d950279cbf
avcodec/ttadsp: cosmetics
...
Clean some header includes and use the same naming scheme as
in ttaencdsp
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
efc9d5c4bc
x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}
...
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Ronald S. Bultje
a4edaa0270
vp9: add mxext versions of the single-block (w=8,npx=8) h/v loopfilters.
...
Each takes about 0.1% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
9 years ago
Ronald S. Bultje
7ca422bb1b
vp9: add mxext versions of the single-block (w=4,npx=8) h/v loopfilters.
...
Each takes about 0.5% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
9 years ago
Ronald S. Bultje
726501a34e
vp9: add 32x32 idct AVX2 implementation.
...
About 1.8x speedup compared to AVX version for full IDCT. Other
sub-IDCT scenarios also see speedups. Full --bench output for
idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):
nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
9 years ago
James Almer
7a15cf42ee
x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32
...
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Rostislav Pehlivanov
df1dc52195
diracdsp_init: add missing ARCH_X86_64 check
...
That SIMD is still x86_64 only for now.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
9 years ago
Rostislav Pehlivanov
bd61f3c6bf
diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped
...
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
9 years ago
Rostislav Pehlivanov
80721cc1ff
diracdsp: add dequantization SIMD
...
Currently unused, to be used in the following commits.
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
9 years ago
Ronald S. Bultje
f0a2b6249b
vp9: add 16x16 idct avx2 (8-bit).
...
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
9 years ago
James Almer
645489cf90
x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32
...
About 10% faster.
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
293484fa5e
avcodec: add missing xmm/neon clobber test wrappers for the new decode API
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Matthieu Bouron
9eb3da2f99
asm: FF_-prefix internal macros used in inline assembly
...
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
9 years ago
Anton Khirnov
9df889a5f1
h264: rename h264.[ch] to h264dec.[ch]
...
This is more consistent with the naming of other decoders.
9 years ago
Martin Storsjö
f1a9eee41c
x86: Add missing movsxd for the int stride parameter
...
Signed-off-by: Martin Storsjö <martin@martin.st>
9 years ago
James Almer
ede4ec1f8f
x86/aacpsdsp: optimize add_squares loop
...
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
James Almer
82dbfccaf0
x86/aacdec: use HADDPS macro
...
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Diego Biurrun
1e9c5bf4c1
asm: FF_-prefix internal macros used in inline assembly
...
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
9 years ago
Diego Biurrun
dc40a70c57
Drop unnecessary libavutil/x86/asm.h #includes
9 years ago
Diego Biurrun
a6a750c7ef
tests: Move all test programs to a subdirectory
9 years ago
Christophe Gisquet
9630b3fc06
x86: lossless audio: SSE4 madd 32bits
...
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.
Timings: 68 -> 49 cycles
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Vittorio Giovara
41ed7ab45f
cosmetics: Fix spelling mistakes
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
9 years ago
Diego Biurrun
01621202aa
build: miscellaneous cosmetics
...
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
9 years ago
Michael Niedermayer
305344d89e
avcodec/fft: Add revtab32 for FFTs with more than 65536 samples
...
x86 optimizations are used only for the cases they support (<=65536 samples)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Michael Niedermayer
ae76b84221
avcodec: Extend fft to size 2^17
...
Asked-for-by: durandal_1707
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Diego Biurrun
1a094af638
fft: Split MDCT bits off from FFT
9 years ago
Timothy Gu
e3461197b1
x86/vc1dsp: Split the file into MC and loopfilter
9 years ago
Diego Biurrun
73ff983e8d
fft: x86: cosmetics: Drop silly comments, add comment, whitespace
9 years ago
Diego Biurrun
257b30af8e
x86: hevc: Fix linking with both yasm and optimizations disabled
...
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
9 years ago
James Almer
45d3af9059
x86/dcadec: add ff_lfe_fir1_float_{sse3,avx}
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Diego Biurrun
15a24614ae
build: Add vc1dsp component for more fine-grained dependencies
9 years ago
James Almer
70d685a77f
x86: use the new helper macros where useful
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Timothy Gu
bcc223523e
x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
9 years ago
Timothy Gu
59ebf32bca
huffyuvencdsp: Undefine "i" macro after each use
9 years ago
James Almer
8ae7447941
x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3}
...
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Timothy Gu
9fd6ea933f
dirac_dwt: Make x86 files/functions names consistent
9 years ago
Timothy Gu
17ab8f7e68
diracdsp: Make x86 files/functions names consistent
9 years ago
Henrik Gramner
aa751573fe
avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc
...
Using rNm and x86inc's stack allocation with a negative value at the same
time isn't supported, and caused the original stack pointer to be clobbered
when using a compiler that doesn't support stack alignment.
9 years ago
James Darnley
7042a55c55
avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
...
2.6 times faster (366 vs. 142 cycles)
9 years ago
Timothy Gu
dd57b316c1
diracdsp_mmx: Fix some more indentations
9 years ago
Timothy Gu
f5e2b8de55
diracdsp_mmx: Fix indentation
9 years ago
Timothy Gu
838abfc1d7
x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format
9 years ago
Luca Barbato
e280fe1329
v210: Use separate sample_factors
...
The 10bit and the 8bit functions can now be implemented to process
a different amount of samples.
And while at it simplify a little the code.
9 years ago
James Darnley
15ec7aa417
v210: Add avx2 version of the 10-bit line encoder
...
Around 25% faster than the ssse3 version.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
9 years ago
James Darnley
d29237e557
v210: Add avx2 version of the 8-bit line encoder
...
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
9 years ago
Timothy Gu
180f9a0958
all: Make header guard names consistent
9 years ago
foo86
ae5b2c5250
avcodec/dca: add new decoder based on libdcadec
9 years ago
foo86
4608996772
avcodec/dca: remove old decoder
...
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
9 years ago
James Almer
c792528970
x86/imdct36: use extractps inside the STORE macro
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
9 years ago
Luca Barbato
eafb05fcf3
v210: x86: Add the correct guards around the asm code
...
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
9 years ago