James Darnley
2cba1825f7
avcodec/v210: add avx2 version of the 10-bit line encoder
...
Around 25% faster than the ssse3 version.
9 years ago
James Darnley
3836f404a8
avcodec/v210: add avx2 version of the 8-bit line encoder
...
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
9 years ago
Christophe Gisquet
74c414202f
x86: simple_idct10_template: use const
...
This avoids going through constants.c while still sharing them
with proresdsp.asm
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Ronald S. Bultje
6b579cf547
vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4.
9 years ago
Christophe Gisquet
e9a68b0316
x86: prores: templatize 10 bits simple_idct
...
This should be reused for a generic simple_idct10 function.
Requires a bit of trickery to declare common constants in C.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
Ronald S. Bultje
061b67fb50
vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction.
9 years ago
Ronald S. Bultje
26ece7a511
vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.
9 years ago
Ronald S. Bultje
db7786e8ff
vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.
9 years ago
Christophe Gisquet
ed450d4acf
x86: lavc: share more constant through defines
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
Christophe Gisquet
9dc45d1f42
x86: lavc: share more constants
...
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
James Almer
15574c505b
x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
...
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
a4d62f7775
x86/constants: fix alignment of pw_255
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
Ronald S. Bultje
bdc1e3e3b2
vp9/x86: intra prediction sse2/32bit support.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
James Almer
6b2caa321f
x86/vp9: add AVX and AVX2 MC
...
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
11 years ago
Christophe Gisquet
75837e9add
x86: sbrdsp/fft: reuse ps_neg constant
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
71db2d08b1
x86: better share ff_pw_2
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
4e128ab0b1
x86: vpx/h264/hevc/mpeg2: share constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
fc8db12a73
x86/vp9: initial AVX2 intra_pred
...
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
71469f3b63
x86: dsputil: Move constant declarations into separate header
12 years ago
Diego Biurrun
ed880050ed
x86: dsputil: Group all assembly constants together in constants.c
12 years ago
Diego Biurrun
3334cbec0a
x86: dsputil: Remove unused MOVQ_BONE macro
12 years ago
Ronald S. Bultje
b93b27edb0
dsputil: Make dsputil selectable
...
Signed-off-by: Martin Storsjö <martin@martin.st>
12 years ago
Ronald S. Bultje
6a701306db
dsputil: make selectable.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago