Michael Niedermayer
73fb40dc87
avcodec/x86/idctdsp: Remove duplicate include
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
8 years ago
James Almer
ac42f08099
x86/hevc_add_res: merge missing changes from 3d65359832
...
Unrolling the loops triples the size of the assembled output
while not generating any gain in performance.
8 years ago
Clément Bœsch
40ac226014
lavc/x86/hevc: rename hevc_res_add to hevc_add_res
...
This will simplify incoming merge.
8 years ago
James Almer
30cadfe071
avcodec/lossless_videodsp: use ptrdiff_t for length parameters
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Clément Bœsch
af607b7e07
lavc/huffyuvdsp: only transmit the pix_fmt instead of the whole avctx
...
Only the pixel format is required in that init function. This will also
simplify the incoming merge.
8 years ago
James Almer
aee046a895
x86/audiodsp: remove an unnecessary movss
8 years ago
Ilia
2f3d10a01a
avcodec/vp9: avx2 implementation of ipred_dl_16x16_16
...
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5
Benchmarked with 10000 runs
Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
8 years ago
Mirage Abeysekara
5eb4f95bef
h264pred: added AVX2 implementation for tm_vp8 16x16.
...
checkasm --bench results with 5000 runs
pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
8 years ago
Michael Niedermayer
835d9f299c
avcodec/x86/cavsdsp: Put MMX code under mmx check
...
Without this, the FPU state gets trashed, causing mysterious
FATE failures with cpuflags=0.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
8 years ago
James Darnley
33de0fee2c
avcodec/h264: enable sse2 chroma deblock/loop filter functions
...
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
8 years ago
James Darnley
cd893b9307
avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
...
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
8 years ago
James Darnley
0e16b3e2be
avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
...
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
8 years ago
James Darnley
987ffe4b8d
avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
...
~1.14x faster (90 vs 78 cycles) compared with mmxext
8 years ago
James Darnley
88307b3eec
avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
...
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
8 years ago
James Darnley
ac096fc82d
avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
...
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
8 years ago
James Darnley
5c56758843
avcodec/h264: add avx 8-bit chroma v deblock/loop filter
...
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
8 years ago
James Darnley
5336887867
avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
...
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
8 years ago
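The speedup figures quoted in these benchmark notes appear to be the baseline cycle count divided by the optimized cycle count. A minimal Python sketch of that arithmetic, assuming the `(A vs. B cycles)` layout used in the entries above; the helper name `speedup_from_line` is hypothetical and not part of FFmpeg or checkasm:

```python
import re

# Matches "(434 vs. 200 cycles)" and "(90 vs 78 cycles)"; padding after "(" is allowed.
_CYCLES = re.compile(r"\(\s*(\d+)\s+vs\.?\s+(\d+)\s+cycles\)")


def speedup_from_line(line):
    """Return the baseline/optimized cycle ratio, or None if no cycle pair is present."""
    match = _CYCLES.search(line)
    if match is None:
        return None
    baseline, optimized = (int(group) for group in match.groups())
    return baseline / optimized


if __name__ == "__main__":
    samples = [
        "- sse2: ~2.17x (434 vs. 200 cycles)",
        "- sse2: ~2.94x (409 vs. 139 cycles)",
    ]
    for sample in samples:
        print(f"{speedup_from_line(sample):.2f}x")
```

Recomputed ratios can differ from the quoted ones in the last digit, presumably because the cycle counts shown in the log are already rounded.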
James Darnley
e18bc2114f
avcodec/h264: add named parameters to x86 function
8 years ago
James Darnley
9d815b7424
avcodec/x86: deduplicate PASS8ROWS macro
8 years ago
James Almer
c8467abbad
x86/rv34dsp: add ff_rv34_idct_dc_add_sse2
...
Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
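The selection logic described in the entry above (prefer SSE2, and keep the MMX version only for 32-bit builds) can be modeled roughly as follows. This is a Python sketch of the idea only, not the actual C init code; the function name `select_idct_dc_add`, the architecture strings, and the flag names are all hypothetical:

```python
def select_idct_dc_add(arch, cpu_flags):
    """Pick an implementation name for a hypothetical idct_dc_add slot.

    arch is "x86_32" or "x86_64"; cpu_flags is a set of strings such as
    {"mmx", "sse2"}. Later checks override earlier ones, mirroring the usual
    pattern of assigning function pointers from slowest to fastest.
    """
    chosen = "c"  # portable fallback
    # The MMX version is only considered on 32-bit builds (assumption based
    # on the commit message: SSE2 is guaranteed on x86_64).
    if arch == "x86_32" and "mmx" in cpu_flags:
        chosen = "mmx"
    if "sse2" in cpu_flags:
        chosen = "sse2"
    return chosen


if __name__ == "__main__":
    print(select_idct_dc_add("x86_64", {"mmx", "sse2"}))
    print(select_idct_dc_add("x86_32", {"mmx"}))
```

In this model an x86_64 build never selects the MMX version, so that code does not need to be compiled for such builds.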
James Almer
ab5c4d006d
x86/vp8dsp: add ff_vp8_idct_dc_add_sse2
...
Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Michael Niedermayer
536ac72f46
Revert "Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'"
...
The assumption this is based on is wrong: the code is not always run with bitexact flags.
This reverts commit a956164e1e, reversing changes made to f6005907fd.
Approved-by: James Almer <jamrial@gmail.com>
8 years ago
Clément Bœsch
7c300a8ed4
lavc/hevc: remove a few random spaces to reduce diff with libav
8 years ago
James Almer
6d4c9f2ade
lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
47f212329e
huffyuvdsp: move functions only used by huffyuv from lossless_videodsp
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
cf9ef83960
huffyuvencdsp: move shared functions to a new lossless_videoencdsp context
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
30c1f27299
huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
5ac1dd8e23
lossless_videodsp: move shared functions from huffyuvdsp
...
Several codecs other than huffyuv use them.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Michael Niedermayer
aa95292043
avcodec/x86/vc1dsp_mc: Fix build with NASM 2.09.10
...
make fate passes
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
8 years ago
John Comeau
d06518752b
avcodec/x86/imdct36: fix building with nasm 2.11.05
...
fixes `operation size not specified` errors as described here:
http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2
I rebuilt again with yasm and made sure it didn't break that.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
8 years ago
Paul B Mahol
6d09d6edbc
avcodec/magicyuv: add 10 bit support
...
Signed-off-by: Paul B Mahol <onemda@gmail.com>
8 years ago
James Darnley
acdd2d805d
avcodec/h264: resolve assert being triggered when stack is not aligned
...
32-bit msvc.
8 years ago
James Darnley
728651df06
avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter
...
Yorkfield:
- mmx2: 2.53x (504 vs. 199 cycles)
- sse2: 3.83x (504 vs. 131 cycles)
Nehalem:
- mmx2: 2.42x (365 vs. 151 cycles)
- sse2: 3.56x (365 vs. 103 cycles)
Skylake:
- mmx2: 1.81x (308 vs. 170 cycles)
- sse2: 2.84x (308 vs. 108 cycles)
- avx: 2.93x (308 vs. 105 cycles)
8 years ago
James Darnley
add21d0bb3
avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
...
Yorkfield:
- mmx2: 2.45x (279 vs. 114 cycles)
- sse2: 3.36x (279 vs. 83 cycles)
Nehalem:
- mmx2: 2.10x (192 vs. 92 cycles)
- sse2: 2.84x (192 vs. 68 cycles)
Skylake:
- mmx2: 1.75x (170 vs. 97 cycles)
- sse2: 2.47x (170 vs. 69 cycles)
- avx: 2.47x (170 vs. 69 cycles)
8 years ago
James Darnley
58ca2ef62e
whitespace changes after last commit
8 years ago
James Darnley
f33714a694
avcodec/h264: clean up and expand x86 function definitions
8 years ago
James Darnley
13d71c28cc
avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
...
Yorkfield:
- sse2:
- complex: 4.13x faster (1514 vs. 367 cycles)
- simple: 4.38x faster (1836 vs. 419 cycles)
Skylake:
- sse2:
- complex: 3.61x faster ( 936 vs. 260 cycles)
- simple: 3.97x faster (1126 vs. 284 cycles)
- avx (versus sse2):
- complex: 1.07x faster (260 vs. 244 cycles)
- simple: 1.03x faster (284 vs. 274 cycles)
8 years ago
James Darnley
1dae7ffa0b
avcodec/h264: mmx 4:2:2 idct add8 function
...
2.87 times faster (1830 vs. 638 cycles)
8 years ago
James Darnley
815ea8c6cc
avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
...
2.1 times faster (401 vs. 194 cycles)
8 years ago
James Almer
2de1c79b61
x86/vp9itxfm: add missing AVX2 guards
...
Fixes compilation with Yasm 1.1.0 and older.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Ronald S. Bultje
83a139e3d8
vp9: add avx2 iadst16 implementations.
...
Also a small cosmetic change to the avx2 idct16 version to make it
explicit that one of the arguments to the write-out macros is unused
for >=avx2 (it uses pmovzxbw instead of punpcklbw).
8 years ago
Pierre Edouard Lepere
6d5636ad9a
hevc: x86: Add add_residual() SIMD optimizations
...
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
8 years ago
Andreas Cadhalpun
c8a6eb58d7
doc: fix spelling errors
...
Thanks to Mathieu Malaterre <malat@debian.org> for reporting the
Que/Queue typo. (https://bugs.debian.org/839542)
Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
8 years ago
Diego Biurrun
788544ff0e
audiodsp: x86: Remove pointless header file
...
Its single forward declaration can be moved to the only place
it is used, as is done for all other dsp init files.
8 years ago
Diego Biurrun
b89804da9b
x86: videodsp: Add parentheses to expression to work around warning
...
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
8 years ago
Rostislav Pehlivanov
d2ae5f77c6
aacenc: add SIMD optimizations for abs_pow34 and quantization
...
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% for the function
Twoloop coder:
abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder
Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder
Fast coder:
abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder
Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
8 years ago
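The percentages in the entry above are consistent with a relative reduction, (without - with) / without, applied to the decicycle counts and to the wall-clock times. A small Python sketch under that assumption; the name `improvement_percent` is hypothetical and the inputs are the figures quoted in the commit message:

```python
def improvement_percent(with_simd, without_simd):
    """Relative reduction in cost, in percent, when the SIMD path is enabled."""
    return (without_simd - with_simd) / without_simd * 100.0


if __name__ == "__main__":
    # (label, cost with SIMD, cost without SIMD), taken from the log entry.
    figures = [
        ("quant_bands (decicycles)", 681, 1190),
        ("twoloop, abs_pow34 only (s)", 7.82, 8.17),
        ("twoloop, both (s)", 7.15, 8.17),
        ("fast, abs_pow34 only (s)", 3.40, 3.77),
        ("fast, both (s)", 3.02, 3.77),
    ]
    for label, with_simd, without_simd in figures:
        print(f"{label}: {improvement_percent(with_simd, without_simd):.1f}%")
```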
Diego Biurrun
6be7944ee2
x86: Add missing colons after assembly labels
...
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
8 years ago
Alexandra Hájková
112cee0241
hevc: Add SSE2 and AVX IDCT
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Anton Khirnov
e4128c08d7
Revert "hevc: x86: Refactor IDCT macro declarations"
...
This reverts commit d9dccc0389. There were
outstanding objections to this commit.
8 years ago
Diego Biurrun
5801f9ed24
h264_intrapred: x86: Update comments left behind in 95c89da36e
8 years ago