Semen Belozerov
3a7e9caf92
avcodec/vp9: ipred_hd_16x16_16 avx2 implementation
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
3 years ago
Semen Belozerov
e71d5156c8
avcodec/vp9: ipred_vl_16x16_16 avx2 implementation
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
3 years ago
Ilia Valiakhmetov
35a5d9715d
avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation
vp9_diag_downright_32x32_12bpp_c: 429.7
vp9_diag_downright_32x32_12bpp_sse2: 158.9
vp9_diag_downright_32x32_12bpp_ssse3: 144.6
vp9_diag_downright_32x32_12bpp_avx: 141.0
vp9_diag_downright_32x32_12bpp_avx2: 73.8
Almost 50% faster than avx implementation
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
8 years ago
Ronald S. Bultje
d35ff98e27
vp9: fix overwrite in ff_vp9_ipred_dr_16x16_16_avx2.
Fixes trac issue 6459.
8 years ago
Ilia Valiakhmetov
81fc617c12
avcodec/vp9: ipred_dr_16x16_16 avx2 implementation
Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
8 years ago
Ilia Valiakhmetov
73d9a9a6af
libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation
vp9_diag_downleft_32x32_8bpp_c: 580.2
vp9_diag_downleft_32x32_8bpp_sse2: 75.6
vp9_diag_downleft_32x32_8bpp_ssse3: 73.7
vp9_diag_downleft_32x32_8bpp_avx: 72.7
vp9_diag_downleft_32x32_10bpp_c: 1101.2
vp9_diag_downleft_32x32_10bpp_sse2: 145.4
vp9_diag_downleft_32x32_10bpp_ssse3: 137.5
vp9_diag_downleft_32x32_10bpp_avx: 134.8
vp9_diag_downleft_32x32_10bpp_avx2: 94.0
vp9_diag_downleft_32x32_12bpp_c: 1108.5
vp9_diag_downleft_32x32_12bpp_sse2: 145.5
vp9_diag_downleft_32x32_12bpp_ssse3: 137.3
vp9_diag_downleft_32x32_12bpp_avx: 135.2
vp9_diag_downleft_32x32_12bpp_avx2: 94.0
~30% faster than avx implementation
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
8 years ago
Ilia
2f3d10a01a
avcodec/vp9: avx2 implementation of ipred_dl_16x16_16
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5
Benchmarked with 10000 runs
Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
8 years ago
Ronald S. Bultje
ce78729033
vp9: don't keep a stack pointer if we don't need it.
This saves one register in a few cases on 32bit builds with unaligned
stack (e.g. MSVC), making the code slightly easier to maintain.
(Can someone please test this on 32bit+msvc and confirm make fate-vp9
and tests/checkasm/checkasm still work after this patch?)
9 years ago
Ronald S. Bultje
cb912b4521
vp9: fix msvc build by using 6 GPRs on 32bit if stack!=aligned.
9 years ago
Ronald S. Bultje
061b67fb50
vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction.
9 years ago
Ronald S. Bultje
26ece7a511
vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.
9 years ago