Ronald S. Bultje
715f139c9b
vp9lpf/x86: make filter_16_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
8915320db9
vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
725a216481
vp9lpf/x86: make filter_44_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
5bfa96c4b3
vp9lpf/x86: make filter_16_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
b905e8d2fe
vp9lpf/x86: make filter_48/84_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
37637e6590
vp9lpf/x86: make filter_88_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
be10834bd9
vp9lpf/x86: make filter_44_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
7c62891efe
vp9lpf/x86: save one register in SIGN_ADD/SUB.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
c6375a83d1
vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
4ce8ba72f9
vp9lpf/x86: move variable assigned inside macro branch.
The value is not used outside the branch.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
e4961035b2
vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
683da2788e
vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
6e74e9636b
vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
6411c328a2
vp9lpf/x86: make cglobal statement more conservative in register allocation.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
a6e288d624
vp9lpf/x86: save one register in loopfilter surface coverage.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
0ed21bdc9e
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
f2e3d706a1
vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
James Almer
92d47550ea
vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
Similar gains as the ssse3 version once again
Additional improvements by Clément Bœsch <u@pkh.me>.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
6bea478158
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
James Almer
1f451eed60
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar gains in performance as the SSSE3 version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Clément Bœsch
a692724c58
vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
8 years ago
Ronald S. Bultje
a4edaa0270
vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters.
Each takes about 0.1% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
8 years ago
Ronald S. Bultje
7ca422bb1b
vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters.
Each takes about 0.5% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
8 years ago
Ronald S. Bultje
3aefca68ca
vp9/x86: add myself to copyright holders for loopfilter assembly.
10 years ago
Ronald S. Bultje
afd8c464b7
vp9/x86: make filter_16_h work on 32-bit.
10 years ago
Ronald S. Bultje
b26bc3520f
vp9/x86: make filter_48/84/88_h work on 32-bit.
10 years ago
Ronald S. Bultje
8a1cff1c35
vp9/x86: make filter_44_h work on 32-bit.
10 years ago
Ronald S. Bultje
047088b8c6
vp9/x86: make filter_16_v work on 32-bit.
10 years ago
Ronald S. Bultje
0cc9c23ea1
vp9/x86: make filter_48/84_v work on 32-bit.
10 years ago
Ronald S. Bultje
6433a9133f
vp9/x86: make filter_88_v work on 32-bit.
10 years ago
Ronald S. Bultje
75f8e52089
vp9/x86: make filter_44_v work on 32-bit.
10 years ago
Ronald S. Bultje
7f80c3344c
vp9/x86: save one register in SIGN_ADD/SUB.
10 years ago
Ronald S. Bultje
8ea2194ebb
vp9/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
10 years ago
Ronald S. Bultje
e42409479f
vp9/x86: move variable assigned inside macro branch.
The value is not used outside the branch.
10 years ago
Ronald S. Bultje
418c202c63
vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
10 years ago
Ronald S. Bultje
d1c55654e1
vp9/x86: remove unused register from ABSSUB_CMP macro.
10 years ago
Ronald S. Bultje
e59bd08986
vp9/x86: slightly simplify 44/48/84/88 h stores.
10 years ago
Ronald S. Bultje
8132629bd5
vp9/x86: make cglobal statement more conservative in register allocation.
10 years ago
Ronald S. Bultje
c013ca58c5
vp9/x86: save one register in loopfilter surface coverage.
10 years ago
Michael Niedermayer
41d82b85ab
avcodec/x86/vp9lpf: Always include x86util.asm
Fixes executable stack
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
Christophe Gisquet
4e128ab0b1
x86: vpx/h264/hevc/mpeg2: share constants
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
James Almer
de417982e8
x86/vp9lpf: use fewer instructions in SPLATB_MIX
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
10 years ago
Clément Bœsch
62d31307c1
avcodec/x86/vp9lpf: add a comment above a bunch of SWAP.
11 years ago
Clément Bœsch
f0d368d758
avcodec/x86/vp9lpf: merge a few movs with other instructions.
11 years ago
Clément Bœsch
010732b73a
vp9/x86: simplify FILTER_INIT.
In the 2 FILTER_INIT usages, the source is already preloaded so that
extra complexity taken from FILTER_UPDATE is not necessary.
Also add forgotten "mask" argument in FILTER_{INIT,UPDATE} comments.
11 years ago
Clément Bœsch
b8d002dc95
vp9/x86: clarify mixed splatb.
11 years ago
Clément Bœsch
669d4f9053
x86/vp9lpf: simplify 2nd transpose in 44/48/88/84.
For non-avx optims, this saves 8 movs.
before:
1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips
3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips
2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips
3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips
after:
1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips
3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips
2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips
3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips
11 years ago
Clément Bœsch
d92a725329
x86/vp9lpf: remove 8 SWAPs in 84/48 transpose.
11 years ago
Clément Bœsch
97dde561de
x86/vp9lpf: remove braindead double pxor.
11 years ago
Clément Bœsch
9a3b05b0a9
x86/vp9lpf: save a few mov in flat8in/hev masks calc.
11 years ago