Ronald S. Bultje
488fadebbc
vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version.
10 years ago
Ronald S. Bultje
3d0ca2fe89
vp9: 10/12bpp sse2 SIMD for iadst16.
10 years ago
Ronald S. Bultje
0e80265b0a
vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16.
10 years ago
Ronald S. Bultje
1338fb79d4
vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16.
10 years ago
Ronald S. Bultje
cb054d061a
vp9: add 10/12bpp sse2 SIMD versions of iadst8x8.
10 years ago
Ronald S. Bultje
e0610787b2
vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8.
10 years ago
Ronald S. Bultje
a35f6bdb38
vp9: add 12bpp sse2 versions of iadst4.
10 years ago
Ronald S. Bultje
235e76aeb8
vp9: initial attempt at an idct_idct_4x4 12bpp x86 simd (sse2) impl.
...
The trouble with this function is that intermediates overflow 31+sign
bits, so I've added some helpers (that will also be used in 10/12bpp
8x8, 16x16 and 32x32) to make that easier, basically emulating a
half-assed pmaddqd using 2xpmaddwd. It's currently sse2-only; if anyone
sees potential in adding ssse3, I'd love to hear it.
10 years ago
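The overflow workaround mentioned in 235e76aeb8 can be illustrated with a scalar sketch. This is not the assembly from that commit; it only shows one way a 32x16-bit multiply-accumulate with rounding can be built from two pmaddwd-style 16x16 steps. The 14-bit split, the rounding constant 8192, and all function names below are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable flooring right shift; psrad behaves like this in each lane. */
static int64_t asr(int64_t v, int n)
{
    return v >= 0 ? v >> n : ~(~v >> n);
}

/* Scalar model of one pmaddwd output lane: two signed 16x16 products, summed.
 * The inputs used here stay far below the range where the sum could overflow. */
static int32_t pmaddwd_lane(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Reference: (x*c0 + y*c1 + 8192) >> 14, computed with 64-bit intermediates. */
static int32_t mul_round_ref(int32_t x, int32_t y, int16_t c0, int16_t c1)
{
    return (int32_t)asr((int64_t)x * c0 + (int64_t)y * c1 + 8192, 14);
}

/* Same value using only 16x16 multiplies: split each input into a low
 * 14-bit part and a high part, multiply both parts, then recombine.
 * Assumes |x| and |y| are below 2^29 so the high parts fit in int16_t. */
static int32_t mul_round_split(int32_t x, int32_t y, int16_t c0, int16_t c1)
{
    int16_t xl = (int16_t)(x & 0x3fff), xh = (int16_t)asr(x, 14);
    int16_t yl = (int16_t)(y & 0x3fff), yh = (int16_t)asr(y, 14);
    int32_t lo = pmaddwd_lane(xl, c0, yl, c1);   /* low-part products  */
    int32_t hi = pmaddwd_lane(xh, c0, yh, c1);   /* high-part products */
    return hi + (int32_t)asr((int64_t)lo + 8192, 14);
}

/* Usage: 200000 * 11585 alone exceeds INT32_MAX, yet both paths agree. */
static void check_split(void)
{
    assert(mul_round_split(200000, -150000, 11585, -11585) ==
           mul_round_ref(200000, -150000, 11585, -11585));
}
```

The split is exact because the high-part product is a multiple of 2^14 once shifted back, so the rounding and the final shift only ever affect the low-part sum.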
Ronald S. Bultje
f76423d097
vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions.
10 years ago
Ronald S. Bultje
6b579cf547
vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4.
10 years ago
Ronald S. Bultje
1c3be32533
vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.
10 years ago
Christophe Gisquet
b6594a9605
x86: dct-test: add more idcts
...
In particular for 10 and 12 bits.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Christophe Gisquet
7ece8b50b1
x86: simple_idct: 12bits versions
...
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C: 78902 decicycles in idct, 262071 runs, 73 skips
avx: 32478 decicycles in idct, 262045 runs, 99 skips
Difference between the 2:
stddev: 0.39 PSNR:104.47 MAXDIFF: 2
This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.
In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.
Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Christophe Gisquet
4369b9dc7b
x86: simple_idct(_put): 10bits versions
...
Modeled on the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires adding offsets in different places compared to
prores or C, and makes the function approximately 2% slower.
For 16 frames of a DNxHD 4:2:2 10bits test sequence:
C: 60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx: 26272 decicycles in idct, 1048171 runs, 405 skips
The add version is not implemented, so the corresponding dsp
function is set to NULL to make that obvious to any code that tries to call it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Christophe Gisquet
e652f69b35
x86: simple_idct10_template: fix overflow in pass
...
When the input of a pass has 15 or 16 bits of precision (in particular
the column pass), the addition of a bias to W4 may lead to overflows
in the input to pmaddwd.
This requires postponing the addition of the bias until after the first
butterfly. To do so, the fact that m15 is unused, although zeroed, is
exploited. If the pass is safe, an address can be used directly,
and the number of xmm regs can be decreased. Otherwise, the 32-bit bias
is loaded into it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
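The failure mode described in e652f69b35 can be shown with a scalar sketch under one plausible reading of the message: a rounding bias folded into a 16-bit input wraps once the input is already near full range, whereas adding the equivalent bias after the multiply stays within 32 bits. The function names and the constants used in the usage check (a W4 of 16383 and a bias of 8) are hypothetical and are not taken from the FFmpeg source.

```c
#include <assert.h>
#include <stdint.h>

/* Model of paddw: 16-bit addition that wraps around on overflow. */
static int16_t paddw(int16_t a, int16_t b)
{
    int32_t v = (int32_t)a + b;
    return (int16_t)(((v + 32768) & 0xffff) - 32768);
}

/* Early bias: added to the 16-bit input before the multiply,
 * so the multiplier sees a value that may already have wrapped. */
static int32_t bias_before_mul(int16_t in, int16_t bias, int16_t w4)
{
    return (int32_t)paddw(in, bias) * w4;
}

/* Late bias: multiply first, then add the scaled bias in 32 bits. */
static int32_t bias_after_mul(int16_t in, int16_t bias, int16_t w4)
{
    return (int32_t)in * w4 + (int32_t)bias * w4;
}

/* Usage: small inputs agree, near-full-range inputs do not. */
static void check_bias(void)
{
    assert(bias_before_mul(1000, 8, 16383) == bias_after_mul(1000, 8, 16383));
    assert(bias_before_mul(32765, 8, 16383) != bias_after_mul(32765, 8, 16383));
}
```

In this model the two variants are interchangeable only while `in + bias` still fits in 16 bits, which matches the commit's distinction between a "safe" pass and one whose input already has 15 or 16 bits of precision.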
Christophe Gisquet
e9a68b0316
x86: prores: templatize 10 bits simple_idct
...
This should be reused for a generic simple_idct10 function.
Requires a bit of trickery to declare common constants in C.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
James Almer
dab5f65b25
x86/takdsp: use arithmetic shift instructions
...
p1 and p2 are int32_t.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
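The note in dab5f65b25 that p1 and p2 are int32_t is the reason the kind of shift matters. The commit message does not say which instructions were replaced, so the following is only a generic scalar sketch of how a logical and an arithmetic right shift differ on signed 32-bit values; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift (psrld-like): zero-fills from the top and ignores
 * the sign. Expects n in 1..31 so the result fits in int32_t. */
static int32_t shift_logical(int32_t v, int n)
{
    return (int32_t)((uint32_t)v >> n);
}

/* Arithmetic right shift (psrad-like): replicates the sign bit.
 * Written so it does not rely on how the compiler shifts negative values. */
static int32_t shift_arithmetic(int32_t v, int n)
{
    return v >= 0 ? v >> n : ~(~v >> n);
}

/* Usage: the two agree for non-negative inputs and diverge for negative ones. */
static void check_shift(void)
{
    assert(shift_logical(8, 1) == shift_arithmetic(8, 1));
    assert(shift_arithmetic(-8, 1) == -4);
    assert(shift_logical(-8, 1) == 2147483644);
}
```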
Paul B Mahol
35af7add6f
avcodec/takdec: add x86 SIMD for rest of decorrelation modes
...
Signed-off-by: Paul B Mahol <onemda@gmail.com>
10 years ago
Ronald S. Bultje
ce78729033
vp9: don't keep a stack pointer if we don't need it.
...
This saves one register in a few cases on 32bit builds with unaligned
stack (e.g. MSVC), making the code slightly easier to maintain.
(Can someone please test this on 32bit+msvc and confirm make fate-vp9
and tests/checkasm/checkasm still work after this patch?)
10 years ago
James Almer
72254b19b8
x86/alacdsp: add simd optimized functions
...
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ronald S. Bultje
cb912b4521
vp9: fix msvc build by using 6 GPRs on 32bit if stack!=aligned.
10 years ago
Christophe Gisquet
f827a17005
blockdsp: reindent after parameter removal
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ronald S. Bultje
061b67fb50
vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction.
10 years ago
Ronald S. Bultje
26ece7a511
vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.
10 years ago
Ronald S. Bultje
db7786e8ff
vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.
10 years ago
Ganesh Ajjanagadde
0493e42eb2
avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx
...
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
Header guards are too brittle and ugly for this case.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Christophe Gisquet
562ba4a827
blockdsp: remove high bitdepth parameter
...
It is only (mis-)used to set the dsp functions clear_block(s). But
these functions always work on 16-bit-wide elements, which makes
the parameter useless and actually harmful, as it prevents all content
deeper than 8 bits from using the accelerated functions.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
James Almer
3178931a14
x86/hevc_sao: move 10/12bit functions into a separate file
...
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ganesh Ajjanagadde
308e7484a3
avcodec/x86/rnd_template: silence -Wunused-function on --disable-mmx
...
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
Header guards are too brittle and ugly for this case.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Michael Niedermayer
1b82b934a1
avcodec/x86/sbrdsp: Fix using uninitialized upper 32bit of noise
...
Fixes crash
Fixes: flicker-1.scout3d21443372922.28.m4a
Found-by: Dale Curtis <dalecurtis@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ganesh Ajjanagadde
07cd8d5676
avcodec/x86/cavsdsp: silence -Wunused-variable on --disable-mmx
...
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
The alternative of header guards will make it far too ugly.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ganesh Ajjanagadde
0544c95fd6
avcodec/x86/mpegaudiodsp: silence -Wunused-variable on --disable-mmx
...
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
The alternative of header guards will make it far too ugly.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ganesh Ajjanagadde
4f90818ea1
avcodec/x86/rv40dsp_init: silence -Wunused-variable on --disable-mmx
...
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
The alternative of header guards will make it far too ugly.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
James Almer
7086154aaa
x86/vp9dsp: fix local header include
...
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
91fcb10f08
x86/vp9dsp: add missing header include
...
Fixes make checkheaders
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
4bb6cb4c7d
x86/vp9mc: fix string concatenation of fullpel function names
...
Fixes compilation with NASM
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ganesh Ajjanagadde
92fabca427
avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx
...
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
Header guards are too brittle and ugly for this case.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ganesh Ajjanagadde
e681baf638
avcodec/x86/mpegvideoenc: silence -Wunused-function on --disable-mmx
...
This silences -Wunused-function when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ganesh Ajjanagadde
f0c635f577
avcodec/x86/hpeldsp_init: silence -Wunused-function on --disable-mmx
...
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx .
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
James Almer
6f9ba0cb82
x86/vp9dsp: add missing preprocessor guards
...
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
e47564828b
x86/vp9mc: add missing preprocessor guards
...
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
James Almer
2f9ab15960
x86/vp9: add avx2 subpel MC SIMD for 10/12bpp
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Michael Niedermayer
58fe57d5a0
avcodec/mpeg12enc: Basic support for encoding non even QPs for -non_linear_quant 1
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Michael Niedermayer
2d35757814
avcodec/mpegvideo: Change mpeg2 unquant to work with higher precision qscale
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
10 years ago
Ronald S. Bultje
344d519040
vp9: add subpel MC SIMD for 10/12bpp.
10 years ago
Ronald S. Bultje
77f359670f
vp9: add fullpel (avg) MC SIMD for 10/12bpp.
10 years ago
Ronald S. Bultje
6354ff0383
vp9: add fullpel (put) MC SIMD for 10/12bpp.
10 years ago
Vittorio Giovara
5d14cf1999
mpegvideo: Make sure mpegutils.h is included where needed
10 years ago
James Almer
d5f8a642f6
x86: port PSIGNW to cpuflags
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
10 years ago
Ronald S. Bultje
4b66274a86
vp9: save one (PSIGNW) instruction in iadst16_1d sse2/ssse3.
10 years ago