Diego Biurrun
efc7290eb6
x86: hpeldsp: Keep all rnd_template instantiations in hpeldsp_init
There is no point in having a separate file just for the instantiation
that provides the public functions.
11 years ago
Diego Biurrun
aba70bb538
Add missing headers to make template files compile (more) standalone
11 years ago
Diego Biurrun
d0aabeab23
x86: h264_qpel: Fix typo in CALL_2X_PIXELS macro invocation
This fixes FATE with mmxext CPUFLAGS set.
11 years ago
Peter Ross
a490970af2
libavcodec/*/vp8dsp_init: indent
Signed-off-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Peter Ross
89f2f5dbd7
On2 VP7 decoder
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: BBB
previous patch reviewed by jason
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
c25d2cd20b
avcodec/x86/mpegvideoenc_template: fix integer overflow
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
c8246d3766
avcodec/x86/h264_qpel: Fix typo introduced by 322a1dda97
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
82dd1026cf
x86: dsputil: Move hpeldsp-related declarations to a separate header
11 years ago
Diego Biurrun
6655c933a8
x86: dsputil: Move fpel declarations to a separate header
11 years ago
Diego Biurrun
322a1dda97
dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros
11 years ago
Diego Biurrun
600b854ad8
imgconvert: Move ff_deinterlace_line_*_mmx declarations out of dsputil
11 years ago
Diego Biurrun
1a8d0cf77e
x86: dsputil: Move inline assembly macros to a separate header
11 years ago
Matt Oliver
cd5cf395f6
Additional icl inline asm fix.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
1cd107f637
avcodec/x86/snowdsp: add missing clobbers to inner_add_yblock_bw_8_obmc_16_bh_even_sse2() and inner_add_yblock_bw_16_obmc_32_sse2()
Note: these functions are currently disabled.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
82bb304801
dsputil: Use correct type in me_cmp_func function pointer
11 years ago
Diego Biurrun
0e083d7e43
build: Group general components separate from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
11 years ago
Diego Biurrun
5169e68895
dsputil: Propagate bit depth information to all (sub)init functions
This avoids recalculating the value over and over again.
11 years ago
Carl Eugen Hoyos
57fdc74c34
Add one forgotten named inline asm operand in libavcodec/x86/motion_est.c.
11 years ago
Matt Oliver
8236747511
Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported.
This is part of the patch-set for intel C inline asm on windows support
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Matt Oliver
b2d3a45598
avcodec/x86/mlpdsp: Only use asm when non-local inline asm labels are supported
This is part of the patch-set for intel C inline asm on windows support
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
aa1f38015c
x86/synth_filter: improve FMA version
Replace mulps+subps with fnmaddps, resulting in two fewer instructions
inside the inner loops.
FMA3 performance is about 1% faster.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Matt Oliver
b73aae6fe9
avcodec/x86/idct_sse2_xvid: move offsets out of MANGLE()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Matt Oliver
9eb3f11c55
Add missing external declarations.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Matt Oliver
590805b7c3
Fixed 64-bit conformance with movzbl.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
db3f61a04f
x86: dsputil_init: Drop some unnecessary parentheses
11 years ago
Diego Biurrun
441b093915
x86: dsputil_init: K&R formatting cosmetics
11 years ago
Diego Biurrun
4cb4680c10
x86: dsputil_x86.h: K&R formatting cosmetics
11 years ago
Diego Biurrun
f8bbebecfd
x86: motion_est: K&R formatting cosmetics
11 years ago
Diego Biurrun
a36947c167
dsputilenc_mmx: K&R formatting cosmetics
11 years ago
Diego Biurrun
38675229a8
dsputil_mmx: K&R formatting cosmetics
11 years ago
Diego Biurrun
6a8b35dc88
dsputilenc_mmx: Merge two assignment blocks with identical conditions
11 years ago
Diego Biurrun
55519926ef
x86: Make function prototype comments in assembly code consistent
This helps grepping for functions, among other things.
11 years ago
Diego Biurrun
edd1f833fa
x86: h264_idct_10_bit: Use proper type in function prototype comments
11 years ago
Diego Biurrun
831a118078
Update dsputil- and SIMD-related comments to match reality more closely
11 years ago
Diego Biurrun
17608f6ee3
x86: Add some more missing headers
11 years ago
Diego Biurrun
08dba0e1c3
x86: mpegvideoenc: Remove some remnants of the long-gone libmpeg2 IDCT
11 years ago
James Almer
9e0e1f9067
x86/dsputil: add emms to ff_scalarproduct_int16_mmxext()
Also undo the changes to ra144enc.c from previous commits.
Should fix ticket #3429
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
3bfdee00cd
x86: dcadsp: Fix linking with yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
11 years ago
Diego Biurrun
3741aa37c2
x86: cabac: Use correct #includes to make header compile standalone
11 years ago
James Almer
7fd64e3e36
x86/synth_filter: add synth_filter_fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
206167a295
x86/synth_filter: add missing HAVE_YASM guard
Should fix compilation failures with --disable-yasm on some compilers
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
884e085d1e
x86/synth_filter: Revert the switch to float ops with SSE2
This reverts the changes 6467209836 and 68c3ed936a made to the
SSE2 version, which caused a slowdown of about 5 cycles.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
68c3ed936a
x86/synth_filter: add synth_filter_avx
Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
6467209836
x86/synth_filter: add synth_filter_sse
Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
2cdbcc0048
x86: synth filter float: implement SSE2 version
Timings for Arrandale:
        C     SSE
win32:  2108  334
win64:  1152  322
Factoring the inner loop out into a call/jmp costs more than 15 cycles,
even with the jmp destination aligned.
Unrolling for ARCH_X86_64 gains 20 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
169243112c
x86: dcadsp: implement SSE lfe_dir
...
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
4cb6964244
dcadec: simplify decoding of VQ high frequencies
The vector dequantization loop contains a test that prevents an effective
SIMD implementation. Moving the test out of the loop allows the loop to be
DSPized, so the current DSP implementation is modified accordingly. In
particular, it no longer has to handle zero-length loops.
The decode_hf implementations have the following timings:
For x86 Arrandale:
        C    SSE  SSE2  SSE4
win32:  260  162  119   104
win64:  242  N/A  89    72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
11 years ago
Christophe Gisquet
08e3ea60ff
x86: synth filter float: implement SSE2 version
Timings for Arrandale:
        C     SSE
win32:  2108  334
win64:  1152  322
Factoring the inner loop out into a call/jmp costs more than 15 cycles,
even with the jmp destination aligned.
Unrolling for ARCH_X86_64 gains 20 cycles.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
11 years ago
Christophe Gisquet
ad507d7907
x86: dcadsp: implement SSE lfe_dir
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
11 years ago
Diego Biurrun
b23650491f
prores: Use consistent names for DSP arch initialization functions
11 years ago