Ronald S. Bultje
e3f530feca
prores: idct sse2/sse4 optimizations.
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
13 years ago
Alex Converse
48f7163f13
dsputil_mmx: Honor HAVE_AMD3DNOW
14 years ago
Baptiste Coudurier
9a33078b64
dsputil_mmx: fix indentation
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Kostya Shishkov
d241f51e0f
Move RV3/4-specific DSP functions into their own context
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Jason Garrett-Glaser
a3bf7b864a
H.264: tweak some other x86 asm for Atom
14 years ago
Mans Rullgard
a617c6aaa3
dsputil: update per-arch init funcs for non-h264 high bit depth
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
e7a972e113
simple_idct: add 10-bit version
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Diego Biurrun
65083b4911
dsputil: remove disabled code
14 years ago
Mans Rullgard
710b8df949
dsputil: remove ff_emulated_edge_mc macro used in one place
This macro can cause problems in conjunction with the bitdepth
template expansion. It was presumably added to keep source
compatibility when high bitdepth support was added. However,
emulated_edge_mc is a dsputil pointer and should not be called
directly, so there is little reason to keep such a macro.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Daniel Kang
c0483d0c7a
H.264: Add x86 assembly for 10-bit H.264 predict functions
Mainly ported from 8-bit H.264 predict.
Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Daniel Kang
3c7c16fde3
YASM: Shut up unused variable compiler warning with --disable-yasm.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
14 years ago
Daniel Kang
58f7aad051
Fix build with --disable-yasm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Michael Niedermayer
889639969b
dsputil_mmx: try to fix compilation without yasm.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Daniel Kang
9bfa5363da
H.264: Add x86 assembly for 10-bit H.264 qpel functions.
Mainly ported from 8-bit H.264 qpel.
Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Justin Ruggles
6054cd25b4
ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.
14 years ago
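The commit adds an int32_t array clipping function without showing its contract. The following is a minimal scalar sketch of what such a function computes. It assumes a (dst, src, min, max, len) parameter order. The name vector_clip_int32_sketch is hypothetical, and the x86 versions would clip several elements per instruction rather than one at a time.

```c
#include <stdint.h>

/* Hypothetical scalar reference (parameter order assumed): clamp every
 * element of src into [min, max] and store the result in dst. */
static void vector_clip_int32_sketch(int32_t *dst, const int32_t *src,
                                     int32_t min, int32_t max,
                                     unsigned int len)
{
    for (unsigned int i = 0; i < len; i++) {
        int32_t v = src[i];
        if (v < min)
            v = min;
        else if (v > max)
            v = max;
        dst[i] = v;
    }
}
```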
Diego Biurrun
d2ee495fb2
configure: Drop check for availability of ten assembler operands.
This was done to support gcc 2.95, which is an old legacy compiler
that fails to compile the current codebase anyway.
14 years ago
Ronald S. Bultje
ed63f527f2
Fix build if yasm is not available.
14 years ago
Daniel Kang
f188a1e0ca
H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions.
Mainly ported from 8-bit H.264 MC Chroma.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Jason Garrett-Glaser
c90b94424c
4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
14 years ago
Jason Garrett-Glaser
504811baea
Roll back 4:4:4 H.264 for now
Needs some ARM/PPC asm modifications.
14 years ago
Jason Garrett-Glaser
c9c493872c
4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
14 years ago
Jason Garrett-Glaser
9f3d6ca4f1
Port x86 10-bit H.264 deblock asm from x264
14 years ago
Oskar Arvidsson
19a0729b4c
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<suffix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
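The naming scheme above (base name, then bit depth, then the trailing `_c` or SIMD tag) lends itself to preprocessor templates. Below is a hypothetical sketch of that pattern using token pasting. The FUNCC macro name and the fill_gray function are illustrative assumptions, not code from the patch. The generated function depends on the bit depth through the value it writes, while the 9- and 10-bit variants share a pixel type. That shared type is the "pixel size only" distinction the note refers to.

```c
#include <stddef.h>
#include <stdint.h>

/* Build <base name>_<bit depth>_c; the extra indirection lets the depth
 * argument be macro-expanded before pasting. */
#define FUNCC_(name, depth) name##_##depth##_c
#define FUNCC(name, depth)  FUNCC_(name, depth)

/* Hypothetical template: fill n pixels with mid-gray for the given depth. */
#define DEFINE_FILL_GRAY(depth, pixel)                               \
    static void FUNCC(fill_gray, depth)(pixel *dst, size_t n)        \
    {                                                                \
        for (size_t i = 0; i < n; i++)                               \
            dst[i] = (pixel)(1 << ((depth) - 1));                    \
    }

DEFINE_FILL_GRAY(8,  uint8_t)   /* defines fill_gray_8_c  */
DEFINE_FILL_GRAY(9,  uint16_t)  /* defines fill_gray_9_c  */
DEFINE_FILL_GRAY(10, uint16_t)  /* defines fill_gray_10_c */
```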
Baptiste Coudurier
6d4c49a2af
Move png mmx functions into x86/png_mmx.c, remove them from DSPContext.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Oskar Arvidsson
8dbe585641
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<suffix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Alexander Strange
1500be13f2
dsputil: allow to skip drawing of top/bottom edges.
14 years ago
Justin Ruggles
e6e9823488
Add apply_window_int16() to DSPContext with x86-optimized versions and use it in the ac3_fixed encoder.
14 years ago
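Below is a rough scalar sketch of what an int16 windowing function for a fixed-point encoder can look like. It rests on several assumptions. The window is taken to be symmetric, so only its first len/2 coefficients are stored. The coefficients are assumed to be Q15. Each product is assumed to be rounded with (x * w + 2^14) >> 15. The name apply_window_int16_sketch is hypothetical, and len is assumed even.

```c
#include <stdint.h>

/* Sketch: multiply input by a symmetric Q15 window. Samples i and
 * len-1-i share window[i]; len is assumed to be even. */
static void apply_window_int16_sketch(int16_t *output, const int16_t *input,
                                      const int16_t *window, unsigned int len)
{
    unsigned int half = len / 2;
    for (unsigned int i = 0; i < half; i++) {
        int32_t w = window[i];
        /* Q15 multiply with rounding: (x * w + 2^14) >> 15 */
        output[i]           = (int16_t)((input[i] * w + (1 << 14)) >> 15);
        output[len - 1 - i] = (int16_t)((input[len - 1 - i] * w + (1 << 14)) >> 15);
    }
}
```

With w = 16384 (0.5 in Q15), an input of 1000 becomes 500. With w = 32767 (just under 1.0), 2000 stays 2000 after rounding.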
Michael Niedermayer
d375c10400
Fake-Merge remote-tracking branch 'ffmpeg-mt/master'
14 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Ronald S. Bultje
6a717eb4aa
dsputil_mmx.c: remove ff_vector128.
Remove ff_vector128; it is identical to ff_pb_80.
(cherry picked from commit bf6fa73245)
14 years ago
Ronald S. Bultje
bf6fa73245
dsputil_mmx.c: remove ff_vector128.
Remove ff_vector128; it is identical to ff_pb_80.
14 years ago
Ronald S. Bultje
9a1ced321b
dsputil: move VC1-specific stuff into VC1DSPContext.
(cherry picked from commit 12802ec060)
14 years ago
Ronald S. Bultje
12802ec060
dsputil: move VC1-specific stuff into VC1DSPContext.
14 years ago
Justin Ruggles
fe2ff6d247
Separate format conversion DSP functions from DSPContext.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit c73d99e672)
14 years ago
Justin Ruggles
c73d99e672
Separate format conversion DSP functions from DSPContext.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
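The commit message does not list which functions moved. FFmpeg's separate format-conversion context (FmtConvertContext in fmtconvert.h) held routines such as int32_to_float_fmul_scalar(), which converts integer samples to float and scales them in one pass. Below is a minimal scalar sketch of that kind of routine. Its name and parameter order are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical sketch: convert int32 samples to float and multiply each
 * by a common scale factor. */
static void int32_to_float_fmul_scalar_sketch(float *dst, const int32_t *src,
                                              float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (float)src[i] * mul;
}
```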
Ronald S. Bultje
baffa091af
Implement a SIMD version of emulated_edge_mc() for x86.
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
(cherry picked from commit 81f2a3f4ff)
14 years ago
Justin Ruggles
389b5bfa34
cosmetics: indentation
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit d19b744a36)
14 years ago
Justin Ruggles
a8ae4e0e7b
Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 80ba1ddb58)
14 years ago
Ronald S. Bultje
81f2a3f4ff
Implement a SIMD version of emulated_edge_mc() for x86.
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
14 years ago
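The cycle counts above are measured against a C version of emulated_edge_mc(). That code is not shown here. The sketch below only illustrates what edge emulation does: it builds a block whose top-left corner may lie outside the reference picture and replicates the nearest edge pixel for every out-of-bounds position. The function names and parameter list are hypothetical. This per-pixel clamping version favors clarity; an optimized implementation would copy the in-picture part row by row and replicate edges in bulk.

```c
#include <stddef.h>
#include <stdint.h>

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical sketch: copy a block_w x block_h block starting at
 * (src_x, src_y) from a w x h picture into buf, clamping coordinates so
 * that positions outside the picture repeat the nearest edge pixel. */
static void emulated_edge_sketch(uint8_t *buf, ptrdiff_t buf_stride,
                                 const uint8_t *pic, ptrdiff_t pic_stride,
                                 int block_w, int block_h,
                                 int src_x, int src_y, int w, int h)
{
    for (int y = 0; y < block_h; y++) {
        int sy = clampi(src_y + y, 0, h - 1);
        for (int x = 0; x < block_w; x++) {
            int sx = clampi(src_x + x, 0, w - 1);
            buf[y * buf_stride + x] = pic[sy * pic_stride + sx];
        }
    }
}
```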
Justin Ruggles
d19b744a36
cosmetics: indentation
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Justin Ruggles
80ba1ddb58
Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
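Below is a scalar sketch of the windowed overlap step computed by DSPContext.vector_fmul_window(), with the removed add_bias term no longer added to each output. The buffer layout is an assumption here: src0 and src1 hold len samples each, while win and dst hold 2*len. The function name is hypothetical.

```c
/* Sketch: windowed combination of two len-sample halves into a 2*len
 * output, without any additive bias. */
static void vector_fmul_window_sketch(float *dst, const float *src0,
                                      const float *src1, const float *win,
                                      int len)
{
    for (int i = 0; i < len; i++) {
        int j = len - 1 - i;              /* mirrored index into src1 */
        float s0 = src0[i];
        float s1 = src1[j];
        float wi = win[i];                /* rising half of the window */
        float wj = win[2 * len - 1 - i];  /* mirrored window position */
        dst[i]               = s0 * wj - s1 * wi;
        dst[2 * len - 1 - i] = s0 * wi + s1 * wj;
    }
}
```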
Justin Ruggles
015f9f1ad3
Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 6eabb0d3ad)
14 years ago
Justin Ruggles
6eabb0d3ad
Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
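The new vector_fmul() contract separates the output from both inputs. A scalar sketch of it follows, with a hypothetical function name. Passing dst as src0 still reproduces the old in-place dst = dst * src behaviour, so callers that relied on it keep working.

```c
/* Sketch of dst[i] = src0[i] * src1[i]; dst may alias src0. */
static void vector_fmul_sketch(float *dst, const float *src0,
                               const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}
```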
Mans Rullgard
ef4a65149d
Replace ASMALIGN() with .p2align
This macro has unconditionally used .p2align for a long time and
serves no useful purpose.
14 years ago
Mans Rullgard
ac3c9d0169
x86: remove VLA in ac3_downmix_sse
14 years ago
Ronald S. Bultje
ec3233a855
Fix ff_pw_3 alignment.
...
Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
Jason Garrett-Glaser
19fb234e4a
H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
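In H.264, the 16 luma DC coefficients of an Intra16x16 macroblock go through a 4x4 Hadamard transform before dequantization. Below is a minimal C sketch of that step as a row pass and a column pass of butterflies. It assumes the single-qmul dequantization (x * qmul + 128) >> 8 that the note above contrasts with x264-style separate shift and qmul values. The function name is hypothetical. The sketch also writes a plain row-major 4x4 output, whereas the decoder scatters each result into the DC slot of its own 4x4 block.

```c
#include <stdint.h>

/* Sketch: 4x4 Hadamard transform of the luma DC coefficients, then
 * dequantization with an assumed (x * qmul + 128) >> 8 rounding. */
static void luma_dc_dequant_idct_sketch(int16_t out[16], const int16_t in[16],
                                        int qmul)
{
    int tmp[16];

    /* Horizontal pass: butterfly network on each row. */
    for (int r = 0; r < 4; r++) {
        const int z0 = in[4 * r + 0] + in[4 * r + 1];
        const int z1 = in[4 * r + 0] - in[4 * r + 1];
        const int z2 = in[4 * r + 2] - in[4 * r + 3];
        const int z3 = in[4 * r + 2] + in[4 * r + 3];
        tmp[4 * r + 0] = z0 + z3;
        tmp[4 * r + 1] = z0 - z3;
        tmp[4 * r + 2] = z1 - z2;
        tmp[4 * r + 3] = z1 + z2;
    }

    /* Vertical pass plus dequantization on each column. */
    for (int c = 0; c < 4; c++) {
        const int z0 = tmp[4 * 0 + c] + tmp[4 * 1 + c];
        const int z1 = tmp[4 * 0 + c] - tmp[4 * 1 + c];
        const int z2 = tmp[4 * 2 + c] - tmp[4 * 3 + c];
        const int z3 = tmp[4 * 2 + c] + tmp[4 * 3 + c];
        out[4 * 0 + c] = (int16_t)(((z0 + z3) * qmul + 128) >> 8);
        out[4 * 1 + c] = (int16_t)(((z0 - z3) * qmul + 128) >> 8);
        out[4 * 2 + c] = (int16_t)(((z1 - z2) * qmul + 128) >> 8);
        out[4 * 3 + c] = (int16_t)(((z1 + z2) * qmul + 128) >> 8);
    }
}
```

A lone DC input spreads evenly to all 16 outputs. A constant input collapses into the first output only.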
Ronald S. Bultje
8d147f1f60
For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes and then using movlhps to dup it into the higher half of the register.
Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
Baptiste Coudurier
90f1f3bf00
In yadif filter, declare asm constants directly to avoid dependency on libavcodec
Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago
Baptiste Coudurier
9e95999e2a
10l, add ff_pw_1 to dsputil_mmx for yadif sse2
Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk
14 years ago