Michael Niedermayer
52a81cd0e4
Fix add_paeth_prediction_mmx for rgb48
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Michael Niedermayer
afd2371d5c
merge read and and in add_paeth_prediction
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Baptiste Coudurier
6d4c49a2af
Move png mmx functions into x86/png_mmx.c, remove them from DSPContext.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Vitor Sessak
9d35fa520e
Add AVX FFT implementation.
...
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
14 years ago
Vitor Sessak
33cbfa6fa3
Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX.
...
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
14 years ago
Carl Eugen Hoyos
5c0068758f
Fix compilation with --disable-yasm.
14 years ago
Oskar Arvidsson
8dbe585641
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
...
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
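The naming scheme described in this commit can be illustrated with a small sketch. Everything here other than the names clear_blocks_8_c and the 8/9/10 bit depths is an assumption made for illustration: the DEFINE_CLEAR_BLOCKS macro, the coefficient types, the block size and select_clear_blocks() are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block size: 6 blocks of 64 coefficients each. */
#define BLOCK_COEFS (6 * 64)

/* Generate <base name>_<bit depth>_c by token pasting.
 * The coefficient type per depth is an assumption of this sketch. */
#define DEFINE_CLEAR_BLOCKS(depth, coef_t)                      \
    static void clear_blocks_##depth##_c(void *blocks)          \
    {                                                           \
        assert(blocks != NULL);                                 \
        memset(blocks, 0, sizeof(coef_t) * BLOCK_COEFS);        \
    }

DEFINE_CLEAR_BLOCKS(8,  int16_t)
DEFINE_CLEAR_BLOCKS(9,  int32_t)
DEFINE_CLEAR_BLOCKS(10, int32_t)

typedef void (*clear_blocks_fn)(void *blocks);

/* Hypothetical init-time selector: pick the variant for the bit depth
 * being decoded, or NULL if the depth is not supported. */
static clear_blocks_fn select_clear_blocks(int bit_depth)
{
    switch (bit_depth) {
    case 8:  return clear_blocks_8_c;
    case 9:  return clear_blocks_9_c;
    case 10: return clear_blocks_10_c;
    default: return NULL;
    }
}
```

In this sketch the 9-bit and 10-bit variants expand to identical code because they share a coefficient type, which is the kind of duplication the note about binary size refers to.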
Alexander Strange
1500be13f2
dsputil: allow skipping the drawing of top/bottom edges.
14 years ago
Justin Ruggles
e6e9823488
Add apply_window_int16() to DSPContext with x86-optimized versions and use it
...
in the ac3_fixed encoder.
14 years ago
Michael Niedermayer
d375c10400
Fake-Merge remote-tracking branch 'ffmpeg-mt/master'
14 years ago
Mans Rullgard
0aded9484d
Move dct and rdft definitions to separate files
...
This leaves fft.h with only the core FFT and MDCT definitions
thus making it more manageable.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Justin Ruggles
0f999cfddb
ac3enc: add float_to_fixed24() with x86-optimized versions to AC3DSPContext
...
and use in scale_coefficients() for the floating-point AC-3 encoder.
14 years ago
Justin Ruggles
79414257e2
mathops: fix MULL() when the compiler does not inline the function.
...
If the function is not inlined, an immediate cannot be used for the
shift parameter, so the %cl register must be used instead in that case.
This fixes compilation for x86-32 using gcc with --disable-optimizations.
14 years ago
Justin Ruggles
aaff3b312e
mathops: change "g" constraint to "rm" in x86-32 version of MUL64().
...
The 1-arg imul instruction cannot take an immediate argument, only a register
or memory argument.
14 years ago
Justin Ruggles
b181b8fb96
mathops: convert MULL/MULH/MUL64 to inline functions rather than macros.
...
This fixes unexpected name collisions that were occurring with variables
declared within the macros.
It also fixes the fate-acodec-ac3_fixed regression test on x86-32.
14 years ago
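A portable sketch of the three helpers written as inline functions, with no inline assembly, may help show why this avoids the macro problem: each function has its own scope, so its temporaries cannot collide with variables at the call site. This is an illustration under stated assumptions, not the x86-32 code touched by the commit. The signatures and the 64-bit intermediate are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Full 64-bit product of two 32-bit values. */
static inline int64_t MUL64(int a, int b)
{
    return (int64_t)a * (int64_t)b;
}

/* High 32 bits of the 64-bit product.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static inline int MULH(int a, int b)
{
    return (int)(MUL64(a, b) >> 32);
}

/* 64-bit product shifted right by a caller-supplied amount.
 * Because shift is an ordinary parameter, it need not be a
 * compile-time constant when the function is not inlined. */
static inline int MULL(int a, int b, unsigned shift)
{
    assert(shift < 64);
    return (int)(MUL64(a, b) >> shift);
}
```

For example, MULL(3, 1 << 16, 16) evaluates to 3, and MULH(65536, 65536) evaluates to 1.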
Michael Niedermayer
f7a5e7791d
Revert "ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder"
...
This reverts commit cc4d3dd3e2.
Reverted at the author's request because a better implementation is available.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Justin Ruggles
f1efbca5e9
ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder.
14 years ago
Mans Rullgard
a5444fee06
Add CONFIG_AC3DSP symbol to simplify makefiles
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Justin Ruggles
cc4d3dd3e2
ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Ronald S. Bultje
6a717eb4aa
dsputil_mmx.c: remove ff_vector128.
...
Remove ff_vector128, it is identical to ff_pb_80.
(cherry picked from commit bf6fa73245)
14 years ago
Ronald S. Bultje
bf6fa73245
dsputil_mmx.c: remove ff_vector128.
...
Remove ff_vector128, it is identical to ff_pb_80.
14 years ago
Ronald S. Bultje
9a1ced321b
dsputil: move VC1-specific stuff into VC1DSPContext.
...
(cherry picked from commit 12802ec060)
14 years ago
Ronald S. Bultje
12802ec060
dsputil: move VC1-specific stuff into VC1DSPContext.
14 years ago
Justin Ruggles
20a2a3da8f
ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16().
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit 1f004fc512)
14 years ago
Justin Ruggles
1f004fc512
ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16().
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Justin Ruggles
7539a1fee2
ac3enc: Add x86-optimized function to speed up log2_tab().
...
AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute
value of each element in an array of int16_t.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit fbb6b49dab)
14 years ago
Loren Merritt
11ab1e409f
FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
...
6% faster SSE FFT on Conroe, 2.5% on Penryn.
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
(cherry picked from commit e6b1ed693a)
14 years ago
Justin Ruggles
fbb6b49dab
ac3enc: Add x86-optimized function to speed up log2_tab().
...
AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute
value of each element in an array of int16_t.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
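The description of ac3_max_msb_abs_int16() above can be sketched in plain C. This is a minimal illustration, assuming the caller only needs the position of the highest set bit: OR-ing the absolute values together gives a result whose most significant bit matches that of the largest absolute value, without needing a compare per element. The function name max_msb_abs_int16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* OR together |src[i]| for all elements. The highest set bit of the
 * result equals the highest set bit of the largest absolute value. */
static int max_msb_abs_int16(const int16_t *src, int len)
{
    int i, v = 0;

    assert(src != NULL || len == 0);
    for (i = 0; i < len; i++)
        v |= abs(src[i]);   /* src[i] is promoted to int, so -32768 is safe */
    return v;
}
```

Note that the return value is not the maximum itself: for the input {1, -8, 3} the sketch returns 11, whose highest set bit (bit 3) is the same as that of 8.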
Loren Merritt
e6b1ed693a
FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
...
6% faster SSE FFT on Conroe, 2.5% on Penryn.
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
14 years ago
Justin Ruggles
a30ac54a19
Add x86-optimized versions of exponent_min().
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit dda3f0ef48)
14 years ago
Justin Ruggles
dda3f0ef48
Add x86-optimized versions of exponent_min().
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Ronald S. Bultje
a239d534d7
Fix ff_emu_edge_core_sse() on Win64.
...
Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict
on the size of registers and which registers are being used for operations
where multiple are available. This fixes segfaults in emulated_edge()
function calls on Win64.
(cherry picked from commit 17cf7c68ed)
14 years ago
Ronald S. Bultje
17cf7c68ed
Fix ff_emu_edge_core_sse() on Win64.
...
Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict
on the size of registers and which registers are being used for operations
where multiple are available. This fixes segfaults in emulated_edge()
function calls on Win64.
14 years ago
Justin Ruggles
fe2ff6d247
Separate format conversion DSP functions from DSPContext.
...
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit c73d99e672)
14 years ago
Alex Converse
a35d782d28
Fix ff_imdct_calc_sse() on gcc-4.6
...
Gcc 4.6 only preserves the first value when using an array with an "m"
constraint.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 770c410fbb)
14 years ago
Justin Ruggles
c73d99e672
Separate format conversion DSP functions from DSPContext.
...
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Ronald S. Bultje
baffa091af
Implement a SIMD version of emulated_edge_mc() for x86.
...
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
(cherry picked from commit 81f2a3f4ff)
14 years ago
Justin Ruggles
389b5bfa34
cosmetics: indentation
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit d19b744a36)
14 years ago
Justin Ruggles
a8ae4e0e7b
Remove unneeded add bias from 3 functions.
...
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 80ba1ddb58)
14 years ago
Alex Converse
770c410fbb
Fix ff_imdct_calc_sse() on gcc-4.6
...
Gcc 4.6 only preserves the first value when using an array with an "m"
constraint.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Ronald S. Bultje
81f2a3f4ff
Implement a SIMD version of emulated_edge_mc() for x86.
...
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
14 years ago
Justin Ruggles
d19b744a36
cosmetics: indentation
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Justin Ruggles
80ba1ddb58
Remove unneeded add bias from 3 functions.
...
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
e243ed656c
x86: fix overflow in h264 8x8 planar prediction
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 80944df720)
14 years ago
Mans Rullgard
80944df720
x86: fix overflow in h264 8x8 planar prediction
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Justin Ruggles
015f9f1ad3
Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 6eabb0d3ad)
14 years ago
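The interface change named in this commit subject, from dst=dst*src to dst=src0*src1, can be sketched as a scalar C loop. This is an illustrative reference version only. The exact parameter list is an assumption, and the real DSPContext function has optimized variants that are not shown here.

```c
#include <assert.h>

/* Three-operand form: dst[i] = src0[i] * src1[i].
 * The older two-operand behaviour is the special case dst == src0. */
static void vector_fmul(float *dst, const float *src0,
                        const float *src1, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}
```

With separate inputs and output, a caller no longer has to copy one operand into the destination buffer before multiplying.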
Justin Ruggles
384dbd617f
cosmetics related to LPC changes.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 1c189fc533)
14 years ago
Justin Ruggles
7101b18508
Separate window function from autocorrelation.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 77a78e9bdc)
14 years ago
Justin Ruggles
0d8837bdda
Move lpc_compute_autocorr() from DSPContext to a new struct LPCContext.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 56f8952b25)
14 years ago