James Almer
dad31083ae
x86/svq1enc: port ssd_int8_vs_int16 to yasm
...
Also add an SSE2 version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
b0de1c7663
x86: build: Only compile FDCT code if MMX is enabled
...
All other files containing purely inline assembly are treated the same way.
11 years ago
Diego Biurrun
12f129e545
x86: Unconditionally compile blockdsp and svq1enc init files
...
This avoids a link failure with MMX disabled as the init functions
are referenced unconditionally.
11 years ago
Diego Biurrun
009331303a
x86: huffyuvdsp: Move inline assembly to init file
...
This avoids a link failure with MMX disabled as now code and
initialization are compiled under the same condition.
11 years ago
James Almer
a441a2437b
x86: rename dsputil.asm to idctdsp.asm
...
Its only function is no longer part of dsputil.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
e3fcb14347
dsputil: Split off IDCT bits into their own context
11 years ago
James Almer
476bd3c7e4
x86/dsputil: move put_signed_pixels_clamped out of bswapdsp.asm
...
It's still a dsputil function
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
fab9df63a3
dsputil: Split off global motion compensation bits into a separate context
11 years ago
Diego Biurrun
c67b449beb
dsputil: Split bswap*_buf() off into a separate context
11 years ago
Diego Biurrun
9a9e2f1c8a
dsputil: Split audio operations off into a separate context
11 years ago
James Almer
fe782233aa
x86/blockdsp: move asm code out of dsputil
...
Also replace INLINE_<opt> with EXTERNAL_<opt> that were wrongly
changed by commit 2b05db4f81
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
e74433a8e6
dsputil: Split clear_block*/fill_block* off into a separate context
11 years ago
plepere
92cccb7bcd
avcodec/hevc: new idct + asm
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
ccff45a0d3
apedsp: move to llauddsp
...
APE is not the sole codec using scalarproduct_and_madd_int16.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
James Almer
f128342df2
build: fix compilation of svq1enc_mmx.c with --disable-mmx
...
It's needed for ff_svq1enc_init_x86() even if simd functions are disabled.
Alternatively, svq1enc_init.c could be made and the relevant code moved there.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
368f50359e
dsputil: Split off quarterpel bits into their own context
11 years ago
Diego Biurrun
054013a0fc
dsputil: Move APE-specific bits into apedsp
11 years ago
Diego Biurrun
65d5d58658
dsputil: Move SVQ1 encoding specific bits into svq1enc
11 years ago
Diego Biurrun
512f3ffe9b
dsputil: Split off HuffYUV encoding bits into their own context
...
Also shorten HuffYUV context member names to avoid clutter.
11 years ago
Diego Biurrun
0d439fbede
dsputil: Split off HuffYUV decoding bits into their own context
...
Also shorten HuffYUV context member names to avoid clutter.
11 years ago
Christophe Gisquet
f8de35ebc4
x86: hpeldsp: kill hpeldsp_mmx.c
...
before:
1987 decicycles in 8_x2, 262121 runs, 23 skips
after:
1902 decicycles in 8_x2, 262112 runs, 32 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
d1a32c3f49
x86: kill fpel_mmx.c
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
plepere
de7b89fd43
avcodec/x86/hevc: added DBF assembly functions
...
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
7be230b5fa
avcodec/x86/Makefile: remove duplicate line
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
plepere
7a2491c436
HEVC : added assembly MC functions
...
pretty print x86
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Timothy Gu
71c32ed533
DNxHD: convert inline asm to yasm
11 years ago
Peter Ross
ac4b32df71
On2 VP7 decoder
...
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
11 years ago
Timothy Gu
9d34dce05b
x86: convert DNxHDenc inline asm to yasm
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
efc7290eb6
x86: hpeldsp: Keep all rnd_template instantiations in hpeldsp_init
...
There is no point in having a separate file just for the instantiation
that provides the public functions.
11 years ago
Peter Ross
89f2f5dbd7
On2 VP7 decoder
...
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: BBB
previous patch reviewed by jason
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Diego Biurrun
0e083d7e43
build: Group general components separate from de/encoders in arch Makefiles
...
This is in line with how the top-level libavcodec Makefile is structured.
11 years ago
James Almer
07b4b0ca62
tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4}
...
Results are from a Win64 build running on an AMD FX 6300
1121 decicycles in ttafilter_process_dec_c, 16777112 runs, 104 skips
522 decicycles in ff_ttafilter_process_dec_ssse3, 16777149 runs, 67 skips
477 decicycles in ff_ttafilter_process_dec_sse4, 16777156 runs, 60 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Ronald S. Bultje
fdb093c4e4
vp9/x86: intra prediction SIMD.
...
Partially based on h264_intrapred. (I hope to eventually merge these
two intrapred implementations back together.)
11 years ago
James Darnley
623f380a18
lavc: fix flac encoder and decoder dependencies
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Christophe Gisquet
5b59a9fc61
x86: dcadsp: implement int8x8_fmul_int32
...
For the callable function (as opposed to the inline one):
        C  SSE  SSE2  SSE4
Win32: 47   42    29    26
Win64: 30   33    25    23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
11 years ago
Loren Merritt
9c978f243a
flac/x86: add ff_flac_lpc_32_sse4()
...
benchmarked on sandybridge x86_64:
1358232 decicycles in flac_lpc_32_c
1244575 decicycles in flac_lpc_32_sse4, James Almer's patch
650045 decicycles in flac_lpc_32_sse4, this patch
I haven't tested the edge cases such as odd block lengths
odd block length tested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Michael Niedermayer
f70d7eb20c
Move add/diff_int16 to lossless_videodsp
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Clément Bœsch
af68bd1c06
vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_ssse3().
...
16662 decicycles in loop_filter_h_16_16_c, 8387355 runs, 1253 skips
17510 decicycles in loop_filter_v_16_16_c, 8387516 runs, 1092 skips
4941 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 8387887 runs, 721 skips
3899 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 8387980 runs, 628 skips
Overall decode time goes from:
./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 8.10s user 0.02s system 99% cpu 8.126 total
to:
./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 6.15s user 0.04s system 99% cpu 6.199 total
(46 to 61 fps)
11 years ago
Ronald S. Bultje
8729964b99
vp9: split x86 assembly in two files.
...
(And in future, loopfilter or intra pred could be put in their own
respective files also.)
11 years ago
Ronald S. Bultje
72ca830f51
lavc: VP9 decoder
...
Originally written by Ronald S. Bultje <rsbultje@gmail.com> and
Clément Bœsch <u@pkh.me>
Further contributions by:
Anton Khirnov <anton@khirnov.net>
Diego Biurrun <diego@biurrun.de>
Luca Barbato <lu_zero@gentoo.org>
Martin Storsjö <martin@martin.st>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
11 years ago
Diego Biurrun
0338c39698
dsputil: Split off H.263 bits into their own H263DSPContext
11 years ago
Diego Biurrun
1700b4e678
x86: vp8dsp: Split loopfilter code into a separate file
11 years ago
Diego Biurrun
2ddb35b911
x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx
...
The function does not depend on MMX and compilation without MMX enabled
fails if the function is compiled conditional on MMX availability.
11 years ago
Diego Biurrun
6cc133ec58
x86: fdct: Only build fdct code if encoders have been enabled
...
fdct is only initialized if encoders are enabled.
11 years ago
Ronald S. Bultje
c07ac8d467
VP9 MC (ssse3) optimizations.
...
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
11 years ago
Diego Biurrun
a64f6a04ac
dsputil: x86: Hide arch-specific initialization details
...
Also give consistent names to init functions.
12 years ago
Diego Biurrun
8506ff97c9
vp56: Mark VP6-only optimizations as such.
...
Most of our VP56 optimizations are VP6-only and will stay that way.
So avoid compiling them for VP5-only builds.
12 years ago
Diego Biurrun
e7b31844f6
x86: Split DCT and FFT initialization into separate files
12 years ago
Diego Biurrun
186599ffe0
build: cosmetics: Place unconditional before conditional OBJS lines
...
Signed-off-by: Martin Storsjö <martin@martin.st>
12 years ago
Diego Biurrun
245b76a108
x86: dsputil: Split inline assembly from init code
...
Also remove some pointless comments.
12 years ago