Zuxy Meng
038bfcf9d6
3DNow! and SSSE3 optimization to QNS DSP functions; use pmulhrw/pmulhrsw instead of pmulhw
Originally committed as revision 9053 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Aurelien Jacobs
5b0b7054b4
better separation of vp3dsp functions from dsputil_mmx.c
Originally committed as revision 9039 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Ronald S. Bultje
b550bfaa61
Add libavcodec to compiler include flags in order to simplify header
include paths in the source files.
mostly from a patch by Ronald S. Bultje, rbultje ronald.bitfreak net
Originally committed as revision 9034 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Panagiotis Issaris
9b5dc86746
Make vp3dsp*.c compilation optional.
Originally committed as revision 9025 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Reimar Döffinger
e36d79c837
Change some leftover __attribute__((unused)) and __attribute__((used)) to
attribute_unused and attribute_used respectively to ease compiling on non-gcc.
Originally committed as revision 9024 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Zuxy Meng
25e4f8aaee
Faster SSE FFT/MDCT, patch by Zuxy Meng %zuxy P meng A gmail P com%
unrolls some loops, utilizing all 8 xmm registers. fft-test
shows ~10% speed up in (I)FFT and ~8% speed up in (I)MDCT on Dothan
Originally committed as revision 9017 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
ff506a906e
sse2 & ssse3 versions of dct_quantize.
core2: mmx2=154 sse2=73 ssse3=66 (cycles)
k8: mmx2=179 sse2=149
p4: mmx2=284 sse2=194
Originally committed as revision 9003 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
1edbfe1994
factor sum_abs_dctelem out of dct_sad, and simd it.
sum_abs_dctelem_* alone:
core2: c=186 mmx2=39 sse2=21 ssse3=13 (cycles)
k8: c=163 mmx2=33 sse2=31
p4: c=370 mmx2=60 sse2=60
dct_sad including sum_abs_dctelem_*:
core2: c=405 mmx2=258 sse2=240 ssse3=232
k8: c=624 mmx2=394 sse2=392
p4: c=849 mmx2=556 sse2=556
Originally committed as revision 9001 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
561f940c03
sse2 & ssse3 versions of hadamard. unroll and inline diff_pixels.
core2: before mmx2=193 cycles. after mmx2=174 sse2=122 ssse3=115 (cycles).
k8: before mmx2=205. after mmx2=184 sse2=180.
p4: before mmx2=342. after mmx2=314 sse2=309.
Originally committed as revision 9000 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
ba53071acb
10l, r8991 broke mmx1 sad
Originally committed as revision 8993 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
72946825fa
sse2 version of fullpel sad.
16% faster on core2, 5% faster on p4. 10% slower (and thus disabled) on k8.
Originally committed as revision 8992 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
164d75ebf3
tweak mmx2 sad.
40% faster on core2, 18% faster on k8, 5% faster on p4.
Originally committed as revision 8991 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
eca3810e31
tweak mmx2 sad.
6% faster on core2 and k8, no change on p4.
Originally committed as revision 8984 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
7c3a9fe2a3
sse2 version of fdct_col.
k8: 72->61 cycles, core2: 51->26 cycles.
Originally committed as revision 8966 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
5adf43e47e
cosmetics: remove code duplication in hadamard8_diff_mmx
Originally committed as revision 8946 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
bba5293bb7
cosmetics: remove duplicate transpose macro
Originally committed as revision 8939 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Reimar Döffinger
a1ce61108b
Fix parts missed in clip -> av_clip rename
Originally committed as revision 8760 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
fe0372296a
typos
Originally committed as revision 8642 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Loren Merritt
5900637219
mmx 16-bit ssd. 2.3x faster svq1 encoding.
Originally committed as revision 8559 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
d42f88025a
Fix wrong conditional, Snow decoding, not encoding, was SIMD-accelerated.
Originally committed as revision 8116 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
58e31fb1d5
reorder a few more paddws to reduce dependency chains
chroma mc4 put 2480 -> 2460 dezicycles on duron
Originally committed as revision 8098 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
b4fe97696c
reorder paddws to reduce dependency chain
put_h264_chroma_mc2_mmx2() 927 -> 902 dezicycles on duron
Originally committed as revision 8097 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
0c67082e02
shortening dependency chain in chroma mc2
Originally committed as revision 8095 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
af26516261
remove now wrong comment
Originally committed as revision 8094 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
61240ae556
fix chroma mc2 bug, this is based on a patch by (Oleg Metelitsa oleg hitron co kr)
and does slow the mc2 chroma put down, avg interestingly seems unaffected speedwise on duron
this of course should be rather done in a way which doesn't slow it down but it's better a few %
slower but correct than incorrect
Originally committed as revision 8093 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
470d2d03cc
gcc 2.95 fix
Originally committed as revision 8059 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
459022f504
fix for x86-64
Originally committed as revision 8022 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
b21e0b6dfc
rewrite H264_CHROMA_MC4_TMPL (20% faster)
Originally committed as revision 8012 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
2a115873af
add a few asserts to ensure alignment
Originally committed as revision 7994 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
00e210ddbb
prevent h.264 MC related functions from being inlined (yes this is much faster, the code just doesn't fit in the code cache otherwise)
Originally committed as revision 7993 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Reimar Döffinger
392b76ca93
Minor AMD64 compilation fix
Originally committed as revision 7907 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
9bc0d3ef3e
maybe fix x86_64 (untested)
Originally committed as revision 7906 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
7c4fd7eb0c
factor out common subexpression (gcc of course is too stupid to do this ...)
5% faster avg_h264_chroma_mc2_mmx2()
10% faster put_h264_chroma_mc2_mmx2()
Originally committed as revision 7898 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
9301a0b4a9
merge asm fragments in H264_CHROMA_MC2_TMPL()
10% faster avg_h264_chroma_mc2_mmx2()
5% faster put_h264_chroma_mc2_mmx2()
Originally committed as revision 7897 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Panagiotis Issaris
9dd6c80453
Add the const specifier as needed to reduce the number of warnings.
Originally committed as revision 7764 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
9688979c51
Fix some more license headers.
Originally committed as revision 7637 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Guillaume Poirier
5a5c770d5a
Add SSSE3 (Core2 aka Conroe/Merom/Woodcrest new instructions) detection
Originally committed as revision 7332 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
849f10351d
rename always_inline to av_always_inline and move to common.h
Originally committed as revision 7256 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
486497e07b
revert bad checkin
Originally committed as revision 7044 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
be6ed6fff4
move some CFLAGS settings away from config.* writing section
Originally committed as revision 7043 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
7466ed2f02
zigzag_direct_noperm doesn't exist, remove declaration
Originally committed as revision 6998 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
36cd306907
rename inverse -> ff_inverse
Originally committed as revision 6990 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
bb54f6ab62
adding more static keywords
Originally committed as revision 6976 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
079e61db5d
ensure alignment (no speed change)
Originally committed as revision 6891 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
f5a9e8f33d
merging mov & and (no speed change)
Originally committed as revision 6889 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
e80cf125a7
2 instructions less (same speed)
Originally committed as revision 6888 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
9347118237
comment about failed optimization
Originally committed as revision 6887 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
38cfdc83f0
move luma tc0 related init into asm
5% faster filter_mb_fast() on P3
Originally committed as revision 6884 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
25225c3773
2 instructions less in h264_loop_filter_luma_mmx2()
Originally committed as revision 6882 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
bda2203d56
preempt possible overflow
Originally committed as revision 6881 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago