Aurelien Jacobs
f0b23422fa
use av_noinline instead of __attribute((noinline))
Originally committed as revision 8091 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
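Background on this commit and the av_always_inline rename listed further down: a wrapper macro lets compiler-specific syntax such as GCC's `__attribute__((noinline))` be written once, with an empty fallback for compilers that lack it. The sketch below shows that general pattern. The MY_NOINLINE / MY_ALWAYS_INLINE macros and the helper functions are hypothetical, and FFmpeg's real definitions may include compiler-version checks not shown here.

```c
/* Hypothetical portable wrappers in the spirit of av_noinline and
 * av_always_inline; the MY_* names are illustrative, not FFmpeg's. */
#if defined(__GNUC__)
#  define MY_NOINLINE      __attribute__((noinline))
#  define MY_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define MY_NOINLINE
#  define MY_ALWAYS_INLINE inline
#endif

/* Stands in for a large helper: kept out of line so callers stay small. */
static MY_NOINLINE int slow_path(int x)
{
    return x * 3 + 1;
}

/* Tiny hot helper: always inlined into its caller. */
static MY_ALWAYS_INLINE int fast_path(int x)
{
    return x + 1;
}

/* Odd inputs take the out-of-line call, even inputs the inlined one. */
int dispatch(int x)
{
    return (x & 1) ? slow_path(x) : fast_path(x);
}
```

Keeping a big helper out of line can pay off when inlining it bloats a hot caller, which is the effect reported for the "mb level" get_cabac() calls in revision 6674 below.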
Måns Rullgård
7073e9fc69
rename CMOV_IS_FAST to HAVE_FAST_CMOV and simplify configure
Originally committed as revision 7729 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
3b6dc9ca6d
replace a few hardcoded numbers with their correct named ones
Originally committed as revision 7441 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Måns Rullgård
849f10351d
rename always_inline to av_always_inline and move to common.h
Originally committed as revision 7256 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
c61b9d4473
PIC fix
Originally committed as revision 7173 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Reimar Döffinger
d55f46e5a8
Reenable AMD64 optimizations for cabac accidentally disabled in r6852
Originally committed as revision 6853 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
419b878494
Add ARCH_X86_32 as a new define for 32 bit x86 architectures and change
the semantics of ARCH_X86 to mean both 32 and 64 bits.
Originally committed as revision 6852 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
d5cd50ed73
Fix compilation with PIC enabled, BRANCHLESS_GET_CABAC is defined under
!PIC but gets used without a check for !PIC.
Originally committed as revision 6834 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Reimar Döffinger
755073fe3c
CABAC assembler optimizations ported to AMD64
Originally committed as revision 6776 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
e08f580644
decode_significance_8x8_x86()
8% faster decode_cabac_residual() (8x8 case only) on P3
Originally committed as revision 6750 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Guillaume Poirier
94e4c3a333
Protect code that uses CMOV instructions with HAVE_CMOV.
Make configure set CMOV_IS_FAST on arches on which cmov has a low latency
(typically non-NetBurst-based processors)
Originally committed as revision 6749 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
849a50041c
another instruction less in decode_significance_x86() -> 1% faster on P3
Originally committed as revision 6745 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
d3e7c5c35b
1 instruction less
Originally committed as revision 6743 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
a616db285a
reordering instructions a little in decode_significance_x86() -> 2 instructions less / 1% faster decode_residual on P3
Originally committed as revision 6741 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
13404b2e98
factorize get_cabac asm (0.5% slower but it's much cleaner)
Originally committed as revision 6740 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Bernhard Rosenkränzer
ba9fb5da3a
Fix PIC compilation, some defines were under #ifdef !PIC but used
in the PIC case nevertheless.
patch by Bernhard Rosenkranzer, bero arklinux org
Originally committed as revision 6738 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
d72bc32389
unused variable
Originally committed as revision 6737 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
ebd624b662
optimize sign decoding code in decode_residual()
x86 is 4% faster on P3
C sign stuff + x86 code for everything else is also faster than before (sorry, forgot to test pure C)
... and if I replace the second occurrence of the sign decoding in decode_residual() with the asm too, then everything gets slower. I'm starting to think that it might be best to write the whole function in asm; playing this avoid-random-deoptimizations game with gcc is not fun at all
Originally committed as revision 6732 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
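The message above does not say how the sign decoding was restructured. For context, one common branchless way to turn a decoded sign bit into a signed coefficient is a conditional negate built from a mask. This is a generic sketch with a hypothetical function name, not the code from this commit.

```c
#include <stdint.h>

/* Apply a decoded sign bit to a magnitude without a branch.
 * sign must be 0 (keep positive) or 1 (negate); magnitude != INT32_MIN. */
static inline int32_t apply_sign(int32_t magnitude, uint32_t sign)
{
    int32_t mask = -(int32_t)sign;      /* 0 -> all zeros, 1 -> all ones (-1) */
    return (magnitude ^ mask) - mask;   /* mask == 0: x; mask == -1: ~x + 1 == -x */
}
```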
Jindřich Makovička
a0f2c6ba38
Kill a warning with MSVC
Patch by Jindrich Makovicka makovick A gmail P com
Original thread:
Date: 08:21 AM
Subject: Re: [Ffmpeg-devel] Weird line in cabac.h
Originally committed as revision 6726 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
eb73bf723d
x86 asm version of the decode significance loop (not 8x8) of decode_residual() 5% faster decode_residual() on P3
Originally committed as revision 6724 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
4041a495a8
cosmetic (%%eax->%0)
Originally committed as revision 6717 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
8dda3e796b
Fix crash with illegal instruction, cmov is available on 686 and later only.
Originally committed as revision 6715 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
e962604f1c
Expand some #endif comments.
Originally committed as revision 6714 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
165c5f0909
fix !CMOV_IS_FAST case (I'm not really happy with the fix but I didn't come up with a better one quickly)
Originally committed as revision 6707 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
1d7c111856
10l
Originally committed as revision 6704 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
faff3a7ad0
this code will not work with PIC as it needs 7 registers and gcc doesn't support that in PIC
Originally committed as revision 6703 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
f24a515931
shift CABACContext.range right; this reduces the number of shifts needed in get_cabac() and is slightly faster on P3 (and should be much faster on P4, as the P4, except the more recent variants, lacks an integer shifter, so shifts have ~10 times longer latency than simple operations like adds)
Originally committed as revision 6702 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
68a205edef
dehack *ps_state indexing in the branchless decoder
Originally committed as revision 6683 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
12ff5b0f3b
add "memory" to the clobber list; we change memory, so we need it. This also fixes some problems with gcc svn
Originally committed as revision 6679 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
851ded8918
prevent "mb level" get_cabac() calls from being inlined (3% faster decode_mb_cabac() on P3)
Originally committed as revision 6674 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Guillaume Poirier
a0490b324a
adds some useful comments after some of the #else, #elif,
#endif preprocessor directives to make it clearer which code
block depends on which #define xx
Originally committed as revision 6668 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
c26abfa541
Rename ABS macro to FFABS.
Originally committed as revision 6666 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
1f4d5e9f69
slightly faster on P3 slightly slower on athlon and probably faster on P4
Originally committed as revision 6663 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
2b5269b51c
moving lps state transition code a little up in the branched asm code (1% faster on P3)
Originally committed as revision 6658 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
b99f3cabed
write cabac low and range variables as early as possible to prevent stalls from reading them before they were written; the P4 is said to dislike that a lot. On P3 it's 2% faster (START/STOP_TIMER over decode_residual)
Originally committed as revision 6657 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
d17faef011
use ecx instead of cl (no speed change on P3 but might avoid partial register stalls on some cpus)
Originally committed as revision 6656 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
d61c4e731e
make state transition tables global as they are constant and the code is slightly faster that way
Originally committed as revision 6655 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
5f3eca121e
10l
Originally committed as revision 6654 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
0fa352c7e6
make lps_range a global table, it's constant anyway (saves 1 addition for accessing it)
Originally committed as revision 6653 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
3650b43959
enable CMOV_IS_FAST as it's faster or equal in speed on every cpu (duron, athlon, PM, P3) from which I've seen benchmarks; it might be slower on P4 but no one has posted benchmarks ...
Originally committed as revision 6652 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Diego Biurrun
0bc2e7f081
BRANCHLESS_CABAD --> BRANCHLESS_CABAC_DECODER
Originally committed as revision 6623 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
9ed92c65f1
moving another bit&1 out, this is as fast as with it in there, but it makes more sense with it outside of the loop
Originally committed as revision 6618 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
f1b37db48d
move the &1 out of the asm so gcc can optimize it away in inlined cases (yes this is slightly faster)
Originally committed as revision 6616 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
ab0151d163
replace a few and/sub/... by cmov
this is faster on P3, should be faster on AMD, and should be slower on P4
it's disabled by default (benchmarks welcome so we know when to enable it)
Originally committed as revision 6615 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
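The and/sub versus cmov trade-off in this commit can be illustrated in plain C. Both functions below select a or b without a conditional jump in the intended code generation: the first builds an all-ones or all-zeros mask from the condition (the style of and/sub/xor sequence the commit replaced), the second is a ternary that x86 compilers often, but not always, lower to cmov when the target supports it. The function names are hypothetical, and whether cmov is actually emitted depends on the compiler and flags.

```c
#include <stdint.h>

/* Mask-based select: mask is ~0 when cond is true, 0 otherwise. */
static inline uint32_t select_mask(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = 0u - (uint32_t)(cond != 0);
    return b ^ ((a ^ b) & mask);   /* cond ? a : b */
}

/* Same selection as a ternary; a candidate for cmov on i686 and later. */
static inline uint32_t select_cmov(int cond, uint32_t a, uint32_t b)
{
    return cond ? a : b;
}
```

The mask version costs several dependent ALU operations; cmov does it in one instruction, which only wins on CPUs where cmov has low latency, hence the P3/AMD versus P4 caveat above.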
Michael Niedermayer
a6672acf45
reading 8bit mem into an 8bit register needs 2 uops on P4, 8bit->32bit with zero extension needs just 1
Originally committed as revision 6612 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
2d3df05ca0
on the P4 inc needs twice as much time as add
Originally committed as revision 6611 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
2ee9dc65be
10l
Originally committed as revision 6610 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
7822e1c1ff
revert remainder of the failed attempt to optimize *state=c->mps_state[s]
Originally committed as revision 6609 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
Michael Niedermayer
ef0090a998
x86 branchless cabac decoder
slightly faster on P3
Originally committed as revision 6608 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago
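The idea behind a branchless CABAC decoder can be sketched in C. In H.264 CABAC, each bin splits the current range into an LPS sub-range (looked up from a table by context state) and an MPS sub-range; the bin is the LPS when the offset falls at or above the MPS sub-range. The sketch compares a branched reference with a mask-based branchless version of that single step. It is a simplified illustration, not the x86 asm from this commit: the toy_cabac struct and function names are hypothetical, rlps is passed in rather than looked up, the real decoder's variable layout differs, and renormalization, bitstream refill and the context state transition are omitted.

```c
#include <stdint.h>

/* Simplified state for one binary arithmetic decoding step. */
typedef struct {
    uint32_t range;   /* current interval width */
    uint32_t offset;  /* value read from the bitstream, offset < range */
} toy_cabac;

/* Branched reference; mps is the most probable symbol (0 or 1). */
static int decide_branched(toy_cabac *c, uint32_t rlps, int mps)
{
    c->range -= rlps;                 /* range now holds the MPS sub-range */
    if (c->offset >= c->range) {      /* LPS path */
        c->offset -= c->range;
        c->range   = rlps;
        return !mps;
    }
    return mps;                       /* MPS path */
}

/* Branchless version: an all-ones mask on the LPS path selects the updates. */
static int decide_branchless(toy_cabac *c, uint32_t rlps, int mps)
{
    uint32_t rmps = c->range - rlps;
    uint32_t mask = 0u - (uint32_t)(c->offset >= rmps); /* ~0 on LPS, 0 on MPS */
    c->offset -= rmps & mask;                           /* subtract only on LPS */
    c->range   = rmps ^ ((rlps ^ rmps) & mask);         /* LPS ? rlps : rmps */
    return mps ^ (int)(mask & 1u);                      /* flip mps on LPS */
}
```

Both versions leave identical range and offset values; the branchless one trades a hard-to-predict branch for a few extra ALU operations, which is why its payoff depends on the CPU, as the P3/P4/Athlon timings throughout this log show.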
Michael Niedermayer
2e1aee80f4
optimize branchless C CABAC decoder
Originally committed as revision 6607 to svn://svn.ffmpeg.org/ffmpeg/trunk
18 years ago