73 Commits (45ee0e3282f9683fc1d5f1528f4edb889c4118a3)

Author SHA1 Message Date
Reimar Döffinger 755073fe3c CABAC assembler optimizations ported to AMD64 18 years ago
Michael Niedermayer e08f580644 decode_significance_8x8_x86() 18 years ago
Guillaume Poirier 94e4c3a333 Protect code that uses CMOV instructions with HAVE_CMOV, 18 years ago
Michael Niedermayer 849a50041c another instruction less in decode_significance_x86() -> 1% faster on P3 18 years ago
Michael Niedermayer d3e7c5c35b 1 instruction less 18 years ago
Michael Niedermayer a616db285a reordering instructions a little in decode_significance_x86() -> 2 instructions less / 1% faster decode_residual on P3 18 years ago
Michael Niedermayer 13404b2e98 factorize get_cabac asm (0.5% slower but its much cleaner) 18 years ago
Bernhard Rosenkränzer ba9fb5da3a Fix PIC compilation, some defines were under #ifdef !PIC but used 18 years ago
Michael Niedermayer d72bc32389 unused variable 18 years ago
Michael Niedermayer ebd624b662 optimize sign decoding code in decode_residual() 18 years ago
Jindřich Makovička a0f2c6ba38 Kill a warning with MSVC 18 years ago
Michael Niedermayer eb73bf723d x86 asm version of the decode significance loop (not 8x8) of decode_residual() 5% faster decode_residual() on P3 18 years ago
Michael Niedermayer 4041a495a8 cosmetic (%%eax->%0) 18 years ago
Diego Biurrun 8dda3e796b Fix crash with illegal instruction, cmov is available on 686 and later only. 18 years ago
Diego Biurrun e962604f1c Expand some #endif comments. 18 years ago
Michael Niedermayer 165c5f0909 fix !CMOV_IS_FAST case (I am not really happy with the fix but I didnt come up with a better one quickly) 18 years ago
Michael Niedermayer 1d7c111856 10l 18 years ago
Michael Niedermayer faff3a7ad0 this code will not work with PIC as it needs 7 registers and gcc doesnt support that in PIC 18 years ago
Michael Niedermayer f24a515931 shift CABACContext.range right, this reduces the number of shifts needed in get_cabac() and is slightly faster on P3 (and should be much faster on P4 as the P4 except the more recent variants lacks an integer shifter and so shifts have ~10 times longer latency than simple operations like adds) 18 years ago
Michael Niedermayer 68a205edef dehack *ps_state indexing in the branchless decoder 18 years ago
Michael Niedermayer 12ff5b0f3b add "memory" to the clobber list we change memory so we need it, this also fixes some problems with gcc svn 18 years ago
Michael Niedermayer 851ded8918 prevent "mb level" get_cabac() calls from being inlined (3% faster decode_mb_cabac() on P3) 18 years ago
Guillaume Poirier a0490b324a adds some useful comments after some of the #else, #elseif, 18 years ago
Diego Biurrun c26abfa541 Rename ABS macro to FFABS. 18 years ago
Michael Niedermayer 1f4d5e9f69 slightly faster on P3 slightly slower on athlon and probably faster on P4 18 years ago
Michael Niedermayer 2b5269b51c moving lps state transition code a little up in the branched asm code (1% faster on P3) 18 years ago
Michael Niedermayer b99f3cabed write cabac low and range variables as early as possible to prevent stalls from reading them before they were written, the P4 is said to dislike that a lot, on P3 its 2% faster (START/STOP_TIMER over decode_residual) 18 years ago
Michael Niedermayer d17faef011 use ecx instead of cl (no speed change on P3 but might avoid partial register stalls on some cpus) 18 years ago
Michael Niedermayer d61c4e731e make state transition tables global as they are constant and the code is slightly faster that way 18 years ago
Michael Niedermayer 5f3eca121e 10l 18 years ago
Michael Niedermayer 0fa352c7e6 make lps_range a global table its constant anyway (saves 1 addition for accessing it) 18 years ago
Michael Niedermayer 3650b43959 enable CMOV_IS_FAST as its faster or equal speed on every cpu (duron, athlon, PM, P3) from which ive seen benchmarks, it might be slower on P4 but no one has posted benchmarks ... 18 years ago
Diego Biurrun 0bc2e7f081 BRANCHLESS_CABAD --> BRANCHLESS_CABAC_DECODER 18 years ago
Michael Niedermayer 9ed92c65f1 moving another bit&1 out, this is as fast as with it in there, but it makes more sense with it outside of the loop 18 years ago
Michael Niedermayer f1b37db48d move the &1 out of the asm so gcc can optimize it away in inlined cases (yes this is slightly faster) 18 years ago
Michael Niedermayer ab0151d163 replace a few and/sub/... by cmov 18 years ago
Michael Niedermayer a6672acf45 reading 8bit mem into a 8bit register needs 2 uops on P4, 8bit->32bit with zero extension needs just 1 18 years ago
Michael Niedermayer 2d3df05ca0 on the P4 inc needs twice as much time as add 18 years ago
Michael Niedermayer 2ee9dc65be 10l 18 years ago
Michael Niedermayer 7822e1c1ff reverse remainder of the failed attempt to optimize *state=c->mps_state[s] 18 years ago
Michael Niedermayer ef0090a998 x86 branchless cabac decoder 18 years ago
Michael Niedermayer 2e1aee80f4 optimize branchless C CABAC decoder 18 years ago
Michael Niedermayer 1c2a417f6a move outcommented START/STOP_TIMER to a hopefully better place for benchmarking ... 18 years ago
Michael Niedermayer 30dc5f56ad drop failed attempt to optimize *state= c->mps_state[s]; 18 years ago
Michael Niedermayer c56d23dacf 10l bugfix for some disabled code 18 years ago
Michael Niedermayer f7d0b68361 first try of a handwritten get_cabac() for x86, this is 10-20% faster on P3 depending on if you try to subtract the START/STOP_TIMER overhead 18 years ago
Michael Niedermayer 5bbe2a5292 remove bytestream_end checks, seems to work fine without them and the bitstream reader doesnt check for the end either 18 years ago
Michael Niedermayer c010d69a75 decrease ff_h264_norm_shift[] size 18 years ago
Michael Niedermayer 6ff042699f cleanup 18 years ago
Michael Niedermayer 260ceb6322 branchless renormalization (1% faster get_cabac) old branchless renormalization wasnt faster because gcc was scared of the shift variable (misusing bit variable now) 19 years ago