mirror of https://github.com/FFmpeg/FFmpeg.git
Unrolling the main loop to process, instead of 4 elements:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.

Assigning STEP to a register is a loss.

Output address (Y) is almost always unaligned.

Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
parent 34454c761f
commit 2784d18791
2 changed files with 43 additions and 0 deletions
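
For readers unfamiliar with the trade-off described in the commit message, here is a minimal sketch of a loop that handles 4 floats per iteration and writes its result with an unaligned store. It is not the code from this commit: the function name `scale4_sketch`, its parameters, and the element-wise multiply are assumptions chosen only for illustration. On builds without SSE it falls back to plain C that does the same work.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical kernel, NOT the function touched by this commit:
 * computes y[i] = x[i] * g[i] for n floats, 4 elements per iteration.
 * n must be a multiple of 4. */
static void scale4_sketch(float *y, const float *x, const float *g, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
#if defined(__SSE__)
        /* Unaligned loads keep the sketch safe for any input address. */
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vg = _mm_loadu_ps(g + i);
        /* Unaligned store: the output pointer is not assumed to be
         * 16-byte aligned, so an aligned store could fault here. */
        _mm_storeu_ps(y + i, _mm_mul_ps(vx, vg));
#else
        /* Portable fallback: same 4-element step in scalar C. */
        y[i + 0] = x[i + 0] * g[i + 0];
        y[i + 1] = x[i + 1] * g[i + 1];
        y[i + 2] = x[i + 2] * g[i + 2];
        y[i + 3] = x[i + 3] * g[i + 3];
#endif
    }
}
```

Processing 8 elements per iteration would duplicate the loop body, which is the object-size cost the message weighs against a 2-cycle gain.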