The input complex factors are constant across iterations. This replaces 4 loads, 2 additions and 2 subtractions per iteration of the inner loop with 4 loads of precomputed values. Thus, effectively 4 arithmetic operations per iteration of the inner loop are avoided, i.e. 24 operations per iteration of the outer loop, or 24 * (n - 1) operations in total. If the inner loop is not unrolled by the compiler, this might also save some pointer arithmetic, as most instruction sets do not have addressing modes with negated register offsets (12 - j). This is unlikely, though, unless the compiler is optimising for code size.
pull/388/head
parent db73ae0dc1
commit 08edacc248
1 changed file with 14 additions and 11 deletions
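The following is a minimal C sketch of the hoisting described in the commit message. It makes three assumptions: the inner loop pairs `x[j]` with `x[12 - j]` for 6 values of `j`, the outer loop runs `n` times, and the pair sums and differences are computed once before the outer loop. Under those assumptions the old form spends 24n additions/subtractions and the new form spends 24, which matches the stated saving of 24 * (n - 1). All names (`transform_naive`, `transform_hoisted`, `addsub_count`, `cplx`), the array sizes and the coefficient arrays `a`/`b` are hypothetical and not taken from the patch. Multiplications are not counted because both versions perform the same ones.

```c
#include <assert.h>

/* Hypothetical layout: x[0..12] holds 13 complex inputs and the inner
   loop pairs x[j] with x[12 - j] for j = 0..5; x[6] is not used here. */
#define NX   13
#define HALF 6

typedef struct { double re, im; } cplx;

/* Additions/subtractions spent forming the pair sums and differences. */
static long addsub_count;

/* Before: sums/differences are recomputed on every outer iteration even
   though they depend only on x. a and b hold n * HALF coefficients. */
static void transform_naive(const cplx *x, const double *a, const double *b,
                            cplx *out, int n)
{
    for (int k = 0; k < n; k++) {
        cplx acc = { 0.0, 0.0 };
        for (int j = 0; j < HALF; j++) {
            /* 4 loads, 2 additions and 2 subtractions per iteration */
            cplx s = { x[j].re + x[12 - j].re, x[j].im + x[12 - j].im };
            cplx d = { x[j].re - x[12 - j].re, x[j].im - x[12 - j].im };
            addsub_count += 4;
            acc.re += a[k * HALF + j] * s.re + b[k * HALF + j] * d.im;
            acc.im += a[k * HALF + j] * s.im - b[k * HALF + j] * d.re;
        }
        out[k] = acc;
    }
}

/* After: the 6 sums and 6 differences are computed once (24 add/sub)
   and the inner loop only loads them. */
static void transform_hoisted(const cplx *x, const double *a, const double *b,
                              cplx *out, int n)
{
    /* With n == 0 the precomputation would be pure overhead. */
    assert(n >= 1);
    cplx s[HALF], d[HALF];
    for (int j = 0; j < HALF; j++) {
        s[j] = (cplx){ x[j].re + x[12 - j].re, x[j].im + x[12 - j].im };
        d[j] = (cplx){ x[j].re - x[12 - j].re, x[j].im - x[12 - j].im };
        addsub_count += 4;
    }
    for (int k = 0; k < n; k++) {
        cplx acc = { 0.0, 0.0 };
        for (int j = 0; j < HALF; j++) {
            /* 4 loads (s[j].re, s[j].im, d[j].re, d[j].im), no add/sub */
            acc.re += a[k * HALF + j] * s[j].re + b[k * HALF + j] * d[j].im;
            acc.im += a[k * HALF + j] * s[j].im - b[k * HALF + j] * d[j].re;
        }
        out[k] = acc;
    }
}
```

Both functions produce the same output for the same inputs. After a call with `n` outer iterations, `addsub_count` grows by `24 * n` for the naive form and by `24` for the hoisted form.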