mirror of https://github.com/opencv/opencv.git
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On x86, this showed about a 1.4x improvement.

For PPC, do a full multiply (32x32->64b), convert to DP, then accumulate. This may be slightly less precise for some inputs, but it is about 1.5x faster than the FMA approach above, which is itself about 1.5x faster, for a total speedup of about 2.5x.

pull/15339/head
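The two strategies described above can be illustrated with a scalar sketch. This is not the code from the commit: the function names `dot_fma4` and `dot_mul64` are hypothetical, the inputs are assumed to be 32-bit signed integers, and plain C++ stands in for the SIMD intrinsics the actual change would use. It only shows the shape of each accumulation scheme.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch 1: four independent FMA accumulator chains.
// Splitting the sum across s0..s3 removes the dependency between
// consecutive FMAs, so a pipelined FMA unit can overlap them.
double dot_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 = std::fma(double(a[i]),     double(b[i]),     s0);
        s1 = std::fma(double(a[i + 1]), double(b[i + 1]), s1);
        s2 = std::fma(double(a[i + 2]), double(b[i + 2]), s2);
        s3 = std::fma(double(a[i + 3]), double(b[i + 3]), s3);
    }
    // Tail: fewer than four elements remain.
    for (; i < n; ++i)
        s0 = std::fma(double(a[i]), double(b[i]), s0);
    return (s0 + s1) + (s2 + s3);
}

// Hypothetical sketch 2: full 32x32->64-bit integer multiply, convert the
// product to double, then accumulate. The conversion rounds when a product
// needs more than 53 significant bits, and the addition rounds again, which
// is why this form can be slightly less precise than a fused multiply-add.
double dot_mul64(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t p = std::int64_t(a[i]) * std::int64_t(b[i]);
        sum += double(p);
    }
    return sum;
}
```

For small inputs both functions return the same value; they can differ in the last bits only when individual products exceed the 53-bit significand of a double.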
parent
7295983964
commit
33fb253a66
2 changed files with 38 additions and 0 deletions