* 1. changed static const __m128/256 to const __m128/256 to avoid wierd instructions and calls inserted by compiler.
2. added universal intrinsics that wrap MOVNTPS and other such (non-temporary or "no cache" store) instructions. v_store_interleave() and v_store() got respective flags/overloaded variants
3. rewrote split & merge to use the "no cache" store instructions. It resulted in dramatic performance improvement when processing big arrays
* hopefully, fixed some test failures where 4-channel v_store_interleave() is used
* added missing implementation of the new universal intrinsics (v_store_aligned_nocache() etc.)
* fixed silly typo in the new intrinsics in intrin_vsx.hpp
* still trying to fix VSX compiler errors
* still trying to fix VSX compiler errors
* still trying to fix VSX compiler errors
* still trying to fix VSX compiler errors
* fixed/updated v_load_deinterleave and v_store_interleave intrinsics; modified split() and merge() functions to use those intrinsics
* fixed a few compile errors and bug in v_load_deinterleave(ptr, v_uint32x4& a, v_uint32x4& b)
* fixed few more compile errors
the static variables will cause race-condition when operating in
multithread scenarios.
Signed-off-by: Teng Yiliang <ylteng@outlook.com>
Signed-off-by: Teng Yiliang <yiliang.teng@weimob.com>
* core:OE-27 prepare universal intrinsics to expand (#11022)
* core:OE-27 prepare universal intrinsics to expand (#11022)
* core: Add universal intrinsics for AVX2
* updated implementation of wide univ. intrinsics; converted several OpenCV HAL functions: sqrt, invsqrt, magnitude, phase, exp to the wide universal intrinsics.
* converted log to universal intrinsics; cleaned up the code a bit; added v_lut_deinterleave intrinsics.
* core: Add universal intrinsics for AVX2
* fixed multiple compile errors
* fixed many more compile errors and hopefully some test failures
* fixed some more compile errors
* temporarily disabled IPP to debug exp & log; hopefully fixed Doxygen complains
* fixed some more compile errors
* fixed v_store(short*, v_float16&) signatures
* trying to fix the test failures on Linux
* fixed some issues found by alalek
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
- 'if' logic is moved into templates.
- removed unnecessary cv::Mat objects creation.
- fixed inv() test (invA * A == eye)
- added more Matx tests to cover all defined template specializations