* Optimize some function with lasx.
Optimize some function with lasx. #23929
This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.
* hopefully fixed compile errors
* continue to fix CI
* another attempt to fix build on Linux aarch64
* * reverted to the original method to detect special arm neon instructions without signal()
* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
* removed extra whitespaces
- Added cross compile cmake file for target riscv64-clang
- Extended cmake for RISC-V and added instruction checks
- Created intrin_rvv.hpp with C++ version universal intrinsics
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Removed some unused code in mips.toolchain.cmake.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* core: improve AVX512 infrastructure by adding more CPU features groups
* cmake: use groups for AVX512 optimization flags
* core: remove gap in CPU flags enumeration
* cmake: restore default CPU_DISPATCH
- add infrastructure support for Power9/VSX3
- fix missing VSX flags on GCC4.9 and CLANG4(#13210, #13222)
- fix disable VSX optimzation on GCC by using flag ENABLE_VSX
- flag ENABLE_VSX is deprecated now, use CPU_BASELINE, CPU_DISPATCH instead
- add VSX3 to arithmetic dispatchable flags
- Recommended compiler checks:
- GCC: CV_GCC
- Clang: CV_CLANG
- fixed problem with CMAKE_CXX_COMPILER_ID=Clang/AppleClang mess on MacOSX
Details: cmake --help-policy CMP0025
- do not declare Clang as GCC compiler
The opencv infrastructure mostly has the basics for supporting avx512 math functions,
but it wasn't hooked up (likely due to lack of users)
In order to compile the DNN functions for AVX512, a few things need to be hooked up
and this patch does that
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
* Update OpenCVCompilerOptimizations.cmake
Neon not supported on MSVC ARM breaking build fix
* Update OpenCVCompilerOptimizations.cmake
Whitespace
* Update intrin.hpp
Many problems in MSVC ARM builds (at least on VS2017) being fixed in this PR now.
C:\Users\Gregory\DOCUME~1\MYLIBR~1\OPENCV~3\opencv\sources\modules\core\include\opencv2/core/hal/intrin.hpp(444): error C3861: '_tzcnt_u32': identifier not found
* Update hal_replacement.hpp
Passing variadic expansion in a macro to another macro does not work properly in MSVC and a famous known workaround is hereby applied. Discussion of it: https://stackoverflow.com/questions/5134523/msvc-doesnt-expand-va-args-correctly
Only needed the fix for ARM builds: TEGRA_ macros are used for cv_hal_ functions in the carotene library.
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): warning C4003: not enough actual parameters for macro 'TEGRA_ADD'
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): error C2143: syntax error: missing ')' before ','
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): error C2059: syntax error: ')'
* Update hal_replacement.hpp
All hal_replacement's using carotene\hal\tegra_hal.hpp TEGRA_ functions as macros preprocessed by variadic macros should be changed, identical as was done in core.
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\imgproc\src\color.cpp(9604): warning C4003: not enough actual parameters for macro 'TEGRA_CVTBGRTOBGR'
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\imgproc\src\color.cpp(9604): error C2059: syntax error: '=='
* Update OpenCVCompilerOptimizations.cmake
* Update hal_replacement.hpp
* Update hal_replacement.hpp