Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210 )
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
6 years ago
Vitaly Tuzov
723165f878
fix for AVX2 version of v_reduce_min intrinsic
6 years ago
Vitaly Tuzov
f0fb91f2d4
Fixed v_signmask implementation for AVX2, updated universal intrinsics tests.
6 years ago
Alexander Alekhin
9340af1a8a
core: Async API / AsyncArray
6 years ago
Vitaly Tuzov
1220dd4877
Updated v_popcount description, reference implementation and test.
6 years ago
Vitaly Tuzov
18d10d6b86
Fixed v_reduce_sad intrinsics implementation and added tests
6 years ago
Alexander Alekhin
d6b82dcd65
Merge pull request #14162 from alalek:eliminate_coverity_scan_issues
...
core: eliminate coverity scan issues (#14162 )
* core(hal): avoid using of r,g,b,a parameters in interleave/deinterleave
- static analysis tools blame on possible parameters reordering
- align AVX parameters with corresponding SSE/NEO/VSX/cpp code
* core: avoid "i,j" parameters in Matx methods
- static analysis tools blame on possible parameters reordering
* core: resolve coverity scan issues
6 years ago
Alexander Alekhin
93a402d0f2
core: fix Core_EigenNonSymmetric.convergence test
6 years ago
Alexander Alekhin
a7c4ee9ae1
core: add iterations limit check in eigenNonSymmetric()
6 years ago
Alexander Alekhin
7366eebebb
core: fix condition in OutputArray::create(allowTransposed=True)
6 years ago
berak
20afae5a14
core: fix mat matx multiplication
6 years ago
Alexander Alekhin
dc5e69b4d4
Revert "Merge pull request #13586 from eightco:Core_bugfix3"
...
This reverts commit 3721c8bb06
except changes in modules/dnn/test/test_tf_importer.cpp
6 years ago
Lee Jaehwan
3721c8bb06
Merge pull request #13586 from eightco:Core_bugfix3
...
* Add Operator override for multi-channel Mat with literal constant.
* simple test
* Operator overloading channel constraint for primitive types
* fix some test for #13586
6 years ago
Lee Jaehwan
71aee662bd
Merge pull request #13544 from eightco:bugfix
...
Fix a bug in cv :: merge when array of 3-channel mat is input (#13544 )
* Mat merge function bug fix - Bug fix of merge function of 3-channel vector <Mat> of 3 or 4 matrices
* Add Core_merge test for opencv#13544
* fixups
6 years ago
Vitaly Tuzov
cd169941f2
Added test for addition of Mat and Matx
6 years ago
Alexander Alekhin
f605898bae
core: fix eigen2cv() - don't change fixed type of 'dst'
6 years ago
1over
b6367f5821
fixed operator- for Rect
6 years ago
Alexander Alekhin
2fa9bd221d
core: add utils::findDataFile() / samples::findFile()
6 years ago
Dmitry Kurtaev
6c76c8f881
Add a test for FileNode::keys()
6 years ago
Alexander Alekhin
96ee83898d
core(test): extend divideByZero test
...
to verify SIMD code path
6 years ago
Alexander Alekhin
5059523937
core: fix processing of vector-rows
6 years ago
Sayed Adel
93ffebc273
core: reimplement SIMD arithmetic, logic and comparison operations into wide universal intrinsics
...
- initialize arithmetic dispatcher
- add new universal intrinsic v_absdiffs
- add new universal intrinsic v_pack_b
- add accumulate version of universal intrinsic v_round
- fix sse/avx2:uint8 multiplication overflow
- reimplement arithmetic, logic and comparison operations into wide universal intrinsics
with full support for all types
- reimplement IPP arithmetic, logic and comparison operations in a sperate file arithm_ipp.hpp
- avoid scalar multiplication if scaling factor eq 1 and use integer multiplication
- move C arithmetic operations to precomp.hpp and delete [arithm_simd|arithm_core].hpp
- add compatibility with new opencv4 divide policy
6 years ago
maver1
e397434cb6
Merge pull request #12877 from maver1:3.4
...
* Updated ICV packages and IPP integration
* core(test): minMaxIdx IPP regression test
* core(ipp): workaround minMaxIdx problem
* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun
* Returned semicolon after CV_INSTRUMENT_REGION_IPP()
6 years ago
Michał Janiszewski
c8e6ce304f
Catch exceptions by const-reference
...
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
6 years ago
Alexander Alekhin
5677a683a5
core(test): zero values divide test (3.x)
6 years ago
Sayed Adel
5771fd693d
Change behaviour of 16-bit multiply operator
...
- redefine 16-bit multiply operator to perform saturating multiply
instead of non-saturating multiply
- implement 8-bit multiply operator to perform saturating multiply
- implement v_mul_wrap() for 8-bit, 16-bit non-saturating multiply
- improve performance of v_mul_hi() for VSX
- update intrin tests with new changes
- replace unv 16-bit multiplication operator with v_mul_wrap due behavior changes
- Several improvements depend on vpisarev review
* initial forward declarations for universal intrinsics
* move emulating SSE intrinsics into separate file
* implement v_mul_expand for 8-bit
* reimplement saturating multiply using v_mul_expand + v_pack
* map v_expand, v_load_expand, v_load_expand_q to sse4.1
* fix overflow avx2::v_pack(uint32)
* implement two universal intrinsics v_expand_low and v_expand_high
6 years ago
Vitaly Tuzov
1ff11c84ab
Fixed meanStdDev() implementation for the case input matrix has more than 4 channels
6 years ago
Alexander Alekhin
48e8e76a34
fix build warnings
6 years ago
Dmitry Kurtaev
24ab751547
Merge pull request #12565 from dkurt:dnn_non_intel_gpu
...
* Remove isIntel check from deep learning layers
* Remove fp16->fp32 fallbacks where it's not necessary
* Fix Kernel::run to prevent localsize > globalsize
6 years ago
Vitaly Tuzov
2f929376ec
Fixed meanStdDev() implementation for the case input matrix has more than 4 channels
6 years ago
Hamdi Sahloul
a39e0daacf
Utilize CV_UNUSED macro
6 years ago
Vadim Pisarevsky
80b62a41c6
Merge pull request #12411 from vpisarev:wide_convert
...
* rewrote Mat::convertTo() and convertScaleAbs() to wide universal intrinsics; added always-available and SIMD-optimized FP16<=>FP32 conversion
* fixed compile warnings
* fix some more compile errors
* slightly relaxed accuracy threshold for int->float conversion (since we now do it using single-precision arithmetics, not double-precision)
* fixed compile errors on iOS, Android and in the baseline C++ version (intrin_cpp.hpp)
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
6 years ago
Alexander Alekhin
8a3c394d6a
don't use constructors for C API structures
6 years ago
Alexander Alekhin
a0f86479e0
core: wrap custom types via _RawArray (raw() call)
...
- support passing of `std::vector<KeyPoint>` via InputArray
6 years ago
Alexander Alekhin
70a27c7dd6
core: add solveLP type checks for output
...
to forbid Mat1f
Checks are not reliable: empty uninitialized `cv::Mat` has `CV_8UC1` type
6 years ago
Alexander Alekhin
e86287d8ae
cleanup: IPP Async (IPP_A)
...
except header file with conversion routines (will be removed in OpenCV 4.0)
6 years ago
Alexander Alekhin
67d46dfc6c
core(intrin): restrict FP16 operations
...
Intrinsics must be effective, so don't declare FP16 type/operations if there is no native support.
- CV_FP16: supports load/store into/from float32
- CV_SIMD_FP16: declares FP16 types and native FP16 operations
6 years ago
Alexander Alekhin
7453a6938a
core(test): extra tests/fixes for merge/split ( #12171 )
...
* core(test): merge hang test
* core(merge/split): fix intrin optimization
6 years ago
Alexander Alekhin
f2e1710dd5
core(test): regression test for 12121
6 years ago
Alexander Alekhin
3f302cabb8
core(test): intrinsic tests for all dispatched CPU optimizations
...
- tests for both SIMD128 / SIMD256
- different dispatched + baseline(SIMD128) intrinsics
6 years ago
Sayed Adel
bb82cdc928
core:test Fix fp16 build if AVX2 sets as baseline
6 years ago
Sayed Adel
6499263b41
core:test Expand hal_intrin tests to support SIMD256
6 years ago
Maksim Shabunin
1165fdd0f5
Added more strict checks for empty inputs to compare, meanStdDev and RNG::fill
6 years ago
Maksim Shabunin
cbb1e867e5
More issues found by static analysis
6 years ago
Alexander Alekhin
e526c4bfe4
core(test): remove verbose messages
6 years ago
Maksim Shabunin
c473718bc2
Check for empty Mat in compare, operator= and RNG::fill, fixed related tests
6 years ago
Vadim Pisarevsky
f058b5fb1e
Wide univ intrinsics ( #11953 )
...
* core:OE-27 prepare universal intrinsics to expand (#11022 )
* core:OE-27 prepare universal intrinsics to expand (#11022 )
* core: Add universal intrinsics for AVX2
* updated implementation of wide univ. intrinsics; converted several OpenCV HAL functions: sqrt, invsqrt, magnitude, phase, exp to the wide universal intrinsics.
* converted log to universal intrinsics; cleaned up the code a bit; added v_lut_deinterleave intrinsics.
* core: Add universal intrinsics for AVX2
* fixed multiple compile errors
* fixed many more compile errors and hopefully some test failures
* fixed some more compile errors
* temporarily disabled IPP to debug exp & log; hopefully fixed Doxygen complains
* fixed some more compile errors
* fixed v_store(short*, v_float16&) signatures
* trying to fix the test failures on Linux
* fixed some issues found by alalek
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
6 years ago
Alexander Alekhin
3c74fde349
core: eliminate 'if' logic from Matx::inv()/solve()
...
- 'if' logic is moved into templates.
- removed unnecessary cv::Mat objects creation.
- fixed inv() test (invA * A == eye)
- added more Matx tests to cover all defined template specializations
6 years ago
Alexander Alekhin
33b7028be2
core: use "explicit" for Matx() ctor
6 years ago
Vitaly Tuzov
850a8577b2
Fixed unreachable code warnings for Matx::solve()
6 years ago