Chip Kerchner
027769bf5d
Merge pull request #15662 from ChipKerchner:addVReverseIntrinsic
...
* New v_reverse HAL intrinsic for reversing the ordering of a vector
* Fix conflict.
* Try to resolve conflict again.
* Try one more time.
* Add _MM_SHUFFLE. Remove non-vectorize code in SSE2. Fix copy and paste issue with NEON.
* Change v_uint16x8 SSE2 version to use shuffles
5 years ago
Sayed Adel
f2fe6f40c2
Merge pull request #15510 from seiko2plus:issue15506
...
* core: rework and optimize SIMD implementation of dotProd
- add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
- add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
- fix clang build on ppc64le
- support wide universal intrinsics for dotProd_32s
- remove raw SIMD and activate universal intrinsics for dotProd_8
- implement SIMD optimization for dotProd_s16&u16
- extend performance test data types of dotprod
- fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
- optimize v_mul_expand(int32) on VSX
* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast
this changes made depend on "terfendail" review
5 years ago
Alexander Alekhin
fcc69d5a60
core(test): fix check conditions
5 years ago
Vitaly Tuzov
d134ec54c5
Extend tests for v_check_any and v_check_all intrinsics
5 years ago
luz.paz
fcc7d8dd4e
Fix modules/ typos
...
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`
backporting of commit: ec43292e1e
5 years ago
Alexander Alekhin
32772a5436
3.4: backported changes from 'master' branch
5 years ago
Alexander Alekhin
15b8a8d935
build: eliminate warnings with Xcode 10.3
5 years ago
Paul E. Murphy
b2135be594
fast_math: add extra perf/unit tests
...
Add a basic sanity test to verify the rounding functions
work as expected.
Likewise, extend the rounding performance test to cover the
additional float -> int fast math functions.
5 years ago
Alexander Alekhin
4ea8526e9f
core(persistence): fix writeRaw() / readRaw() struct support
...
- writeRaw(): support structs
- readRaw(): 'len' is buffer limit in bytes (documentation is fixed)
5 years ago
Alexander Alekhin
c3b838b738
core(persistence): struct storage layout without alignment gaps
5 years ago
Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210 )
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
6 years ago
Vitaly Tuzov
723165f878
fix for AVX2 version of v_reduce_min intrinsic
6 years ago
Vitaly Tuzov
f0fb91f2d4
Fixed v_signmask implementation for AVX2, updated universal intrinsics tests.
6 years ago
Alexander Alekhin
9340af1a8a
core: Async API / AsyncArray
6 years ago
Vitaly Tuzov
1220dd4877
Updated v_popcount description, reference implementation and test.
6 years ago
Vitaly Tuzov
18d10d6b86
Fixed v_reduce_sad intrinsics implementation and added tests
6 years ago
Alexander Alekhin
d6b82dcd65
Merge pull request #14162 from alalek:eliminate_coverity_scan_issues
...
core: eliminate coverity scan issues (#14162 )
* core(hal): avoid using of r,g,b,a parameters in interleave/deinterleave
- static analysis tools blame on possible parameters reordering
- align AVX parameters with corresponding SSE/NEO/VSX/cpp code
* core: avoid "i,j" parameters in Matx methods
- static analysis tools blame on possible parameters reordering
* core: resolve coverity scan issues
6 years ago
Alexander Alekhin
93a402d0f2
core: fix Core_EigenNonSymmetric.convergence test
6 years ago
Alexander Alekhin
a7c4ee9ae1
core: add iterations limit check in eigenNonSymmetric()
6 years ago
Alexander Alekhin
7366eebebb
core: fix condition in OutputArray::create(allowTransposed=True)
6 years ago
berak
20afae5a14
core: fix mat matx multiplication
6 years ago
Alexander Alekhin
dc5e69b4d4
Revert "Merge pull request #13586 from eightco:Core_bugfix3"
...
This reverts commit 3721c8bb06
except changes in modules/dnn/test/test_tf_importer.cpp
6 years ago
Lee Jaehwan
3721c8bb06
Merge pull request #13586 from eightco:Core_bugfix3
...
* Add Operator override for multi-channel Mat with literal constant.
* simple test
* Operator overloading channel constraint for primitive types
* fix some test for #13586
6 years ago
Lee Jaehwan
71aee662bd
Merge pull request #13544 from eightco:bugfix
...
Fix a bug in cv :: merge when array of 3-channel mat is input (#13544 )
* Mat merge function bug fix - Bug fix of merge function of 3-channel vector <Mat> of 3 or 4 matrices
* Add Core_merge test for opencv#13544
* fixups
6 years ago
Vitaly Tuzov
cd169941f2
Added test for addition of Mat and Matx
6 years ago
Alexander Alekhin
f605898bae
core: fix eigen2cv() - don't change fixed type of 'dst'
6 years ago
1over
b6367f5821
fixed operator- for Rect
6 years ago
Alexander Alekhin
2fa9bd221d
core: add utils::findDataFile() / samples::findFile()
6 years ago
Dmitry Kurtaev
6c76c8f881
Add a test for FileNode::keys()
6 years ago
Alexander Alekhin
96ee83898d
core(test): extend divideByZero test
...
to verify SIMD code path
6 years ago
Alexander Alekhin
5059523937
core: fix processing of vector-rows
6 years ago
Sayed Adel
93ffebc273
core: reimplement SIMD arithmetic, logic and comparison operations into wide universal intrinsics
...
- initialize arithmetic dispatcher
- add new universal intrinsic v_absdiffs
- add new universal intrinsic v_pack_b
- add accumulate version of universal intrinsic v_round
- fix sse/avx2:uint8 multiplication overflow
- reimplement arithmetic, logic and comparison operations into wide universal intrinsics
with full support for all types
- reimplement IPP arithmetic, logic and comparison operations in a sperate file arithm_ipp.hpp
- avoid scalar multiplication if scaling factor eq 1 and use integer multiplication
- move C arithmetic operations to precomp.hpp and delete [arithm_simd|arithm_core].hpp
- add compatibility with new opencv4 divide policy
6 years ago
maver1
e397434cb6
Merge pull request #12877 from maver1:3.4
...
* Updated ICV packages and IPP integration
* core(test): minMaxIdx IPP regression test
* core(ipp): workaround minMaxIdx problem
* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun
* Returned semicolon after CV_INSTRUMENT_REGION_IPP()
6 years ago
Michał Janiszewski
c8e6ce304f
Catch exceptions by const-reference
...
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
6 years ago
Alexander Alekhin
5677a683a5
core(test): zero values divide test (3.x)
6 years ago
Sayed Adel
5771fd693d
Change behaviour of 16-bit multiply operator
...
- redefine 16-bit multiply operator to perform saturating multiply
instead of non-saturating multiply
- implement 8-bit multiply operator to perform saturating multiply
- implement v_mul_wrap() for 8-bit, 16-bit non-saturating multiply
- improve performance of v_mul_hi() for VSX
- update intrin tests with new changes
- replace unv 16-bit multiplication operator with v_mul_wrap due behavior changes
- Several improvements depend on vpisarev review
* initial forward declarations for universal intrinsics
* move emulating SSE intrinsics into separate file
* implement v_mul_expand for 8-bit
* reimplement saturating multiply using v_mul_expand + v_pack
* map v_expand, v_load_expand, v_load_expand_q to sse4.1
* fix overflow avx2::v_pack(uint32)
* implement two universal intrinsics v_expand_low and v_expand_high
6 years ago
Vitaly Tuzov
1ff11c84ab
Fixed meanStdDev() implementation for the case input matrix has more than 4 channels
6 years ago
Alexander Alekhin
48e8e76a34
fix build warnings
6 years ago
Dmitry Kurtaev
24ab751547
Merge pull request #12565 from dkurt:dnn_non_intel_gpu
...
* Remove isIntel check from deep learning layers
* Remove fp16->fp32 fallbacks where it's not necessary
* Fix Kernel::run to prevent localsize > globalsize
6 years ago
Vitaly Tuzov
2f929376ec
Fixed meanStdDev() implementation for the case input matrix has more than 4 channels
6 years ago
Hamdi Sahloul
a39e0daacf
Utilize CV_UNUSED macro
6 years ago
Vadim Pisarevsky
80b62a41c6
Merge pull request #12411 from vpisarev:wide_convert
...
* rewrote Mat::convertTo() and convertScaleAbs() to wide universal intrinsics; added always-available and SIMD-optimized FP16<=>FP32 conversion
* fixed compile warnings
* fix some more compile errors
* slightly relaxed accuracy threshold for int->float conversion (since we now do it using single-precision arithmetics, not double-precision)
* fixed compile errors on iOS, Android and in the baseline C++ version (intrin_cpp.hpp)
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
6 years ago
Alexander Alekhin
8a3c394d6a
don't use constructors for C API structures
6 years ago
Alexander Alekhin
a0f86479e0
core: wrap custom types via _RawArray (raw() call)
...
- support passing of `std::vector<KeyPoint>` via InputArray
6 years ago
Alexander Alekhin
70a27c7dd6
core: add solveLP type checks for output
...
to forbid Mat1f
Checks are not reliable: empty uninitialized `cv::Mat` has `CV_8UC1` type
6 years ago
Alexander Alekhin
e86287d8ae
cleanup: IPP Async (IPP_A)
...
except header file with conversion routines (will be removed in OpenCV 4.0)
6 years ago
Alexander Alekhin
67d46dfc6c
core(intrin): restrict FP16 operations
...
Intrinsics must be effective, so don't declare FP16 type/operations if there is no native support.
- CV_FP16: supports load/store into/from float32
- CV_SIMD_FP16: declares FP16 types and native FP16 operations
6 years ago
Alexander Alekhin
7453a6938a
core(test): extra tests/fixes for merge/split ( #12171 )
...
* core(test): merge hang test
* core(merge/split): fix intrin optimization
6 years ago
Alexander Alekhin
f2e1710dd5
core(test): regression test for 12121
6 years ago
Alexander Alekhin
3f302cabb8
core(test): intrinsic tests for all dispatched CPU optimizations
...
- tests for both SIMD128 / SIMD256
- different dispatched + baseline(SIMD128) intrinsics
6 years ago