Alexander Alekhin
5ac55fc132
core: eliminate AVX512 build warnings
...
from MSVS2017 and GCC8 -O1 mode
6 years ago
Alexander Alekhin
681e0323f2
core: backport toLowerCase()/toUpperCase()
6 years ago
Vitaly Tuzov
a29e59a770
Rename parameters in AVX512 implementation of v_load_deinterleave and v_store_interleave
6 years ago
Vitaly Tuzov
d2aadabc5e
Merge pull request #14743 from terfendail:wui512_fixvswarn
...
Fix for MSVS2019 build warnings (#14743)
* AVX512 arch support for MSVS
* Fix for MSVS2019 build warnings: updated integral() AVX512 implementation
* Fix for MSVS2019 build warnings: reworked v_rotate_right AVX512 implementation
* fix indentation
6 years ago
Alexander Alekhin
1e9ad5476d
core(intrin): drop hasSIMD128 checks
...
- use compile-time checks instead (`#if CV_SIMD128`)
- runtime checks are useless
6 years ago
Alexander Alekhin
4a8fd71a2e
core: fix visibility handling
6 years ago
Ahmed Ashour
5c56b8ce92
java: generated code to have javadoc
6 years ago
Ahmed Ashour
1aca1d582e
Fix some typos
6 years ago
Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210)
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP): Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
6 years ago
Vitaly Tuzov
723165f878
fix for AVX2 version of v_reduce_min intrinsic
6 years ago
Vitaly Tuzov
f0fb91f2d4
Fixed v_signmask implementation for AVX2, updated universal intrinsics tests.
6 years ago
Alexander Alekhin
9340af1a8a
core: Async API / AsyncArray
6 years ago
catree
b5e2ec4ea4
Fix typo in NormTypes documentation.
6 years ago
Vitaly Tuzov
7a55f2af3b
Updated AVX2 implementation of v_popcount for u8.
6 years ago
Vitaly Tuzov
1220dd4877
Updated v_popcount description, reference implementation and test.
6 years ago
Vitaly Tuzov
96ab78dc4f
Reworked v_popcount implementation to provide number of bits in a single lane
6 years ago
Sayed Adel
5a77f4cee3
Merge pull request #14007 from seiko2plus:core_avx512_infa
...
* core: improve AVX512 infrastructure by adding more CPU features groups
* cmake: use groups for AVX512 optimization flags
* core: remove gap in CPU flags enumeration
* cmake: restore default CPU_DISPATCH
6 years ago
Sayed Adel
afb157df67
core:vsx fix sum of v_reduce_sad
6 years ago
Vitaly Tuzov
18d10d6b86
Fixed v_reduce_sad intrinsics implementation and added tests
6 years ago
Alexander Alekhin
c1981f28ad
build: +OPENCV_ENABLE_MEMORY_SANITIZER flag
6 years ago
Vitaly Tuzov
4a54aa3fbd
Cleared up deprecated intrinsics for FP16
6 years ago
Alexander Alekhin
b38de57f9a
ts: test tags for flexible/reliable tests filtering
...
- added functionality to collect memory usage of OpenCL subsystem
- memory usage of fastMalloc() (disabled by default):
* It is not accurate sometimes - external memory profiler is required.
- specify common `CV_TEST_TAG_` macros
- added applyTestTag() function
- write memory usage / enabled tags into Google Tests output file (.xml)
6 years ago
Alexander Alekhin
33b765d797
OpenCV version++ (3.4.6)
...
OpenCV 3.4.6
6 years ago
Alexander Alekhin
d6b82dcd65
Merge pull request #14162 from alalek:eliminate_coverity_scan_issues
...
core: eliminate coverity scan issues (#14162)
* core(hal): avoid using r,g,b,a parameters in interleave/deinterleave
- static analysis tools complain about possible parameter reordering
- align AVX parameters with corresponding SSE/NEON/VSX/cpp code
* core: avoid "i,j" parameters in Matx methods
- static analysis tools complain about possible parameter reordering
* core: resolve coverity scan issues
6 years ago
Alexander Alekhin
6686559c70
ocl: define CL_SILENCE_DEPRECATION on MacOSX
6 years ago
Maksim Shabunin
41da3ef1d2
Fixed cvdef.h for MSVC C users
6 years ago
Sayed Adel
f41359688b
core:vsx Add support for VSX3 half precision conversions
6 years ago
Sayed Adel
4fe2d9bdbc
core:vsx Several improvements(3)
...
* optimize v_lut_deinterleave
* optimize v_interleave_/pairs/quads/triplets
* optimize v_lut, use vec_extract instead of aligned store
6 years ago
Sayed Adel
872e7894b4
core:vsx working around gcc aligned memory access bug
...
- allow cmake to check sanity of vsx aligned ld/st
- force universal intrinsics v_load_aligned/v_store_aligned
to fall back to unaligned ld/st if the cmake runtime vsx aligned test fails
6 years ago
Alexander Alekhin
80e5642ca2
pre: OpenCV 3.4.6 (version++)
6 years ago
Alexander Alekhin
842c58a7d6
core(intrin): NEON v_load_expand_q() support unaligned addr
6 years ago
Alexander Alekhin
8b541e450b
imgproc: dispatch color*
...
Lab/XYZ modes have been postponed (color_lab.cpp):
- need to split code for tables initialization and for pixels processing first
- no significant performance improvements for switching between SSE42 / AVX2 code generation
6 years ago
Sayed Adel
5478165e16
core:vsx Fix narrowing warning on vector splats
6 years ago
berak
20afae5a14
core: fix mat matx multiplication
6 years ago
Vitaly Tuzov
9548093b46
Horizontal line processing for pyrDown() reworked using wide universal intrinsics.
6 years ago
Vitaly Tuzov
334c4d62b5
Merge pull request #13781 from terfendail:warp_wintr
...
Resize reworked using wide universal intrinsics (#13781)
* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize
* Reworked linear resize using new wide LUT intrinsics
* Fix for VSX intrinsics
6 years ago
Alexander Alekhin
cd66f6e3db
core: dispatch matmul
...
- gemm: keep baseline only (lapack is 10x+ faster, let's reduce binary size)
- transform / distTransform
- scaleAdd (32f/64f only)
- Mahalanobis: keep baseline only (no perf tests)
- mulTransposed: keep baseline only (no perf tests)
- dot
6 years ago
klemens
5d9c6723ee
spelling fixes
...
backport 997b7b18af
6 years ago
Namgoo Lee
fb8e652c3f
Add CV_16UC1 support for cuda::CLAHE
...
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.
The amount of memory needed for building histogram is:
65536 * 4byte = 256KB
and shared memory limit is 48KB typically.
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.
6 years ago
Alexander Alekhin
dc5e69b4d4
Revert "Merge pull request #13586 from eightco:Core_bugfix3"
...
This reverts commit 3721c8bb06
except changes in modules/dnn/test/test_tf_importer.cpp
6 years ago
Lee Jaehwan
3721c8bb06
Merge pull request #13586 from eightco:Core_bugfix3
...
* Add Operator override for multi-channel Mat with literal constant.
* simple test
* Operator overloading channel constraint for primitive types
* fix some test for #13586
6 years ago
Vitaly Tuzov
ea882d58c6
Added CV_ALWAYS_INLINE macro
6 years ago
Lucas Towers
9cc12ff0ac
Fix improper defining of CV_XADD when using Intel C++
6 years ago
Namgoo Lee
4b4874e67a
Remove build warning msg with CUDA10.0
6 years ago
Vitaly Tuzov
c8f59bf1e0
Fixed operations on Mat and Matx simultaneously
6 years ago
Alexander Alekhin
8f1356c3c5
OpenCV version++ (3.4.5)
...
OpenCV 3.4.5
6 years ago
Vitaly Tuzov
06f32e3b3e
Reworked separable filter to use wide universal intrinsics
6 years ago
Alexander Alekhin
f605898bae
core: fix eigen2cv() - don't change fixed type of 'dst'
6 years ago
Sayed Adel
4e16ae9a1f
core:vsx fix build failure on GCC<=6 due to implementation of v_reduce_sum(v_float64x2)
6 years ago
Vitaly Tuzov
3903174f7c
Merge pull request #13334 from terfendail:histogram_wintr
...
* added performance test for compareHist
* compareHist reworked to use wide universal intrinsics
* Disabled vectorization for CV_COMP_CORREL and CV_COMP_BHATTACHARYYA if f64 is unsupported
6 years ago