Alexander Alekhin
1e9ad5476d
core(intrin): drop hasSIMD128 checks
...
- use compile-time checks instead (`#if CV_SIMD128`)
- runtime checks are useless
6 years ago
bommo1
a38157a1f4
Fix https://github.com/opencv/opencv/issues/14265
6 years ago
Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210 )
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
6 years ago
Rostislav Vasilikhin
8c698262ea
rgb2hls_b: out of bounds read fixed
6 years ago
Rostislav Vasilikhin
791ebd05fc
out of bounds read fixed in rgb2luv_b
6 years ago
Rostislav Vasilikhin
e07ffe902e
Merge pull request #14616 from savuor:hsv_wide
...
HSV and HLS color conversions rewritten to wide intrinsics (#14616 )
* RGB2HSV_b vectorized
* RGB2HSV_f: widen
* RGB2HSV_f: shorten, more intuitive
* HSV2RGB_f and HSV2RGB_b widen
* hls2rgb_f widen
* instrumentation instead vx_cleanup
* RGB2HLS_f widen
* RGB2HLS_b rewritten to wide universal intrinsics
* define guard against no SIMD code
* hls2rgb_b rewritten
* extra define removed
* warning fixed
* hls2rgb_b: performance fixed
6 years ago
Ahmed Ashour
f3319f6140
java: remove redundant declaration of java.lang package
6 years ago
catree
7ed858e38e
Fix issue with solvePnPRansac and Nx3 1-channel input when the number of points is 5. Try to uniform the input shape of projectPoints and undistortPoints.
6 years ago
Rostislav Vasilikhin
e90e0ef9aa
Merge pull request #14106 from savuor:lab_wide
...
Lab, Luv and XYZ conversions rewritten to wide intrinsics (#14106 )
* rgb2xyz<float> re-vectorized
* rgb2xyz_i vectorized for ushort and uchar
* xyz2rgb<float> vectorized
* xyz2rgb_i vectorized for both uchar and ushort
* intermediate conversions (int->float) rewritten
* packed rgb2luv rewritten
* (some) float conversions rewritten
* burnt volatile int _3 and similar
* RGB2Lab_b rewritten
* tests: logging made better
* RGB2Lab_f (LRGB path) rewritten
* Lab2RGBfloat rewritten
* Lab2RGBinteger and Lab2RGB_b rewritten to wide universal intrinsics
* Luv2RGBinteger wide vectorized
* RGB2Lab_b fixed: v_sub_wrap instead of saturated sub
* warnings fixed
* trying to fix compilation on older compilers
* using 16x8 registers for 8-element dot product
* cleanup added
* splineInterpolate: loop unrolled, perf fix for f32x4
* Lab2RGBfloat: grab 2x more data to process on f32x4
* nrepeats for Luv2RGBfloat, +20% perf
* minor
* nrepeats to RGB2Lab_f
* Lab2RGBinteger: no tab for linear BGR
* nrepeats for RGB2Luvfloat
* Luv2RGBinteger: no tab for linear RGB
* +10% more to perf of Luv2RGBfloat
* nrepeats for 256-simd for Lab2RGBfloat
* less warnings
* BOM removed
* CV_SIMD_WIDTH used for lanes number checking
* trilinearPackedInterpolate: 128-bit specialization added
* fix build; no vx_cleanup(), instrumentation instead
6 years ago
Thang Tran
1aff378ae8
imgproc: fixed bug from intersectConvexConvex
...
Added checks for all of vertices from each contour instead of checking
only for the first vertex.
6 years ago
Alexander Alekhin
1c180f4c7f
imgproc: fix RemoveOverlaps() with empty input vector
6 years ago
Suleyman TURKMEN
3f9343e238
Update imgproc.hpp
6 years ago
Brad Kelly
0fe17eeb68
Implementing AVX512 Support for 1 channel mats for CV_64F format
6 years ago
Alexander Alekhin
8c8715c4dd
fix static analysis issues
6 years ago
Alexander Alekhin
2c07c6718f
imgproc: dispatch morph
6 years ago
Alexander Alekhin
5a01227aa1
imgproc: dispatch box_filter
6 years ago
Alexander Alekhin
ce3c92eb1f
imgproc: dispatch bilateral_filter
6 years ago
Alexander Alekhin
b99c9145bf
imgproc: dispatch smooth
6 years ago
Alexander Alekhin
6ec08f268f
imgproc: dispatch medianBlur
6 years ago
Alexander Alekhin
8546ac3ce6
imgproc: get rid of filter.avx2.cpp
6 years ago
Alexander Alekhin
9a8dbfd57f
imgproc: dispatch filter.cpp
6 years ago
Alexander Alekhin
9dc7554089
imgproc: copy .dispatch.cpp
6 years ago
Alexander Alekhin
6eac8f78b9
imgproc: copy .simd.hpp
6 years ago
Alexander Alekhin
8b541e450b
imgproc: dispatch color*
...
Lab/XYZ modes have been postponed (color_lab.cpp):
- need to split code for tables initialization and for pixels processing first
- no significant performance improvements for switching between SSE42 / AVX2 code generation
6 years ago
Alexander Alekhin
f26912960f
imgproc: clone color*.dispatch.cpp
6 years ago
Alexander Alekhin
db588bb831
imgproc: clone color*.simd.hpp
6 years ago
Alexander Alekhin
d5a2fe5180
perf: ignore _ovx tests
6 years ago
Vitaly Tuzov
99b39aa5bd
Fixed out of bound reading in LINEAR_EXACT resize for 8UC3
6 years ago
Suleyman TURKMEN
3d1dbd2ccd
clean up C API
6 years ago
Alexander Alekhin
3ba49ccecc
imgproc: removed LSD code due original code license conflict
6 years ago
Vitaly Tuzov
9548093b46
Horizontal line processing for pyrDown() reworked using wide universal intrinsics.
6 years ago
Vitaly Tuzov
334c4d62b5
Merge pull request #13781 from terfendail:warp_wintr
...
Resize reworked using wide universal intrinsics (#13781 )
* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize
* Reworked linear resize using new wide LUT intrinsics
* Fix for VSX intrinsics
6 years ago
Brad Kelly
507f8add1c
Implementing AVX512 Support for 2 and 4 channel mats for CV_64F format
6 years ago
Rostislav Vasilikhin
4e679e1cc5
disabled 16u and 32f perf tests
6 years ago
Rostislav Vasilikhin
87f651c119
disabled sanity check for 32f
6 years ago
Vitaly Tuzov
07c10d6fc3
Fixed out of bound reading issue in erode() and dilate()
6 years ago
Namgoo Lee
fb8e652c3f
Add CV_16UC1 support for cuda::CLAHE
...
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.
The amount of memory needed for building histogram is:
65536 * 4byte = 256KB
and shared memory limit is 48KB typically.
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.
6 years ago
Rostislav Vasilikhin
bbedebb57c
perf tests for cvtColor for 16U and 32f added
6 years ago
Rostislav Vasilikhin
554eae56d1
Merge pull request #13708 from savuor:yuv42x_wide
...
YUV42x color conversions rewritten to wide intrinsics (#13708 )
* a*b+c -> fma
* YUV420sp2RGB initially vectorized
* shorter var names
* loops by 4
* yuv420p2rgb vectorized
* yuv422toRGB vectorized
* reg arrays
* rgb2yuv420 vectorized
* warnings fixed
* try to fix align error
6 years ago
Vitaly Tuzov
2f5af1bd33
Merge pull request #13693 from terfendail:spatialgrad_wintr
...
* spatialGradient() reworked to use wide universal intrinsics
* Moved row pointers inside loops
6 years ago
Alexander Alekhin
5916ebf500
Merge pull request #13679 from alalek:imgproc_median_blur_cleanup
...
* imgproc: cleanup medianBlur_8u_O1 code
Unnecessary per-channel buffers: H[c] / lut[c]
* imgproc(medianBlur_8u_O1): use CV_SIMD_WIDTH for alignment
6 years ago
Arnaud Brejeon
d998e70a25
Merge pull request #13672 from arnaudbrejeon:bug_fix_12961
...
PyrDown: Fix bug #12961 (#13672 )
* Force unaligned pointer and create test
* More cross-platform solution
* MSVC expects a proper order
* Remove useless clang macro
6 years ago
Vitaly Tuzov
ed2e1af3e8
Added performance test for blendLinear
6 years ago
Vitaly Tuzov
266725a378
blendLinear() reworked to use wide universal intrinsics
6 years ago
Rostislav Vasilikhin
74ba4b7ae2
fixed (un)signed packing s16 -> u8
6 years ago
Alexander Alekhin
a84e11451b
imgproc(test): RGB2YUV regression test
6 years ago
Rostislav Vasilikhin
3812ae7949
Merge pull request #13649 from savuor:yuv_wide
...
YUV/YCrCb conversions rewritten to wide intrinsics (#13649 )
* YUV: minors
* YUV42x conversions template-merged
* more template-merged YUV42x conversions; some NEON code removed
* rgb2yuv<float> vectorized
* yuv2rgb<float> vectorized
* memcpy removed
* Yuv2RGB<ushort> vectorized
* unused code removed
* rgb2yuv<ushort> vectorized
* rgb2yuv<uchar> vectorized
* v_pack_u used (up to +30% perf)
* yuv2rgb<uchar> vectorized
* fixed compilation
6 years ago
Vitaly Tuzov
a84bbc62b1
boundingRect() reworked to use wide universal intrinsics
6 years ago
Vitaly Tuzov
78f80c35d2
Performance test for bounding rect estimation
6 years ago
Vitaly Tuzov
a202dc9a90
threshold() reworked to use wide universal intrinsics
6 years ago