Everton Constantino
75315fb297
Merge pull request #15494 from everton1984:hal_vector_get_n
...
Improving VSX performance of integral function
* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.
* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.
* Adding SSE/NEON/AVX intrinsics.
* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.
* core(simd): add tests for v_extract_n/v_broadcast_element
- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations
* core(simd): fix compilation
- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only
* cleanup
5 years ago
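For readers unfamiliar with the two instructions named in the commit above, the following is a scalar model of what "get the n-th element" and "broadcast one element" mean. It is a sketch only: `Lanes`, `extract_n` and `broadcast_element` are hypothetical stand-ins built on `std::array`, not the OpenCV HAL types or the intrinsics-based code added by the pull request.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar stand-in for a SIMD register with N lanes of type T.
// The real HAL registers wrap platform vector types (VSX/SSE/NEON/AVX).
template <typename T, std::size_t N>
using Lanes = std::array<T, N>;

// Model of "extract the I-th element": the lane index is a compile-time
// constant, so an out-of-range index is rejected at compile time.
template <std::size_t I, typename T, std::size_t N>
T extract_n(const Lanes<T, N>& v) {
    static_assert(I < N, "lane index out of range");
    return v[I];
}

// Model of "broadcast element I": every lane of the result holds v[I].
template <std::size_t I, typename T, std::size_t N>
Lanes<T, N> broadcast_element(const Lanes<T, N>& v) {
    static_assert(I < N, "lane index out of range");
    Lanes<T, N> out;
    out.fill(v[I]);
    return out;
}
```

With `Lanes<int, 4> v = {10, 20, 30, 40}`, `extract_n<2>(v)` yields 30 and `broadcast_element<1>(v)` yields a register whose four lanes are all 20.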
clunietp
2185bce4b7
Fix 13577
5 years ago
Alexander Alekhin
f4d55d512f
imgproc: fix bit-exact GaussianBlur() / sepFilter2D() (#15855)
...
* imgproc: fix bit-exact GaussianBlur() / sepFilter2D()
- avoid kernels with bad approximation
- GaussianBlur - apply error-diffusion approximation for kernel (8-bit fraction)
* java(test): update features2d ref data
* test: update test_facedetect
5 years ago
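The "error-diffusion approximation" mentioned in the commit above can be illustrated with a generic sketch: each coefficient is rounded to a fixed-point value with an 8-bit fraction, and the rounding error is carried into the next coefficient so the quantized kernel still sums to 256. The function name `quantizeKernel8` and the left-to-right processing order are assumptions made for illustration; this is not the code from the OpenCV change.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: quantize a normalized kernel (coefficients summing
// to ~1.0) to integers with an 8-bit fraction, i.e. units of 1/256.
// The rounding error of each coefficient is diffused into the next one.
std::vector<int> quantizeKernel8(const std::vector<double>& kernel) {
    std::vector<int> q(kernel.size());
    double err = 0.0;  // rounding error carried over from previous coefficients
    for (std::size_t i = 0; i < kernel.size(); ++i) {
        const double target = kernel[i] * 256.0 + err;
        const int v = static_cast<int>(std::lround(target));
        q[i] = v;
        err = target - v;  // what rounding lost (or gained) at this step
    }
    return q;
}
```

For the kernel {1/3, 1/3, 1/3}, rounding each coefficient independently gives 85 + 85 + 85 = 255, whereas carrying the error forward gives a sum of 256, so the filter does not change overall brightness.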
ChipKerchner
1d33335e33
Convert demosaicing with variable number of gradients to HAL - 5.5x faster
5 years ago
Alexander Alekhin
763b80d5fa
imgproc(IPP): disable ippiDistanceTransform_3x3_8u32f_C1R
5 years ago
Alexander Alekhin
7ecdcf6ca6
build: GCC9 compilation
5 years ago
Chip Kerchner
2112aa31e6
Merge pull request #15828 from ChipKerchner:momentsToHal
...
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).
* Adding NEON code back in for non 64-bit platforms.
* Remove floats from post processing.
5 years ago
Ciprian Alexandru Pitis
d2e02779c4
Merge pull request #15799 from Cpitis:feature/parallelization
...
Parallelize pyrDown & calcSharrDeriv
* ::pyrDown has been parallelized
* CalcSharrDeriv parallelized
* Fixed whitespace
* Set granularity based on amount of threads enabled
* Granularity changed to cv::getNumThreads; each thread should now receive 1/n sized stripes
* imgproc: move PyrDownInvoker<CastOp>::operator() implementation
* imgproc(pyramid): remove syloopboundary()
* video: SharrDerivInvoker replace 'Mat*' => 'Mat&' fields
5 years ago
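The "1/n sized stripes" idea from the commit above can be sketched as a plain row-range split. In OpenCV the actual splitting is done by `cv::parallel_for_` according to its `nstripes` argument; the helper below, `splitIntoStripes`, is a hypothetical illustration of the arithmetic and not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: split rows [0, rows) into at most `nthreads`
// contiguous half-open stripes [begin, end) of near-equal size.
std::vector<std::pair<int, int>> splitIntoStripes(int rows, int nthreads) {
    std::vector<std::pair<int, int>> stripes;
    if (rows <= 0 || nthreads <= 0)
        return stripes;
    const int n = std::min(rows, nthreads);  // never create empty stripes
    const int base = rows / n;               // minimum rows per stripe
    const int extra = rows % n;              // first `extra` stripes get one more row
    int begin = 0;
    for (int i = 0; i < n; ++i) {
        const int len = base + (i < extra ? 1 : 0);
        stripes.emplace_back(begin, begin + len);
        begin += len;
    }
    return stripes;
}
```

For 10 rows and 4 threads this produces [0,3), [3,6), [6,8), [8,10), so no thread receives more than one row beyond its even share.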
Alexander Alekhin
17e2bf5717
core(tls): implement releasing of TLS on thread termination
...
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using .detachData() + .cleanupDetachedData() instead of .gather() method
(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
5 years ago
ChipKerchner
c46f119e0e
Convert demosaic functions to HAL
5 years ago
Steve Nicholson
acb3b3bd4d
Add documentation and example program for intersectConvexConvex
5 years ago
jasjuang
4c7db02925
document CC_STAT_MAX in ConnectedComponentsTypes
5 years ago
Everton Constantino
9ca9249992
Merge pull request #15527 from everton1984:faster_acc
...
* Adding support for vectorized masking for uchar/ushort.
* Fixing bug where mask was zeroing the dst. Improved the way to calculate
the mask and tweaked for further performance improvements.
* Fixing mask comparison test.
* Restricting to one channel.
* Adding support for 3 channels, switch old approach to start using HAL's
v_select.
5 years ago
Alexander Alekhin
a007220c52
imgproc: update histogram test
5 years ago
Alexander Alekhin
f301f17b61
imgproc: accurate histogram value thresholding
5 years ago
Alexander Alekhin
c69245da1f
imgproc: fix fitLine() implementation
...
- update optimal solutions on each iteration
5 years ago
Alexander Alekhin
f81e401cd0
imgproc: fix indexing issue in pyramids
...
UBSAN violation expression: 'tab = tabR - x;'
5 years ago
Vitaly Tuzov
1c17b3281a
Fixed OOB reading in pyrDown
5 years ago
Vitaly Tuzov
7b3a752012
Fixed universal intrinsic undistort() implementation
5 years ago
Alexander Alekhin
e7b6753a10
imgproc: avoid manual memory allocation in connectedcomponents.cpp
5 years ago
Everton Constantino
76e403cf25
Merge pull request #15440 from everton1984:new_integral_tests
...
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.
* Disabling full tests by default.
* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.
* Giving proper names, making array static.
5 years ago
atinfinity
3b9f981358
removed tegra optimization
5 years ago
Chip Kerchner
26228e6b4d
Merge pull request #15358 from ChipKerchner:imgwarpToHal
...
* Convert ImgWarp from SSE SIMD to HAL - 2.8x faster on Power (VSX) and 15% speedup on x86
* Change compile flag from CV_SIMD128 to CV_SIMD128_64F for use of v_float64x2 type
* Changing WarpPerspectiveLine from class functions and dispatching to static functions.
* Re-add dynamic runtime and dispatch execution.
* Restore SSE4_1 optimizations inside opt_SSE4_1 namespace
5 years ago
atinfinity
824465ea27
Merge pull request #15388 from atinfinity:impl-turbo-colormap
...
Implementation of colormap "Turbo" (#15388)
* implemented turbo colormap
* add colormap image
* changed float value to avoid cast
* sorted flag check alphabetically
5 years ago
Alexander Alekhin
29dbeb253c
build: fix build with ICC
5 years ago
luz.paz
fcc7d8dd4e
Fix modules/ typos
...
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`
backporting of commit: ec43292e1e
5 years ago
luz.paz
ec43292e1e
Fix modules/ typos
...
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`
5 years ago
Alexander Alekhin
32772a5436
3.4: backported changes from 'master' branch
5 years ago
Maksim Shabunin
6d5ac67681
Restored IPP call reduction
5 years ago
dcouwenh
d3cf0d2c06
Bayer VNG Demosaicing Fix #2 (Merge pull request #15086)
...
* Update demosaicing.cpp
Fixed calculation of Bs for non-green pixels.
* Fixed cvtColor perf test for bayer VNG
5 years ago
Vitaly Tuzov
e0f8bb83a6
Merge pull request #14994 from terfendail:wintr_undistort
...
WUI based implementation to initUndistortRectifyMap (#14994)
* Add initUndistortRectifyMap performance test
* Move cv namespace boundaries
* Add wide universal intrinsics based implementation to initUndistortRectifyMap
* Dispatch undistort
5 years ago
Chip Kerchner
c9fcc12e3b
Merge pull request #15048 from ChipKerchner:reduceStoreGatheringThreshold
...
* Reduce store gathering pressures - speeds thresholds by up to 20%
* Rename temporary histogram array and initialize so that MACOSX builder is happy
5 years ago
Vitaly Tuzov
894ad33bf4
Fix pixel value evaluation overflow in bit-exact GaussianBlur implementation
5 years ago
Alexander Alekhin
32c6e58bdb
imgproc: fix unaligned memory access
...
may cause crashes on ARM platform
5 years ago
Tomoaki Teshima
594a95839c
fix test failure of OCL_ImgProc/CvtColor8u.mRGBA2RGBA
6 years ago
Vitaly Tuzov
82e5b961d3
Fixed initUndistortRectifyMap AVX2 implementation
6 years ago
arnaudbrejeon
a37201abee
Fix crash, add assert and test
6 years ago
Vitaly Tuzov
9befb7a1d7
Merge pull request #14916 from terfendail:wsignmask_deprecated
...
* Avoid using v_signmask universal intrinsic and mark it as deprecated
* Renamed v_find_negative to v_scan_forward
6 years ago
StefanBruens
3e4a195b61
Merge pull request #14936 from StefanBruens:crosscorr_cleanup
...
Crosscorr cleanup (#14936)
* Simplify code for convolution destination type/size
For the 2d filter code, destination size equals source size, and the
crossCorr function even (re-)creates the output matrix with the given size.
The number of channels also has to match. The destination type() is the
one used to create the output matrix, so we can use its type() here.
This is a preparatory patch.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
* Remove redundant destination size and type parameters from crossCorr
All calling sites of crossCorr already use (...,
mat, mat.size(), mat.type(), ...), so the parameters are redundant.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
6 years ago
Alexander Alekhin
4a6888ccf6
imgproc: fix kmeans() call from grabCut()
6 years ago
Alexander Alekhin
5ac55fc132
core: eliminate AVX512 build warnings
...
from MSVS2017 and GCC8 -O1 mode
6 years ago
Kang
549c53121a
Fix the bug where icdist may be negative at the edge of the image when k[4] is negative.
6 years ago
Vitaly Tuzov
d2aadabc5e
Merge pull request #14743 from terfendail:wui512_fixvswarn
...
Fix for MSVS2019 build warnings (#14743)
* AVX512 arch support for MSVS
* Fix for MSVS2019 build warnings: updated integral() AVX512 implementation
* Fix for MSVS2019 build warnings: reworked v_rotate_right AVX512 implementation
* fix indentation
6 years ago
Alexander Alekhin
1e9ad5476d
core(intrin): drop hasSIMD128 checks
...
- use compile-time checks instead (`#if CV_SIMD128`)
- runtime checks are useless
6 years ago
bommo1
a38157a1f4
Fix https://github.com/opencv/opencv/issues/14265
6 years ago
Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210)
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP): Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
6 years ago
Rostislav Vasilikhin
8c698262ea
rgb2hls_b: out of bounds read fixed
6 years ago
Rostislav Vasilikhin
791ebd05fc
out of bounds read fixed in rgb2luv_b
6 years ago
Rostislav Vasilikhin
e07ffe902e
Merge pull request #14616 from savuor:hsv_wide
...
HSV and HLS color conversions rewritten to wide intrinsics (#14616)
* RGB2HSV_b vectorized
* RGB2HSV_f: widen
* RGB2HSV_f: shorten, more intuitive
* HSV2RGB_f and HSV2RGB_b widen
* hls2rgb_f widen
* instrumentation instead of vx_cleanup
* RGB2HLS_f widen
* RGB2HLS_b rewritten to wide universal intrinsics
* define guard against no SIMD code
* hls2rgb_b rewritten
* extra define removed
* warning fixed
* hls2rgb_b: performance fixed
6 years ago
Ahmed Ashour
f3319f6140
java: remove redundant declaration of java.lang package
6 years ago