* seriously improved performance of blur function, especially 3x3 and 5x5 cases
* trying to fix warnings and test failures
* replaced #if 0 with #if IPP_DISABLE_BLOCK
* Improve Canny by using _mm_movemask_epi8 to find the next pixel whose magnitude is greater than the lower threshold. Added a parallelized finalPass to Canny with variable gradients. Minor changes in finalPass.
* Some things fixed
* use universal intrinsics for the accumulate series with float/double
* accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
* add v_cvt_f64_high for both SSE and NEON
* add test for conversion v_cvt_f64_high in test_intrin.cpp
* improve some existing universal intrinsics by using new instructions in AArch64
* add workaround for Android build in intrin_neon.hpp
* Add Grana's connected components algorithm for 8-way connectivity. It is faster than Wu's algorithm (currently implemented in OpenCV). For more details see https://github.com/prittt/YACCLAB.
* New function signatures and distance transform compatibility
* Add tests to imgproc/test/test_connectedcomponents.cpp
* Change test_connectedcomponents.cpp for C++98 support
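
For reference, a minimal sketch of what 8-way connected component labeling computes. This is a plain two-pass union-find labeler, not Grana's algorithm and not the OpenCV implementation; the names findRoot and labelComponents8 are hypothetical and exist only for this illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: find the representative of x, with path halving.
static int findRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Labels non-zero pixels of a row-major binary image using 8-way
// connectivity. Returns the number of components; labels receives 0 for
// background and 1..N for the components.
static int labelComponents8(const std::vector<unsigned char>& img,
                            int rows, int cols, std::vector<int>& labels) {
    const int n = rows * cols;
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);

    // Pass 1: merge each foreground pixel with its already visited
    // neighbours (W, NW, N, NE). Including the diagonals is what makes
    // this 8-way rather than 4-way connectivity.
    const int dr[4] = {0, -1, -1, -1};
    const int dc[4] = {-1, -1, 0, 1};
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (!img[r * cols + c]) continue;
            for (int k = 0; k < 4; ++k) {
                const int nr = r + dr[k], nc = c + dc[k];
                if (nr < 0 || nc < 0 || nc >= cols) continue;
                if (!img[nr * cols + nc]) continue;
                const int a = findRoot(parent, r * cols + c);
                const int b = findRoot(parent, nr * cols + nc);
                if (a != b) parent[a] = b;
            }
        }
    }

    // Pass 2: give each root a consecutive label and propagate it.
    labels.assign(n, 0);
    std::vector<int> rootLabel(n, 0);
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (!img[i]) continue;
        const int root = findRoot(parent, i);
        if (rootLabel[root] == 0) rootLabel[root] = ++count;
        labels[i] = rootLabel[root];
    }
    return count;
}
```

With 8-way connectivity, two pixels that touch only diagonally end up with the same label, which 4-way connectivity would keep apart.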
There is an issue with how the abs(short) function handles
negative arguments.
Affected OpenCL devices:
- iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0)
- CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))
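
The note above does not say how the issue is worked around. One possible approach, sketched here in host-side C++ under that assumption, is to widen the value to int before negating instead of calling a 16-bit abs directly. The helper name absShortViaInt is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the commit: absolute value of a
// 16-bit signed integer computed through a wider type. The result is
// unsigned so that -32768 maps to 32768, which matches the unsigned
// return type that OpenCL specifies for abs(short).
static inline std::uint16_t absShortViaInt(std::int16_t x) {
    const int v = static_cast<int>(x);  // widen first, then negate
    return static_cast<std::uint16_t>(v < 0 ? -v : v);
}
```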
* Common Canny parallelization added. TBB and single-threaded code removed. Final pass vectorized with SSE2 intrinsics.
* wrong #ifdef replaced with #if
* Merged with the current Canny version
* Merged common parallelized Canny with the current Canny implementation
* Remove 'Mutex *mutex' and pass 'Mutex mutex' from outside to parallelCanny
* Replaced the external Mutex with an internal mutable Mutex.
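
A rough sketch of the _mm_movemask_epi8 scan mentioned for Canny in this log: compare four 32-bit magnitudes at a time against the lower threshold and use the byte mask to locate the first lane that exceeds it. This is an illustration under assumptions, not the code from the commits; the function name nextAboveLow, the int magnitude type and the scalar fallback are choices made for this example.

```cpp
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical helper: returns the index of the first element in
// mag[start..n) that is strictly greater than low, or n if there is none.
static std::size_t nextAboveLow(const int* mag, std::size_t start,
                                std::size_t n, int low) {
    std::size_t j = start;
#ifdef SKETCH_HAVE_SSE2
    const __m128i vlow = _mm_set1_epi32(low);
    for (; j + 4 <= n; j += 4) {
        const __m128i v =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(mag + j));
        const __m128i gt = _mm_cmpgt_epi32(v, vlow);  // all ones where mag > low
        const int mask = _mm_movemask_epi8(gt);       // 4 mask bits per lane
        if (mask != 0) {
            // The first set bit sits at a multiple of 4; that multiple is
            // the lane index of the first magnitude above the threshold.
            int lane = 0;
            while (((mask >> (lane * 4)) & 1) == 0) ++lane;
            return j + static_cast<std::size_t>(lane);
        }
    }
#endif
    // Scalar tail, also used when SSE2 is not available.
    for (; j < n; ++j)
        if (mag[j] > low) return j;
    return n;
}
```

A zero mask lets the loop skip four pixels with a single branch, which is where the speedup would come from when most magnitudes are below the threshold.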