* another round of dnn optimization:
* increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly
* improved SIMD optimization of pooling layer, optimized average pooling
* cleaned up convolution layer implementation
* made activation layer "attacheable" to all other layers, including fully connected and addition layer.
* fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it desctibes the topology.
* greatly optimized permutation layer, which improved SSD performance
* parallelized element-wise binary/ternary/... ops (sum, prod, max)
* also, added missing copyrights to many of the layer implementation files
* temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders
Parallelize Canny with custom gradient (#8694)
* New Canny implementation. Restructuring code in parallelCanny class. Align mag buffer and map.
* Fix warnings.
* Missing SIMD check added.
* Replaced local trailingZeros in contours.cpp. Use alignSize in canny.cpp
* Fix warnings in alignSize and allocate just minimum extra columns.
* Fix another warning in map.create.
* Exchange for loop by do loop to avoid double check at the beginning.
Define extra SIMD CANNY_CHECK to avoid unnecessary continue.
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
Warping a matrix with more than 4 channels using BORDER_CONSTANT and
INTER_NEAREST, INTER_CUBIC or INTER_LANCZOS4 interpolation led to
undefined behaviour. This commit changes the behavior of these methods
to be similar to that of INTER_LINEAR. Changed the scope of some of the
variables to more local. Modified some tests to be able to detect the
error described.
added 64b optimization for 3 channels case
not added 64b optimization for 4 channels case since timings did not
show any improvement
split ICV_HLINE cases into inline functions instead of macro for code
size reduction, without significand speed drawback at first sight