The hard-coded string value "Mat" was used in the two format strings for vector_mat and vector_mat_template, preventing UMat arguments to functions that have these types from working correctly. as noted in #12231.
* Vectorize calculating integral for line for single and multiple channels
* Single vector processing for 4-channels - 25-30% faster
* Single vector processing for 4-channels - 25-30% faster
* Fixed AVX512 code for 4 channels
* Disable 3 channel 8UC1 to 32S for SSE2 and SSE3 (slower). Use new version of 8UC1 to 64F for AVX512.
fixed the ordering of contour convex hull points
* partially fixed the issue #4539
* fixed warnings and test failures
* fixed integer overflow (issue #14521)
* added comment to force buildbot to re-run
* extended the test for the issue 4539. Check the expected behaviour on the original contour as well
* added comment; fixed typo, renamed another variable for a little better clarity
* added yet another part to the test for issue #4539, where we run convexHull and convexityDetects on the original contour, without any manipulations. the rest of the test stays the same
* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)
* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code
* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)
* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
Changes:
* UMat for blur + rotate resulting in a speedup of around 2X on an i7
* support for boards larger than specified allowing to cover full FOV
* support for markers moving the origin into the center of the board
* increase detection accuracy
The main change is for supporting boards that are larger than the FOV of
the camera and have their origin in the board center. This allows
building OEM calibration targets similar to the one from intel real
sense utilizing corner points as close as possible to the image border.
cuda4dnn(concat): write outputs from previous layers directly into concat's output
* eliminate concat by directly writing to its output buffer
* fix concat fusion not happening sometimes
* use a whitelist instead of a blacklist