Add python bindings to cudaobjdetect, cudawarping and cudaarithm
* Overload cudawarping functions to generate correct python bindings.
Add python wrapper for the convolution function.
* Added shift and hog.
* Moved cuda python tests to this repo and added python bindings to SURF.
* Fix SURF documentation and allow meanShiftSegmentation to create the GpuMat internally when none is passed, for consistency across python bindings.
* Add correct cuda SURF test case.
* Fix mog and mog2 python bindings, add tests and correct cudawarping documentation.
* Updated KeyPoints in cuda::ORB::Convert python wrapper to be an output argument.
* Add changes suggested by alalek
* Added changes suggested by asmorkalov
Due to the size limit of shared memory, the histogram is
built in global memory for the CV_16UC1 case.
The amount of memory needed to build the histogram is:
65536 bins * 4 bytes = 256 KB
while the shared memory limit is typically 48 KB.
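
The arithmetic above can be checked with a small sketch. The helper
names are hypothetical and only the figures 65536, 4 bytes and 48 KB
come from this message; the actual limit depends on the device.

```python
# Hypothetical helpers for illustration; not part of the CUDA code.
BINS_16U = 1 << 16               # one bin per possible CV_16UC1 value
BINS_8U = 1 << 8                 # one bin per possible CV_8UC1 value
BYTES_PER_BIN = 4                # assumed 32-bit counter per bin
SHARED_LIMIT_BYTES = 48 * 1024   # typical shared memory limit quoted above


def histogram_bytes(bins, bytes_per_bin=BYTES_PER_BIN):
    """Memory needed for one histogram with the given number of bins."""
    return bins * bytes_per_bin


def fits_in_shared(bins, limit=SHARED_LIMIT_BYTES):
    """True if one histogram of this size fits under the assumed limit."""
    return histogram_bytes(bins) <= limit


hist16_bytes = histogram_bytes(BINS_16U)  # 262144 bytes, i.e. 256 KB
hist8_bytes = histogram_bytes(BINS_8U)    # 1024 bytes, i.e. 1 KB
```

Under these assumptions the 8-bit histogram fits in shared memory and
the 16-bit one does not, which is why the CV_16UC1 path uses global
memory.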
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in the CV_8UC1 case when redistributing
"residual" clipped pixels. Adding a test case with a clip
limit of 5.0 exposes this bug.
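
For reference, a minimal sketch of clipping a histogram and
redistributing the clipped pixels, including the "residual" part that
does not divide evenly across bins. This is a plain Python illustration
of the general CLAHE step, not the CUDA kernel from this commit, and it
does not reproduce the bug itself. The function name is hypothetical,
and clip_limit is assumed here to be an absolute integer count per bin.

```python
def clip_and_redistribute(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess over all bins.

    Illustrative only: clip_limit is an absolute integer count here.
    """
    bins = len(hist)
    clipped = 0
    out = []
    for count in hist:
        if count > clip_limit:
            clipped += count - clip_limit
            out.append(clip_limit)
        else:
            out.append(count)

    # Every bin receives the same batch; the remainder is the residual.
    batch, residual = divmod(clipped, bins)
    out = [count + batch for count in out]

    # Spread the residual one pixel at a time over evenly spaced bins.
    if residual > 0:
        step = max(bins // residual, 1)
        i = 0
        while i < bins and residual > 0:
            out[i] += 1
            residual -= 1
            i += step
    return out


example = clip_and_redistribute([10, 0, 0, 0], 5)
```

The property worth testing is that the total pixel count is unchanged
after redistribution; losing or double-counting residual pixels breaks
it.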
original commit: fb8e652c3f
* __shfl_up_sync with proper mask value for CUDA >= 9
* BlockScanInclusive for CUDA >= 9
* compatible_shfl_up for use in integral.hpp
* Use CLAHE in cudev
* Add tests for BlockScan
original commit: 970293a229
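
As an illustration of the shuffle-based inclusive scan named in the
bullets above, here is a sketch that simulates one warp in plain Python.
It is not the cudev implementation: the function names are hypothetical,
and the simulation does not model the lane mask that __shfl_up_sync
takes on CUDA >= 9 (0xFFFFFFFF when all 32 lanes participate).

```python
WARP_SIZE = 32  # lanes per warp


def shfl_up(values, delta):
    """Simulate a shuffle-up: lane i reads the value held by lane i - delta.

    Lanes with i < delta have no source lane and keep their own value.
    """
    return [values[i - delta] if i >= delta else values[i]
            for i in range(len(values))]


def warp_scan_inclusive(values):
    """Inclusive prefix sum over one simulated warp using doubling offsets."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    delta = 1
    while delta < WARP_SIZE:
        shifted = shfl_up(vals, delta)
        # Only lanes that actually received a neighbour's value accumulate it.
        vals = [v + s if lane >= delta else v
                for lane, (v, s) in enumerate(zip(vals, shifted))]
        delta *= 2
    return vals


ones_scan = warp_scan_inclusive([1] * WARP_SIZE)
```

After the five doubling steps (offsets 1, 2, 4, 8, 16) each lane holds
the sum of its own value and all lower lanes.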