Due to the size limit of shared memory, the histogram is built in
global memory for the CV_16UC1 case.
The amount of memory needed to build the histogram is:
65536 bins * 4 bytes = 256 KB
whereas the shared memory limit is typically 48 KB.
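
The arithmetic above can be checked at compile time. This is a minimal
host-side sketch, assuming 32-bit bin counters and the 48 KB figure quoted
above; the helper names are hypothetical and not part of OpenCV.

```cpp
#include <cstddef>
#include <cstdint>

// Number of histogram bins for a single-channel image of the given bit depth.
constexpr std::size_t histogramBins(unsigned bitsPerPixel) {
    return std::size_t{1} << bitsPerPixel;
}

// Bytes needed when every bin is a 32-bit counter (assumption).
constexpr std::size_t histogramBytes(unsigned bitsPerPixel) {
    return histogramBins(bitsPerPixel) * sizeof(std::uint32_t);
}

// Assumption: 48 KB of shared memory per block, the typical limit quoted above.
constexpr std::size_t kSharedMemLimitBytes = 48 * 1024;

constexpr bool fitsInSharedMemory(unsigned bitsPerPixel) {
    return histogramBytes(bitsPerPixel) <= kSharedMemLimitBytes;
}

static_assert(histogramBytes(8) == 1024, "8-bit histogram needs 1 KB");
static_assert(histogramBytes(16) == 256 * 1024, "16-bit histogram needs 256 KB");
static_assert(fitsInSharedMemory(8), "8-bit histogram fits in shared memory");
static_assert(!fitsInSharedMemory(16), "16-bit histogram does not fit");
```
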
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in the CV_8UC1 case when redistributing
"residual" clipped pixels. Adding a test case with a clip
limit of 5.0 exposes this bug.
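
The residual redistribution mentioned above can be sketched on the CPU as
follows. This is a minimal illustration of the usual clip-and-redistribute
step in CLAHE, not the CUDA kernel touched by this change; the function name
clipAndRedistribute and the std::vector<int> histogram are assumptions made
for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Clips every bin at clipLimit and hands the excess back to the histogram
// so that the total pixel count is preserved.
void clipAndRedistribute(std::vector<int>& hist, int clipLimit) {
    assert(!hist.empty() && clipLimit > 0);
    const int bins = static_cast<int>(hist.size());

    // Step 1: clip and count how many pixels were cut off.
    int clipped = 0;
    for (int& h : hist) {
        if (h > clipLimit) {
            clipped += h - clipLimit;
            h = clipLimit;
        }
    }

    // Step 2: give every bin an equal share of the clipped pixels.
    const int batch = clipped / bins;
    int residual = clipped - batch * bins;  // always smaller than bins
    for (int& h : hist) h += batch;

    // Step 3: spread the residual pixels one at a time over evenly spaced bins.
    if (residual > 0) {
        const int step = std::max(bins / residual, 1);
        for (int i = 0; i < bins && residual > 0; i += step, --residual) {
            ++hist[i];
        }
    }
}
```

A low clip limit makes the residual path matter: with few pixels clipped per
bin, most of the excess is handled in step 3 instead of step 2.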
* __shfl_up_sync with proper mask value for CUDA >= 9
* BlockScanInclusive for CUDA >= 9
* compatible_shfl_up for use in integral.hpp
* Use CLAHE in cudev
* Add tests for BlockScan
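
The shuffle-based inclusive scan named in the list above follows a well-known
pattern. Below is a host-side C++ emulation of that pattern for one 32-lane
warp, offered as a sketch only: shuffleUp and warpInclusiveScan are
hypothetical names, and real device code would call __shfl_up_sync with an
appropriate lane mask instead of copying arrays.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;

// Emulates one shuffle-up step: lane i reads the value held by lane i - delta.
// Lanes with i < delta keep their own value, as __shfl_up_sync leaves them.
Warp shuffleUp(const Warp& v, std::size_t delta) {
    Warp out = v;
    for (std::size_t lane = delta; lane < kWarpSize; ++lane) {
        out[lane] = v[lane - delta];
    }
    return out;
}

// Inclusive prefix sum over one warp in log2(32) = 5 shuffle steps.
Warp warpInclusiveScan(Warp v) {
    for (std::size_t delta = 1; delta < kWarpSize; delta *= 2) {
        const Warp shifted = shuffleUp(v, delta);
        // Only lanes that received a neighbour's value accumulate it.
        for (std::size_t lane = delta; lane < kWarpSize; ++lane) {
            v[lane] += shifted[lane];
        }
    }
    return v;
}
```
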
cuda_canny : multi stream safety (#11483)
* CUDA_ImgProc/Canny Asynchronous test
* cuda_canny : multi stream safety (1/3)
- Convert global variable canny::counter to class local variable
* cuda_canny : multi stream safety (2/3)
- Use texture objects rather than texture references for cc >= 3.0,
since a texture reference must be declared as a static global variable,
which results in a race condition when run concurrently
* cuda_canny : multi stream safety (3/3)
- Refrain from using global variable in row_filter and column_filter
(converts column_filter::c_kernel and row_filter::c_kernel to local
variables)
* Fixes #11193
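
The idea behind moving state out of global variables can be shown with a
host-side analogy. This is a sketch under stated assumptions, not the actual
Canny code: EdgeCounter and the shared_state namespace are hypothetical
stand-ins for per-instance and global counters, and the interleaving is done
sequentially to keep the example deterministic.

```cpp
// Global state: every caller shares one counter, so interleaved use
// by two detectors (or two streams) corrupts the result.
namespace shared_state {
int g_counter = 0;
void begin() { g_counter = 0; }
void addEdge() { ++g_counter; }
int count() { return g_counter; }
}  // namespace shared_state

// Per-instance state: each detector owns its counter, so interleaved
// use by several instances cannot interfere.
class EdgeCounter {
public:
    void begin() { counter_ = 0; }
    void addEdge() { ++counter_; }
    int count() const { return counter_; }

private:
    int counter_ = 0;
};
```

With two EdgeCounter instances, calling begin() on the second one does not
reset the first, whereas the shared counter is reset for both users.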
Implement cv::cuda::calcHist with mask support (#8367)
* Implement cuda::calcHist with mask
* Fix documentation build warning
* Have their own step sizes for src and mask. Fix review comment.
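
A CPU reference for a masked histogram with independent row strides might
look like the sketch below. It assumes an 8-bit single-channel image and an
8-bit mask in which nonzero means "count this pixel"; calcHistMasked is a
hypothetical name and is not the cv::cuda::calcHist implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Counts only pixels whose mask value is nonzero. srcStep and maskStep are
// row strides in bytes; they may differ because the two buffers can be
// padded independently.
std::array<int, 256> calcHistMasked(const std::uint8_t* src, std::size_t srcStep,
                                    const std::uint8_t* mask, std::size_t maskStep,
                                    int rows, int cols) {
    std::array<int, 256> hist{};  // zero-initialised bins
    for (int y = 0; y < rows; ++y) {
        const std::uint8_t* srcRow = src + static_cast<std::size_t>(y) * srcStep;
        const std::uint8_t* maskRow = mask + static_cast<std::size_t>(y) * maskStep;
        for (int x = 0; x < cols; ++x) {
            if (maskRow[x] != 0) {
                ++hist[srcRow[x]];
            }
        }
    }
    return hist;
}
```

Indexing the mask with the source stride would read the wrong rows whenever
the two buffers have different padding, which is why each gets its own step.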
Without <limits> included, several CUDA related files fail to compile with
GCC on Ubuntu:
modules/cudaimgproc/src/hough_lines.cpp:136:9: error: ‘numeric_limits’ is not a member of ‘std’
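
For illustration, std::numeric_limits is declared in the standard <limits>
header, so any translation unit that uses it should include that header
directly instead of relying on another header to pull it in. The function
below is a hypothetical minimal example, not code from hough_lines.cpp.

```cpp
#include <limits>  // declares std::numeric_limits

// Returns the largest value representable by int.
int largestInt() {
    return std::numeric_limits<int>::max();
}
```
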