Add ocl version BackgroundSubtractorKNN (#10553)
* Add ocl version bgfg_knn
* Add ocl KNN perf test
* ocl KNN: Avoid unnecessary initializing when non-UMat parameters are used
* video: turn off OpenCL for color KNN on Intel devices
due performance degradation
* video: turn off KNN OpenCL on Apple devices with Intel iGPU
due process freeze during clBuildProgram() call
* Do not build protobuf if dnn is disabled
* Added BUILD_LIST cmake option to the cache
* Moved protobuf to the top level
* Fixed static build
* Fixed world build
* fixup! Fixed world build
* Add a 512 bit codepath to the AVX512 fastConv function
this patch adds a 512 wide codepath to the fastConv() function for
AVX512 use.
The basic idea is to process the first N * 16 elements of the vector
with avx512, and then run the rest of the vector using the traditional
AVX2 codepath.
* dnn: use unaligned AVX512 load (OpenCV aligns data on 32-byte boundary)
* dnn: change "vecsize" condition for AVX512
* dnn: fix indentation
OpenCV pthreads-based implementation changes:
- rework worker threads pool, allow to execute job by the main thread too
- rework synchronization scheme (wait for job completion, threads 'pong' answer is not required)
- allow "active wait" (spin) by worker threads and by the main thread
- use _mm_pause() during active wait (support for Hyper-Threading technology)
- use sched_yield() to avoid preemption of still working other workers
- don't use getTickCount()
- optional builtin thread pool profiler (disabled by compilation flag)
allows Stitcher to be used for scans from within python.
I had to use very strange notation because I couldn't export the `enum`
`Mode` making the Cpython generated code unable to compile.
```c++
class Stitcher {
public:
enum Mode
{
PANORAMA = 0,
SCANS = 1,
};
...
```
Also removed duplicate code from the `createStitcher` function making
use of the `Stitcher::create` function
* Bit-exact implementation of GaussianBlur smoothing
* Added universal intrinsics based implementation for bit-exact CV_8U GaussianBlur smoothing.
* Added parallel_for to evaluation of bit-exact GaussianBlur
* Added custom implementations for 3x3 and 5x5 bit-exact GaussianBlur