* Add a 512 bit codepath to the AVX512 fastConv function
this patch adds a 512 wide codepath to the fastConv() function for
AVX512 use.
The basic idea is to process the first N * 16 elements of the vector
with avx512, and then run the rest of the vector using the traditional
AVX2 codepath.
* dnn: use unaligned AVX512 load (OpenCV aligns data on 32-byte boundary)
* dnn: change "vecsize" condition for AVX512
* dnn: fix indentation
OpenCV pthreads-based implementation changes:
- rework worker threads pool, allow to execute job by the main thread too
- rework synchronization scheme (wait for job completion, threads 'pong' answer is not required)
- allow "active wait" (spin) by worker threads and by the main thread
- use _mm_pause() during active wait (support for Hyper-Threading technology)
- use sched_yield() to avoid preemption of still working other workers
- don't use getTickCount()
- optional builtin thread pool profiler (disabled by compilation flag)
allows Stitcher to be used for scans from within python.
I had to use very strange notation because I couldn't export the `enum`
`Mode` making the Cpython generated code unable to compile.
```c++
class Stitcher {
public:
enum Mode
{
PANORAMA = 0,
SCANS = 1,
};
...
```
Also removed duplicate code from the `createStitcher` function making
use of the `Stitcher::create` function
* Bit-exact implementation of GaussianBlur smoothing
* Added universal intrinsics based implementation for bit-exact CV_8U GaussianBlur smoothing.
* Added parallel_for to evaluation of bit-exact GaussianBlur
* Added custom implementations for 3x3 and 5x5 bit-exact GaussianBlur
* cuda::cvtColor bug fix
Fixed bug in conversion formula between RGB space and LUV space.
Testing with opencv_test_cudaimgproc.exe, this commit reduces the number
of failed tests from 191 to 95. (96 more tests pass)
* Rename variables
Documentation generation refactoring (#10621)
* Documentation build updates:
- disable documentation by default, do not add to ALL target
- combine Doxygen and Javadoc
- optimize Doxygen html
* javadoc: fix path in build directory
* cmake: fix "Documentation" status line
* Newton's method can be more efficient
when we get the result of function distortPoint with a point (0, 0) and then undistortPoint with the result, we get the point not (0, 0). and then we discovered that the old method is not convergence sometimes. finally we have gotten the right values by Newton's method.
* modify by advice Newton's method...#10574
* calib3d(fisheye): fix codestyle, update theta before exit EPS check
UMatData locks are not mapped on real locks (they are mapped to some "pre-initialized" pool).
Concurrent execution of these statements may lead to deadlock:
- a.copyTo(b) from thread 1
- c.copyTo(d) from thread 2
where:
- 'a' and 'd' are mapped to single lock "A".
- 'b' and 'c' are mapped to single lock "B".
Workaround is to process locks with strict order.