OpenCV pthreads-based implementation changes:
- rework worker threads pool, allow to execute job by the main thread too
- rework synchronization scheme (wait for job completion, threads 'pong' answer is not required)
- allow "active wait" (spin) by worker threads and by the main thread
- use _mm_pause() during active wait (support for Hyper-Threading technology)
- use sched_yield() to avoid preemption of still working other workers
- don't use getTickCount()
- optional builtin thread pool profiler (disabled by compilation flag)
UMatData locks are not mapped on real locks (they are mapped to some "pre-initialized" pool).
Concurrent execution of these statements may lead to deadlock:
- a.copyTo(b) from thread 1
- c.copyTo(d) from thread 2
where:
- 'a' and 'd' are mapped to single lock "A".
- 'b' and 'c' are mapped to single lock "B".
Workaround is to process locks with strict order.
The opencv infrastructure mostly has the basics for supporting avx512 math functions,
but it wasn't hooked up (likely due to lack of users)
In order to compile the DNN functions for AVX512, a few things need to be hooked up
and this patch does that
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
- don't store ProgramSource in compiled Programs (resolved problem with "source" buffers lifetime)
- completelly remove Program::read/write methods implementation:
- replaced with method to query RAW OpenCL binary without any "custom" data
- deprecate Program::getPrefix() methods
If there are no OpenCL/UMat methods calls from application.
OpenCL subsystem is initialized:
- haveOpenCL() is called from application
- useOpenCL() is called from application
- access to OpenCL allocator: UMat is created (empty UMat is ignored) or UMat <-> Mat conversions are called
Don't call OpenCL functions if OPENCV_OPENCL_RUNTIME=disabled
(independent from OpenCL linkage type)
* add accuracy test and performance check for matmul
* add performance tests for transform and dotProduct
* add test Core_TransformLargeTest for 8u version of transform
* remove raw SSE2/NEON implementation from matmul.cpp
* use universal intrinsic instead of raw intrinsic
* remove unused templated function
* add v_matmuladd which multiply 3x3 matrix and add 3x1 vector
* add v_rotate_left/right in universal intrinsic
* suppress intrinsic on some function and platform
* add pure SW implementation of new universal intrinsics
* add test for new universal intrinsics
* core: prevent memory access after the end of buffer
* fix perf tests
When elements are 64 bits, the vec_st_interleave()/vec_ld_deinterleave()
doesn't interleave 4 elements correctly.
For vec_st_interleave(), following is saved into mem:
a0 b0 a1 b1 c0 d0 c1 d1
-> we expected:
a0 b0 c0 d0 a1 b1 c1 d1
for vec_ld_deinterleave(), following is loaded into a b c d for memory
string { 1 2 3 4 5 6 7 8 }:
a: 1 3
b: 2 4
c: 5 7
d: 6 8
-> we expected:
a: 1 5
b: 2 6
c: 3 7
d: 4 8
This patch corrects this behavior.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
- changed behavior of vec_ctf, vec_ctu, vec_cts
in gcc and clang to make them compatible with XLC
- implemented most of missing conversion intrinsics in gcc and clang
- implemented conversions intrinsics of odd-numbered elements
- ignored gcc bug warning that caused by -Wunused-but-set-variable in rare cases
- replaced right shift with algebraic right shift for signed vectors
to shift in the sign bit.
- added new universal intrinsics v_matmuladd, v_rotate_left/right
- avoid using floating multiply-add in RNG
* Update OpenCVCompilerOptimizations.cmake
Neon not supported on MSVC ARM breaking build fix
* Update OpenCVCompilerOptimizations.cmake
Whitespace
* Update intrin.hpp
Many problems in MSVC ARM builds (at least on VS2017) being fixed in this PR now.
C:\Users\Gregory\DOCUME~1\MYLIBR~1\OPENCV~3\opencv\sources\modules\core\include\opencv2/core/hal/intrin.hpp(444): error C3861: '_tzcnt_u32': identifier not found
* Update hal_replacement.hpp
Passing variadic expansion in a macro to another macro does not work properly in MSVC and a famous known workaround is hereby applied. Discussion of it: https://stackoverflow.com/questions/5134523/msvc-doesnt-expand-va-args-correctly
Only needed the fix for ARM builds: TEGRA_ macros are used for cv_hal_ functions in the carotene library.
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): warning C4003: not enough actual parameters for macro 'TEGRA_ADD'
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): error C2143: syntax error: missing ')' before ','
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): error C2059: syntax error: ')'
* Update hal_replacement.hpp
All hal_replacement's using carotene\hal\tegra_hal.hpp TEGRA_ functions as macros preprocessed by variadic macros should be changed, identical as was done in core.
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\imgproc\src\color.cpp(9604): warning C4003: not enough actual parameters for macro 'TEGRA_CVTBGRTOBGR'
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\imgproc\src\color.cpp(9604): error C2059: syntax error: '=='
* Update OpenCVCompilerOptimizations.cmake
* Update hal_replacement.hpp
* Update hal_replacement.hpp