* remove raw SSE2/NEON implementation from convert.cpp
* remove raw implementation from Cvt_SIMD
* remove raw implementation from cvtScale_SIMD
* remove raw implementation from cvtScaleAbs_SIMD
* remove duplicated implementation cvt_<float, short>
* remove duplicated implementation cvtScale_<short, short, float>
* add "from double" version of Cvt_SIMD
* modify the condition of test ConvertScaleAbs
* Update convert.cpp
fixed crash in cvtScaleAbs(8s=>8u)
* fixed compile error on Win32
* fixed several test failures because of accuracy loss in cvtScale(int=>int)
* fixed NEON implementation of v_cvt_f64(int=>double) intrinsic
* another attempt to fix test failures
* keep trying to fix the test failures and just introduced compile warnings
* fixed one remaining test (subtractScalar)
- don't store ProgramSource in compiled Programs (resolved problem with "source" buffers lifetime)
- completelly remove Program::read/write methods implementation:
- replaced with method to query RAW OpenCL binary without any "custom" data
- deprecate Program::getPrefix() methods
* fixed OpenCL functions on Mac, so that the tests pass
* fixed compile warnings; temporarily disabled OCL branch of TV L1 optical flow on mac
* fixed other few warnings on macos
If there are no OpenCL/UMat methods calls from application.
OpenCL subsystem is initialized:
- haveOpenCL() is called from application
- useOpenCL() is called from application
- access to OpenCL allocator: UMat is created (empty UMat is ignored) or UMat <-> Mat conversions are called
Don't call OpenCL functions if OPENCV_OPENCL_RUNTIME=disabled
(independent from OpenCL linkage type)
* add accuracy test and performance check for matmul
* add performance tests for transform and dotProduct
* add test Core_TransformLargeTest for 8u version of transform
* remove raw SSE2/NEON implementation from matmul.cpp
* use universal intrinsic instead of raw intrinsic
* remove unused templated function
* add v_matmuladd which multiply 3x3 matrix and add 3x1 vector
* add v_rotate_left/right in universal intrinsic
* suppress intrinsic on some function and platform
* add pure SW implementation of new universal intrinsics
* add test for new universal intrinsics
* core: prevent memory access after the end of buffer
* fix perf tests
When elements are 64 bits, the vec_st_interleave()/vec_ld_deinterleave()
doesn't interleave 4 elements correctly.
For vec_st_interleave(), following is saved into mem:
a0 b0 a1 b1 c0 d0 c1 d1
-> we expected:
a0 b0 c0 d0 a1 b1 c1 d1
for vec_ld_deinterleave(), following is loaded into a b c d for memory
string { 1 2 3 4 5 6 7 8 }:
a: 1 3
b: 2 4
c: 5 7
d: 6 8
-> we expected:
a: 1 5
b: 2 6
c: 3 7
d: 4 8
This patch corrects this behavior.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>