When global memory is accessed in DWORD4 units, memory bandwidth
can be fully utilized on Intel platforms. This patch makes
more image formats (e.g. 8UC4) be processed in DWORD4 units
per work-item. After applying this patch, 3 subcases of
./opencv_perf_core --gtest_filter=OCL_RepeatFixture_Repeat.Repeat/*
are sped up on HD4000 graphics with Beignet:
OCL_RepeatFixture_Repeat.Repeat/2, 64% improvement.
OCL_RepeatFixture_Repeat.Repeat/6, 50% improvement.
OCL_RepeatFixture_Repeat.Repeat/8, 56% improvement.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
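To illustrate the DWORD4 idea in the message above, here is a minimal host-side sketch of how many pixels fit in one 16-byte access. The helper name pixelsPerWorkItem and the fallback rule are assumptions made for illustration only; they are not taken from the patch itself.

```cpp
#include <cassert>
#include <cstddef>

// One DWORD4 is four 32-bit words, i.e. 16 bytes.
constexpr std::size_t kDword4Bytes = 16;

// Hypothetical helper (not from the patch): number of pixels a single
// work-item would cover if it moved exactly one DWORD4 per access.
std::size_t pixelsPerWorkItem(std::size_t bytesPerChannel, std::size_t channels) {
    const std::size_t pixelBytes = bytesPerChannel * channels;
    assert(pixelBytes > 0);
    // Assumed fallback: if the pixel size does not divide 16 bytes evenly,
    // process one pixel per work-item instead.
    return (kDword4Bytes % pixelBytes == 0) ? kDword4Bytes / pixelBytes : 1;
}
```

Under these assumptions an 8UC4 pixel (4 bytes) gives 4 pixels per work-item, while an 8UC3 pixel (3 bytes) falls back to 1.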
Added a "loadFromString" method, based on the existing "load" method. It
allows the XML string to be passed directly, which can be useful and faster
when a huge file is already held in a variable.
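The relationship between the two methods can be sketched with the standard library alone. Everything here is hypothetical: Model, readModel, and the free functions load and loadFromString stand in for the real class and its parser, which this message does not show.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the object being deserialized.
struct Model {
    std::string xml;  // raw XML text; a real loader would parse it
};

// Shared routine: consumes everything available on any input stream.
static Model readModel(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return Model{buffer.str()};
}

// File-based entry point: opens the file, then delegates.
Model load(const std::string& filename) {
    std::ifstream file(filename);
    return readModel(file);
}

// String-based entry point: wraps the in-memory XML in a stream and
// delegates to the same routine, so no temporary file is needed.
Model loadFromString(const std::string& xml) {
    std::istringstream in(xml);
    return readModel(in);
}
```

Keeping both entry points as thin wrappers over one stream-based routine is a design choice of this sketch, not necessarily of the patch.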
Rewrite the note in the HoughCircles documentation to make it clearer.
Add a note to clarify that the output vector of found circles is sorted in
descending order of the centres' accumulator values.
Also delete redundant lines in the HoughCircles documentation.
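The ordering described in the note can be pictured with a small sketch. CircleCandidate and sortByVotesDescending are hypothetical names used only for illustration; the actual HoughCircles output stores just the circle parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical candidate: circle parameters plus the accumulator value
// (number of votes) collected for its centre.
struct CircleCandidate {
    float x;
    float y;
    float radius;
    int votes;
};

// Orders candidates so that the centre with the most votes comes first.
// stable_sort keeps the original order among candidates with equal votes.
void sortByVotesDescending(std::vector<CircleCandidate>& candidates) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const CircleCandidate& a, const CircleCandidate& b) {
                         return a.votes > b.votes;
                     });
}
```

With this ordering, the first element of the vector is the circle whose centre received the strongest support.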
Added comments to hough circles function.
Added comments to icvhoughgradient
Misalignment in line 1183 corrected
In the unoptimized version of the small symmetrical column filters, when we
try to detect whether the kernel, ky, is equal to [1;2;1] or [1;-2;1], we
should take into consideration that the anchor points to the middle
element.
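A minimal sketch of the check described above, assuming ky is a pointer to the middle (anchor) element so that ky[-1] and ky[1] are its neighbours. The enum and function names are hypothetical and the code is not copied from the filter implementation.

```cpp
#include <cassert>
#include <vector>

enum class SmallKernel { Other, Smooth121, Deriv1m21 };

// Classifies a 3-tap column kernel, indexing relative to the anchor.
SmallKernel classifyKernel(const std::vector<int>& kernel) {
    assert(kernel.size() % 2 == 1);  // a centred anchor needs an odd length
    if (kernel.size() != 3) return SmallKernel::Other;

    // ky[0] is the middle element, not the first one.
    const int* ky = kernel.data() + kernel.size() / 2;

    if (ky[-1] == 1 && ky[0] == 2 && ky[1] == 1) return SmallKernel::Smooth121;
    if (ky[-1] == 1 && ky[0] == -2 && ky[1] == 1) return SmallKernel::Deriv1m21;
    return SmallKernel::Other;
}
```

Comparing ky[0] against 1 and ky[1] against 2 would treat the anchor as the first element, which appears to be the kind of mistake the message refers to.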
The function parameters were different from the ones described below.
P.S. Why is ``flow`` an InputOutputArray; shouldn't it be just an OutputArray? If there is a reason, shouldn't it be documented so that others can benefit as well (e.g. to avoid allocating memory on every frame)?
1. Remove unnecessary barriers.
2. Adjust CTA_SIZE based on the following cases for Intel platform:
a) OCL_Photo_DenoisingGrayscale.DenoisingGrayscale
b) OCL_Photo_DenoisingColored.DenoisingColored
Not all parameters are specified for openni::VideoMode, so the
"selected" mode may be unsupported by the device.
Replace the default VideoMode constructor with the result of a getVideoMode() call.