opencv

Author	SHA1	Message	Date
Zihao Mu	7b582b71ba	Merge pull request #21036 An error occurred from fengyuentau:timvx_backend_support dnn: TIM-VX NPU backend support * Add TimVX NPU backend for DNN module. * use official branch from tim-vx repo; fix detecting viv sdk Co-authored-by: fytao <yuantao.feng@outlook.com>	3 years ago
Alexander Alekhin	effce0573b	dnn: drop legacy Inference Engine NN builder API	3 years ago
Hanxi Guo	1fcf7ba5bc	Merge pull request #20406 An error occurred from MarkGHX:gsoc_2021_webnn [GSoC] OpenCV.js: Accelerate OpenCV.js DNN via WebNN * Add WebNN backend for OpenCV DNN Module Update dnn.cpp Update dnn.cpp Update dnn.cpp Update dnn.cpp Add WebNN head files into OpenCV 3rd partiy files Create webnn.hpp update cmake Complete README and add OpenCVDetectWebNN.cmake file add webnn.cpp Modify webnn.cpp Can successfully compile the codes for creating a MLContext Update webnn.cpp Update README.md Update README.md Update README.md Update README.md Update cmake files and update README.md Update OpenCVDetectWebNN.cmake and README.md Update OpenCVDetectWebNN.cmake Fix OpenCVDetectWebNN.cmake and update README.md Add source webnn_cpp.cpp and libary libwebnn_proc.so Update dnn.cpp Update dnn.cpp Update dnn.cpp Update dnn.cpp update dnn.cpp update op_webnn update op_webnn Update op_webnn.hpp update op_webnn.cpp & hpp Update op_webnn.hpp Update op_webnn update the skeleton Update op_webnn.cpp Update op_webnn Update op_webnn.cpp Update op_webnn.cpp Update op_webnn.hpp update op_webnn update op_webnn Solved the problems of released variables. Fixed the bugs in op_webnn.cpp Implement op_webnn Implement Relu by WebNN API Update dnn.cpp for better test Update elementwise_layers.cpp Implement ReLU6 Update elementwise_layers.cpp Implement SoftMax using WebNN API Implement Reshape by WebNN API Implement PermuteLayer by WebNN API Implement PoolingLayer using WebNN API Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Implement poolingLayer by WebNN API and add more detailed logs Update dnn.cpp Update dnn.cpp Remove redundant codes and add more logs for poolingLayer Add more logs in the pooling layer implementation Fix the indent issue and resolve the compiling issue Fix the build problems Fix the build issue FIx the build issue Update dnn.cpp Update dnn.cpp * Fix the build issue * Implement BatchNorm Layer by WebNN API * Update convolution_layer.cpp This is a temporary file for Conv2d layer implementation * Integrate some general functions into op_webnn.cpp&hpp * Update const_layer.cpp * Update convolution_layer.cpp Still have some bugs that should be fixed. * Update conv2d layer and fc layer still have some problems to be fixed. * update constLayer, conv layer, fc layer There are still some bugs to be fixed. * Fix the build issue * Update concat_layer.cpp Still have some bugs to be fixed. * Update conv2d layer, fully connected layer and const layer * Update convolution_layer.cpp * Add OpenCV.js DNN module WebNN Backend (both using webnn-polyfill and electron) * Delete bib19450.aux * Add WebNN backend for OpenCV DNN Module Update dnn.cpp Update dnn.cpp Update dnn.cpp Update dnn.cpp Add WebNN head files into OpenCV 3rd partiy files Create webnn.hpp update cmake Complete README and add OpenCVDetectWebNN.cmake file add webnn.cpp Modify webnn.cpp Can successfully compile the codes for creating a MLContext Update webnn.cpp Update README.md Update README.md Update README.md Update README.md Update cmake files and update README.md Update OpenCVDetectWebNN.cmake and README.md Update OpenCVDetectWebNN.cmake Fix OpenCVDetectWebNN.cmake and update README.md Add source webnn_cpp.cpp and libary libwebnn_proc.so Update dnn.cpp Update dnn.cpp Update dnn.cpp Update dnn.cpp update dnn.cpp update op_webnn update op_webnn Update op_webnn.hpp update op_webnn.cpp & hpp Update op_webnn.hpp Update op_webnn update the skeleton Update op_webnn.cpp Update op_webnn Update op_webnn.cpp Update op_webnn.cpp Update op_webnn.hpp update op_webnn update op_webnn Solved the problems of released variables. Fixed the bugs in op_webnn.cpp Implement op_webnn Implement Relu by WebNN API Update dnn.cpp for better test Update elementwise_layers.cpp Implement ReLU6 Update elementwise_layers.cpp Implement SoftMax using WebNN API Implement Reshape by WebNN API Implement PermuteLayer by WebNN API Implement PoolingLayer using WebNN API Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Update pooling_layer.cpp Implement poolingLayer by WebNN API and add more detailed logs Update dnn.cpp Update dnn.cpp Remove redundant codes and add more logs for poolingLayer Add more logs in the pooling layer implementation Fix the indent issue and resolve the compiling issue Fix the build problems Fix the build issue FIx the build issue Update dnn.cpp Update dnn.cpp * Fix the build issue * Implement BatchNorm Layer by WebNN API * Update convolution_layer.cpp This is a temporary file for Conv2d layer implementation * Integrate some general functions into op_webnn.cpp&hpp * Update const_layer.cpp * Update convolution_layer.cpp Still have some bugs that should be fixed. * Update conv2d layer and fc layer still have some problems to be fixed. * update constLayer, conv layer, fc layer There are still some bugs to be fixed. * Update conv2d layer, fully connected layer and const layer * Update convolution_layer.cpp * Add OpenCV.js DNN module WebNN Backend (both using webnn-polyfill and electron) * Update dnn.cpp * Fix Error in dnn.cpp * Resolve duplication in conditions in convolution_layer.cpp * Fixed the issues in the comments * Fix building issue * Update tutorial * Fixed comments * Address the comments * Update CMakeLists.txt * Offer more accurate perf test on native * Add better perf tests for both native and web * Modify per tests for better results * Use more latest version of Electron * Support latest WebNN Clamp op * Add definition of HAVE_WEBNN macro * Support group convolution * Implement Scale_layer using WebNN * Add Softmax option for native classification example * Fix comments * Fix comments	3 years ago
SamFC10	fa90e14b06	int8 layers and 8-bit quantization support	4 years ago
Alexander Alekhin	83aa711346	dnn: rename clamp() => normalize_axis()	4 years ago
Dmitry Kurtaev	5e5385a814	Remove NodeVector for nGraph	5 years ago
Alexander Alekhin	124bf8339f	dnn(IE): use HAVE_DNN_IE_NN_BUILDER_2019 for NN Builder API code - CMake option: OPENCV_DNN_IE_NN_BUILDER_2019	5 years ago
Alexander Alekhin	29d214474f	dnn(IE): use HAVE_DNN_IE_NN_BUILDER_2019 for NN Builder API code - CMake option: OPENCV_DNN_IE_NN_BUILDER_2019	5 years ago
Dmitry Kurtaev	8f1e36f7c1	Disable some tests for Myriad target of nGraph Add lightweight IE hardware targets checks nGraph: Concat with paddings Enable more nGraph tests Restore FP32->FP16 for GPU plugin of IE try to fix buildbot Use lightweight IE targets check only starts from R4	5 years ago
Lubov Batanina	7523c777c5	Merge pull request #15537 An error occurred from l-bat:ngraph * Support nGraph * Fix resize	5 years ago
Yashas Samaga B L	613c12e590	Merge pull request #14827 An error occurred from YashasSamaga:cuda4dnn-csl-low CUDA backend for the DNN module * stub cuda4dnn design * minor fixes for tests and doxygen * add csl public api directory to module headers * add low-level CSL components * add high-level CSL components * integrate csl::Tensor into backbone code * switch to CPU iff unsupported; otherwise, fail on error * add fully connected layer * add softmax layer * add activation layers * support arbitary rank TensorDescriptor * pass input wrappers to `initCUDA()` * add 1d/2d/3d-convolution * add pooling layer * reorganize and refactor code * fixes for gcc, clang and doxygen; remove cxx14/17 code * add blank_layer * add LRN layer * add rounding modes for pooling layer * split tensor.hpp into tensor.hpp and tensor_ops.hpp * add concat layer * add scale layer * add batch normalization layer * split math.cu into activations.cu and math.hpp * add eltwise layer * add flatten layer * add tensor transform api * add asymmetric padding support for convolution layer * add reshape layer * fix rebase issues * add permute layer * add padding support for concat layer * refactor and reorganize code * add normalize layer * optimize bias addition in scale layer * add prior box layer * fix and optimize normalize layer * add asymmetric padding support for pooling layer * add event API * improve pooling performance for some padding scenarios * avoid over-allocation of compute resources to kernels * improve prior box performance * enable layer fusion * add const layer * add resize layer * add slice layer * add padding layer * add deconvolution layer * fix channelwise ReLU initialization * add vector traits * add vectorized versions of relu, clipped_relu, power * add vectorized concat kernels * improve concat_with_offsets performance * vectorize scale and bias kernels * add support for multi-billion element tensors * vectorize prior box kernels * fix address alignment check * improve bias addition performance of conv/deconv/fc layers * restructure code for supporting multiple targets * add DNN_TARGET_CUDA_FP64 * add DNN_TARGET_FP16 * improve vectorization * add region layer * improve tensor API, add dynamic ranks 1. use ManagedPtr instead of a Tensor in backend wrapper 2. add new methods to tensor classes - size_range: computes the combined size of for a given axis range - tensor span/view can be constructed from a raw pointer and shape 3. the tensor classes can change their rank at runtime (previously rank was fixed at compile-time) 4. remove device code from tensor classes (as they are unused) 5. enforce strict conditions on tensor class APIs to improve debugging ability * fix parametric relu activation * add squeeze/unsqueeze tensor API * add reorg layer * optimize permute and enable 2d permute * enable 1d and 2d slice * add split layer * add shuffle channel layer * allow tensors of different ranks in reshape primitive * patch SliceOp to allow Crop Layer * allow extra shape inputs in reshape layer * use `std::move_backward` instead of `std::move` for insert in resizable_static_array * improve workspace management * add spatial LRN * add nms (cpu) to region layer * add max pooling with argmax ( and a fix to limits.hpp) * add max unpooling layer * rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA * update supportBackend to be more rigorous * remove stray include from preventing non-cuda build * include op_cuda.hpp outside condition #if * refactoring, fixes and many optimizations * drop DNN_TARGET_CUDA_FP64 * fix gcc errors * increase max. tensor rank limit to six * add Interp layer * drop custom layers; use BackendNode * vectorize activation kernels * fixes for gcc * remove wrong assertion * fix broken assertion in unpooling primitive * fix build errors in non-CUDA build * completely remove workspace from public API * fix permute layer * enable accuracy and perf. tests for DNN_TARGET_CUDA * add asynchronous forward * vectorize eltwise ops * vectorize fill kernel * fixes for gcc * remove CSL headers from public API * remove csl header source group from cmake * update min. cudnn version in cmake * add numerically stable FP32 log1pexp * refactor code * add FP16 specialization to cudnn based tensor addition * vectorize scale1 and bias1 + minor refactoring * fix doxygen build * fix invalid alignment assertion * clear backend wrappers before allocateLayers * ignore memory lock failures * do not allocate internal blobs * integrate NVTX * add numerically stable half precision log1pexp * fix indentation, following coding style, improve docs * remove accidental modification of IE code * Revert "add asynchronous forward" This reverts commit 1154b9da9da07e9b52f8a81bdcea48cf31c56f70. * [cmake] throw error for unsupported CC versions * fix rebase issues * add more docs, refactor code, fix bugs * minor refactoring and fixes * resolve warnings/errors from clang * remove haveCUDA() checks from supportBackend() * remove NVTX integration * changes based on review comments * avoid exception when no CUDA device is present * add color code for CUDA in Net::dump	6 years ago
Lubov Batanina	0e1ef8f8e1	Merge pull request #15184 An error occurred from l-bat:IE_R2 Support new IE API (#15184 An error occurred ) * Add support OpenVINO R2 for layers * Add Core API * Fix tests * Fix expectNoFallbacksFromIE for ONNX nets * Remove deprecated API * Remove td * Remove TargetDevice * Fix Async * Add test * Fix detectMyriadX * Fix test * Fix warning	6 years ago
Dmitry Kurtaev	eba696a41e	Merge pull request #14792 An error occurred from dkurt:dnn_ie_min_version_r5 * Remove Inference Engine 2018R3 and 2018R4 * Fix 2018R5	6 years ago
Dmitry Kurtaev	f0ddf302b2	Move Inference Engine to new API	6 years ago
Alexander Alekhin	96c71dd3d2	dnn: reduce set of ignored warnings	6 years ago
WuZhiwen	6e3ea8b49d	Merge pull request #12703 An error occurred from wzw-intel:vkcom * dnn: Add a Vulkan based backend This commit adds a new backend "DNN_BACKEND_VKCOM" and a new target "DNN_TARGET_VULKAN". VKCOM means vulkan based computation library. This backend uses Vulkan API and SPIR-V shaders to do the inference computation for layers. The layer types that implemented in DNN_BACKEND_VKCOM include: Conv, Concat, ReLU, LRN, PriorBox, Softmax, MaxPooling, AvePooling, Permute This is just a beginning work for Vulkan in OpenCV DNN, more layer types will be supported and performance tuning is on the way. Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> * dnn/vulkan: Add FindVulkan.cmake to detect Vulkan SDK In order to build dnn with Vulkan support, need installing Vulkan SDK and setting environment variable "VULKAN_SDK" and add "-DWITH_VULKAN=ON" to cmake command. You can download Vulkan SDK from: https://vulkan.lunarg.com/sdk/home#linux For how to install, see https://vulkan.lunarg.com/doc/sdk/latest/linux/getting_started.html https://vulkan.lunarg.com/doc/sdk/latest/windows/getting_started.html https://vulkan.lunarg.com/doc/sdk/latest/mac/getting_started.html respectively for linux, windows and mac. To run the vulkan backend, also need installing mesa driver. On Ubuntu, use this command 'sudo apt-get install mesa-vulkan-drivers' To test, use command '$BUILD_DIR/bin/opencv_test_dnn --gtest_filter=VkCom' Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> * dnn/Vulkan: dynamically load Vulkan runtime No compile-time dependency on Vulkan library. If Vulkan runtime is unavailable, fallback to CPU path. Use environment "OPENCL_VULKAN_RUNTIME" to specify path to your own vulkan runtime library. Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> * dnn/Vulkan: Add a python script to compile GLSL shaders to SPIR-V shaders The SPIR-V shaders are in format of text-based 32-bit hexadecimal numbers, and inserted into .cpp files as unsigned int32 array. * dnn/Vulkan: Put Vulkan headers into 3rdparty directory and some other fixes Vulkan header files are copied from https://github.com/KhronosGroup/Vulkan-Docs/tree/master/include/vulkan to 3rdparty/include Fix the Copyright declaration issue. Refine OpenCVDetectVulkan.cmake * dnn/Vulkan: Add vulkan backend tests into existing ones. Also fixed some test failures. - Don't use bool variable as uniform for shader - Fix dispathed group number beyond max issue - Bypass "group > 1" convolution. This should be support in future. * dnn/Vulkan: Fix multiple initialization in one thread.	7 years ago
Dmitry Kurtaev	24ab751547	Merge pull request #12565 An error occurred from dkurt:dnn_non_intel_gpu * Remove isIntel check from deep learning layers * Remove fp16->fp32 fallbacks where it's not necessary * Fix Kernel::run to prevent localsize > globalsize	7 years ago
Dmitry Kurtaev	d486204a0d	Merge pull request #12264 An error occurred from dkurt:dnn_remove_forward_method * Remove a forward method in dnn::Layer * Add a test * Fix tests * Mark multiple dnn::Layer::finalize methods as deprecated * Replace back dnn's inputBlobs to vector of pointers * Remove Layer::forward_fallback from CV_OCL_RUN scopes	7 years ago
Dmitry Kurtaev	b781ac7346	Make Intel's Inference Engine backend is default if no preferable backend is specified.	7 years ago
Li Peng	3dd916882a	fp16 ocl support for googlenet Signed-off-by: Li Peng <peng.li@intel.com>	7 years ago
Alexander Alekhin	1060c0f439	dnn: apply CV_OVERRIDE/CV_FINAL	7 years ago
Alexander Alekhin	6c051a55e5	cmake: don't add include <module>/src directory to avoid conflicts during opencv_world builds	7 years ago
Alexander Alekhin	4a6d582f2e	dnn: make OpenCL DNN code optional	7 years ago
Alexander Alekhin	1b83bc48a1	dnn: make OpenCL DNN code optional	7 years ago
luz.paz	5718d09e39	Misc. modules/ typos Found via `codespell`	7 years ago
Dmitry Kurtaev	7fe97376c2	MobileNet-SSD from TensorFlow 1.3 and Inception-V2-SSD using Inference Engine backend	7 years ago
Dmitry Kurtaev	10e1de74d2	Intel Inference Engine deep learning backend (#10608 An error occurred ) * Intel Inference Engine deep learning backend. * OpenFace network using Inference Engine backend	7 years ago
Li Peng	85b1c4060c	support axis in concat layer ocl path Signed-off-by: Li Peng <peng.li@intel.com>	7 years ago
Li Peng	8f99083726	Add new layer forward interface Add layer forward interface with InputArrayOfArrays and OutputArrayOfArrays parameters, it allows UMat buffer to be processed and transferred in the layers. Signed-off-by: Li Peng <peng.li@intel.com>	7 years ago
pengli	e340ff9c3a	Merge pull request #9114 An error occurred from pengli:dnn_rebase add libdnn acceleration to dnn module (#9114 An error occurred ) * import libdnn code Signed-off-by: Li Peng <peng.li@intel.com> * add convolution layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add pooling layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add softmax layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add lrn layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add innerproduct layer ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * add HAVE_OPENCL macro Signed-off-by: Li Peng <peng.li@intel.com> * fix for convolution ocl Signed-off-by: Li Peng <peng.li@intel.com> * enable getUMat() for multi-dimension Mat Signed-off-by: Li Peng <peng.li@intel.com> * use getUMat for ocl acceleration Signed-off-by: Li Peng <peng.li@intel.com> * use CV_OCL_RUN macro Signed-off-by: Li Peng <peng.li@intel.com> * set OPENCL target when it is available and disable fuseLayer for OCL target for the time being Signed-off-by: Li Peng <peng.li@intel.com> * fix innerproduct accuracy test Signed-off-by: Li Peng <peng.li@intel.com> * remove trailing space Signed-off-by: Li Peng <peng.li@intel.com> * Fixed tensorflow demo bug. Root cause is that tensorflow has different algorithm with libdnn to calculate convolution output dimension. libdnn don't calculate output dimension anymore and just use one passed in by config. * split gemm ocl file split it into gemm_buffer.cl and gemm_image.cl Signed-off-by: Li Peng <peng.li@intel.com> * Fix compile failure Signed-off-by: Li Peng <peng.li@intel.com> * check env flag for auto tuning Signed-off-by: Li Peng <peng.li@intel.com> * switch to new ocl kernels for softmax layer Signed-off-by: Li Peng <peng.li@intel.com> * update softmax layer on some platform subgroup extension may not work well, fallback to non subgroup ocl acceleration. Signed-off-by: Li Peng <peng.li@intel.com> * fallback to cpu path for fc layer with multi output Signed-off-by: Li Peng <peng.li@intel.com> * update output message Signed-off-by: Li Peng <peng.li@intel.com> * update fully connected layer fallback to gemm API if libdnn return false Signed-off-by: Li Peng <peng.li@intel.com> * Add ReLU OCL implementation * disable layer fusion for now Signed-off-by: Li Peng <peng.li@intel.com> * Add OCL implementation for concat layer Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> * libdnn: update license and copyrights Also refine libdnn coding style Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com> Signed-off-by: Li Peng <peng.li@intel.com> * DNN: Don't link OpenCL library explicitly * DNN: Make default preferableTarget to DNN_TARGET_CPU User should set it to DNN_TARGET_OPENCL explicitly if want to use OpenCL acceleration. Also don't fusion when using DNN_TARGET_OPENCL * DNN: refine coding style * Add getOpenCLErrorString * DNN: Use int32_t/uint32_t instread of alias * Use namespace ocl4dnn to include libdnn things * remove extra copyTo in softmax ocl path Signed-off-by: Li Peng <peng.li@intel.com> * update ReLU layer ocl path Signed-off-by: Li Peng <peng.li@intel.com> * Add prefer target property for layer class It is used to indicate the target for layer forwarding, either the default CPU target or OCL target. Signed-off-by: Li Peng <peng.li@intel.com> * Add cl_event based timer for cv::ocl * Rename libdnn to ocl4dnn Signed-off-by: Li Peng <peng.li@intel.com> Signed-off-by: wzw <zhiwen.wu@intel.com> * use UMat for ocl4dnn internal buffer Remove allocateMemory which use clCreateBuffer directly Signed-off-by: Li Peng <peng.li@intel.com> Signed-off-by: wzw <zhiwen.wu@intel.com> * enable buffer gemm in ocl4dnn innerproduct Signed-off-by: Li Peng <peng.li@intel.com> * replace int_tp globally for ocl4dnn kernels. Signed-off-by: wzw <zhiwen.wu@intel.com> Signed-off-by: Li Peng <peng.li@intel.com> * create UMat for layer params Signed-off-by: Li Peng <peng.li@intel.com> * update sign ocl kernel Signed-off-by: Li Peng <peng.li@intel.com> * update image based gemm of inner product layer Signed-off-by: Li Peng <peng.li@intel.com> * remove buffer gemm of inner product layer call cv::gemm API instead Signed-off-by: Li Peng <peng.li@intel.com> * change ocl4dnn forward parameter to UMat Signed-off-by: Li Peng <peng.li@intel.com> * Refine auto-tuning mechanism. - Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory for fine-tuned kernel configuration. e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp, the cache directory will be /home/tmp/spatialkernels/ on Linux. - Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable auto-tuning. - OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling for OpenCL command queue. This fix basic kernel get wrong running time, i.e. 0ms. - If creating cache directory failed, disable auto-tuning. * Detect and create cache dir on windows Signed-off-by: Li Peng <peng.li@intel.com> * Refine gemm like convolution kernel. Signed-off-by: Li Peng <peng.li@intel.com> * Fix redundant swizzleWeights calling when use cached kernel config. * Fix "out of resource" bug when auto-tuning too many kernels. * replace cl_mem with UMat in ocl4dnnConvSpatial class * OCL4DNN: reduce the tuning kernel candidate. This patch could reduce 75% of the tuning candidates with less than 2% performance impact for the final result. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> * replace cl_mem with umat in ocl4dnn convolution Signed-off-by: Li Peng <peng.li@intel.com> * remove weight_image_ of ocl4dnn inner product Actually it is unused in the computation Signed-off-by: Li Peng <peng.li@intel.com> * Various fixes for ocl4dnn 1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel()) 2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp 3. Code comments cleanup 4. ignore check on OCL cpu device Signed-off-by: Li Peng <peng.li@intel.com> * add build option for log softmax Signed-off-by: Li Peng <peng.li@intel.com> * remove unused ocl kernels in ocl4dnn Signed-off-by: Li Peng <peng.li@intel.com> * replace ocl4dnnSet with opencv setTo Signed-off-by: Li Peng <peng.li@intel.com> * replace ALIGN with cv::alignSize Signed-off-by: Li Peng <peng.li@intel.com> * check kernel build options Signed-off-by: Li Peng <peng.li@intel.com> * Handle program compilation fail properly. * Use std::numeric_limits<float>::infinity() for large float number * check ocl4dnn kernel compilation result Signed-off-by: Li Peng <peng.li@intel.com> * remove unused ctx_id Signed-off-by: Li Peng <peng.li@intel.com> * change clEnqueueNDRangeKernel to kernel.run() Signed-off-by: Li Peng <peng.li@intel.com> * change cl_mem to UMat in image based gemm Signed-off-by: Li Peng <peng.li@intel.com> * check intel subgroup support for lrn and pooling layer Signed-off-by: Li Peng <peng.li@intel.com> * Fix convolution bug if group is greater than 1 Signed-off-by: Li Peng <peng.li@intel.com> * Set default layer preferableTarget to be DNN_TARGET_CPU Signed-off-by: Li Peng <peng.li@intel.com> * Add ocl perf test for convolution Signed-off-by: Li Peng <peng.li@intel.com> * Add more ocl accuracy test Signed-off-by: Li Peng <peng.li@intel.com> * replace cl_image with ocl::Image2D Signed-off-by: Li Peng <peng.li@intel.com> * Fix build failure in elementwise layer Signed-off-by: Li Peng <peng.li@intel.com> * use getUMat() to get blob data Signed-off-by: Li Peng <peng.li@intel.com> * replace cl_mem handle with ocl::KernelArg Signed-off-by: Li Peng <peng.li@intel.com> * dnn(build): don't use C++11, OPENCL_LIBRARIES fix * dnn(ocl4dnn): remove unused OpenCL kernels * dnn(ocl4dnn): extract OpenCL code into .cl files * dnn(ocl4dnn): refine auto-tuning Defaultly disable auto-tuning, set OPENCV_OCL4DNN_ENABLE_AUTO_TUNING environment variable to enable it. Use a set of pre-tuned configs as default config if auto-tuning is disabled. These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet, AlexNet, ResNet-50 If default config is not suitable, use the first available kernel config from the candidates. Candidate priority from high to low is gemm like kernel, IDLF kernel, basick kernel. * dnn(ocl4dnn): pooling doesn't use OpenCL subgroups * dnn(ocl4dnn): fix perf test OpenCV has default 3sec time limit for each performance test. Warmup OpenCL backend outside of perf measurement loop. * use ocl::KernelArg as much as possible Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): fix bias bug for gemm like kernel * dnn(ocl4dnn): wrap cl_mem into UMat Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): Refine signature of kernel config - Use more readable string as signture of kernel config - Don't count device name and vendor in signature string - Default kernel configurations are tuned for Intel GPU with 24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model. * dnn(ocl4dnn): swap width/height in configuration * dnn(ocl4dnn): enable configs for Intel OpenCL runtime only * core: make configuration helper functions accessible from non-core modules * dnn(ocl4dnn): update kernel auto-tuning behavior Avoid unwanted creation of directories * dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash * dnn(ocl4dnn): remove redundant code * dnn(ocl4dnn): Add more clear message for simd size dismatch. * dnn(ocl4dnn): add const to const argument Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): force compiler use a specific SIMD size for IDLF kernel * dnn(ocl4dnn): drop unused tuneLocalSize() * dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method * dnn(ocl4dnn): sanitize file names used for cache * dnn(perf): enable Network tests with OpenCL * dnn(ocl4dnn/conv): drop computeGlobalSize() * dnn(ocl4dnn/conv): drop unused fields * dnn(ocl4dnn/conv): simplify ctor * dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL * dnn(ocl4dnn/conv): drop unsupported double / untested half types * dnn(ocl4dnn/conv): drop unused variable * dnn(ocl4dnn/conv): alignSize/divUp * dnn(ocl4dnn/conv): use enum values * dnn(ocl4dnn): drop unused innerproduct variable Signed-off-by: Li Peng <peng.li@intel.com> * dnn(ocl4dnn): add an generic function to check cl option support * dnn(ocl4dnn): run softmax subgroup version kernel first Signed-off-by: Li Peng <peng.li@intel.com>	8 years ago
Dmitry Kurtaev	7dc6b1d7d4	Layers for OpenFace face recognition network	8 years ago
Maksim Shabunin	248e2c7d47	Fixed some issues found by static analysis	8 years ago
Vadim Pisarevsky	0488d9bdb2	optimize out scaleLayer & concatLayer whenever possible fixed problem in concat layer by disabling memory re-use in layers with multiple inputs trying to fix the tests when Halide is used to run deep nets another attempt to fix Halide tests see if the Halide tests will pass with concat layer fusion turned off trying to fix failures in halide tests; another try one more experiment to make halide_concat & halide_enet tests pass continue attempts to fix halide tests moving on uncomment parallel concat layer seemingly fixed failures in Halide tests and re-enabled concat layer fusion; thanks to dkurt for the patch	8 years ago
Alexander Alekhin	ed10383359	dnn: added trace macros	8 years ago
Vadim Pisarevsky	8b3d6603d5	another round of dnn optimization (#9011 An error occurred ) * another round of dnn optimization: * increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly * improved SIMD optimization of pooling layer, optimized average pooling * cleaned up convolution layer implementation * made activation layer "attacheable" to all other layers, including fully connected and addition layer. * fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it desctibes the topology. * greatly optimized permutation layer, which improved SSD performance * parallelized element-wise binary/ternary/... ops (sum, prod, max) * also, added missing copyrights to many of the layer implementation files * temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders	8 years ago
Alexander Alekhin	93729784bb	dnn: move module from opencv_contrib `e6f63c7a38/modules/dnn`	8 years ago

45 Commits (7b582b71ba8b53abb930dd7afab2e386dcddccc0)