Fully connected 0D test. #25208
This PR introduces parametrized `0/1D` input support test for `Fullyconnected` layer.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn cleanup: On-fly-quantization removal #2498
On-fly-quantization is first introduced via https://github.com/opencv/opencv/pull/20228.
We decided to remove it but keep int8 layers implementation because on-fly-quantization
is less practical given the fact that there has been so many dedicated tools for model
quantization.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: cleanup of halide backend for 5.x #24231
Merge with https://github.com/opencv/opencv_extra/pull/1092.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Use ngraph::Output in OpenVINO backend wrapper #24196
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/24102
* Use `ngraph::Output<ngraph::Node>>` insead of `std::shared_ptr<ngraph::Node>` as a backend wrapper. It lets access to multi-output nodes: 588ddf1b18/modules/dnn/src/net_openvino.cpp (L501-L504)
* All layers can be customizable with OpenVINO >= 2022.1. nGraph reference code used for default layer implementation does not required CPU plugin also (might be tested by commenting CPU plugin at `/opt/intel/openvino/runtime/lib/intel64/plugins.xml`).
* Correct inference if only intermediate blobs requested.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
TFLite models on different backends (tests and improvements) #24039
### Pull Request Readiness Checklist
* MaxUnpooling with OpenVINO
* Fully connected with transposed inputs/weights with OpenVINO
* Enable backends tests for TFLite (related to https://github.com/opencv/opencv/issues/23992#issuecomment-1640691722)
* Increase existing tests thresholds
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Resolve uncovered CUDA dnn layer #24080
### Pull Request Readiness Checklist
* Gelu activation layer on CUDA
* Try to relax GEMM from ONNX
resolves https://github.com/opencv/opencv/issues/24064
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: Support more operators in CANN backend #23401
This PR adds the support of following layers:
- [x] Sub
- [x] PRelu
- [x] DeConv
- [x] Also warn users if backend is switched back to default if some of the layers are not supported.
- [ ] [Dropped] LSTM: some hacks (adding layers) were introduced which makes it even harder to build the graph for CANN backend.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Related issue: https://github.com/opencv/opencv_zoo/issues/136
Features added:
- Support operators with multiple output: ONNX Split.
- Support Slice without steps.
Bugs fixed:
- Wrong settings in ClipByValue (Relu6).
- Wrong calculation of pads in convolution layer (It is wrong generally but only fixed specifically for CANN for now).
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* cann backend impl v1
* cann backend impl v2: use opencv parsers to build models for cann
* adjust fc according to the new transA and transB
* put cann net in cann backend node and reuse forwardLayer
* use fork() to create a child process and compile cann model
* remove legacy code
* remove debug code
* fall bcak to CPU backend if there is one layer not supoorted by CANN backend
* fix netInput forward
Add per_tensor_quantize to int8 quantize
* add per_tensor_quantize to dnn int8 module.
* change api flag from perTensor to perChannel, and recognize quantize type and onnx importer.
* change the default to hpp
* dnn: LSTM optimisation
This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
- Use unaligned access. I don't believe this involves any performance hit in on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
- Allow for weight matrices where the number of columns is not a multiple of 8.
I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available
In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
- Don't check hardware support for AVX(2) when dispatch is disabled for these
- Add braces
* Fix out-of-bounds access in fully connected layer
The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding(which makes more sense anyway).
This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
- Use static array for generating tail masks (as requested)
- Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load
[GSoC] OpenCV.js: Accelerate OpenCV.js DNN via WebNN
* Add WebNN backend for OpenCV DNN Module
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Add WebNN head files into OpenCV 3rd partiy files
Create webnn.hpp
update cmake
Complete README and add OpenCVDetectWebNN.cmake file
add webnn.cpp
Modify webnn.cpp
Can successfully compile the codes for creating a MLContext
Update webnn.cpp
Update README.md
Update README.md
Update README.md
Update README.md
Update cmake files and
update README.md
Update OpenCVDetectWebNN.cmake and README.md
Update OpenCVDetectWebNN.cmake
Fix OpenCVDetectWebNN.cmake and update README.md
Add source webnn_cpp.cpp and libary libwebnn_proc.so
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
update dnn.cpp
update op_webnn
update op_webnn
Update op_webnn.hpp
update op_webnn.cpp & hpp
Update op_webnn.hpp
Update op_webnn
update the skeleton
Update op_webnn.cpp
Update op_webnn
Update op_webnn.cpp
Update op_webnn.cpp
Update op_webnn.hpp
update op_webnn
update op_webnn
Solved the problems of released variables.
Fixed the bugs in op_webnn.cpp
Implement op_webnn
Implement Relu by WebNN API
Update dnn.cpp for better test
Update elementwise_layers.cpp
Implement ReLU6
Update elementwise_layers.cpp
Implement SoftMax using WebNN API
Implement Reshape by WebNN API
Implement PermuteLayer by WebNN API
Implement PoolingLayer using WebNN API
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Implement poolingLayer by WebNN API and add more detailed logs
Update dnn.cpp
Update dnn.cpp
Remove redundant codes and add more logs for poolingLayer
Add more logs in the pooling layer implementation
Fix the indent issue and resolve the compiling issue
Fix the build problems
Fix the build issue
FIx the build issue
Update dnn.cpp
Update dnn.cpp
* Fix the build issue
* Implement BatchNorm Layer by WebNN API
* Update convolution_layer.cpp
This is a temporary file for Conv2d layer implementation
* Integrate some general functions into op_webnn.cpp&hpp
* Update const_layer.cpp
* Update convolution_layer.cpp
Still have some bugs that should be fixed.
* Update conv2d layer and fc layer
still have some problems to be fixed.
* update constLayer, conv layer, fc layer
There are still some bugs to be fixed.
* Fix the build issue
* Update concat_layer.cpp
Still have some bugs to be fixed.
* Update conv2d layer, fully connected layer and const layer
* Update convolution_layer.cpp
* Add OpenCV.js DNN module WebNN Backend (both using webnn-polyfill and electron)
* Delete bib19450.aux
* Add WebNN backend for OpenCV DNN Module
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Add WebNN head files into OpenCV 3rd partiy files
Create webnn.hpp
update cmake
Complete README and add OpenCVDetectWebNN.cmake file
add webnn.cpp
Modify webnn.cpp
Can successfully compile the codes for creating a MLContext
Update webnn.cpp
Update README.md
Update README.md
Update README.md
Update README.md
Update cmake files and
update README.md
Update OpenCVDetectWebNN.cmake and README.md
Update OpenCVDetectWebNN.cmake
Fix OpenCVDetectWebNN.cmake and update README.md
Add source webnn_cpp.cpp and libary libwebnn_proc.so
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
update dnn.cpp
update op_webnn
update op_webnn
Update op_webnn.hpp
update op_webnn.cpp & hpp
Update op_webnn.hpp
Update op_webnn
update the skeleton
Update op_webnn.cpp
Update op_webnn
Update op_webnn.cpp
Update op_webnn.cpp
Update op_webnn.hpp
update op_webnn
update op_webnn
Solved the problems of released variables.
Fixed the bugs in op_webnn.cpp
Implement op_webnn
Implement Relu by WebNN API
Update dnn.cpp for better test
Update elementwise_layers.cpp
Implement ReLU6
Update elementwise_layers.cpp
Implement SoftMax using WebNN API
Implement Reshape by WebNN API
Implement PermuteLayer by WebNN API
Implement PoolingLayer using WebNN API
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Implement poolingLayer by WebNN API and add more detailed logs
Update dnn.cpp
Update dnn.cpp
Remove redundant codes and add more logs for poolingLayer
Add more logs in the pooling layer implementation
Fix the indent issue and resolve the compiling issue
Fix the build problems
Fix the build issue
FIx the build issue
Update dnn.cpp
Update dnn.cpp
* Fix the build issue
* Implement BatchNorm Layer by WebNN API
* Update convolution_layer.cpp
This is a temporary file for Conv2d layer implementation
* Integrate some general functions into op_webnn.cpp&hpp
* Update const_layer.cpp
* Update convolution_layer.cpp
Still have some bugs that should be fixed.
* Update conv2d layer and fc layer
still have some problems to be fixed.
* update constLayer, conv layer, fc layer
There are still some bugs to be fixed.
* Update conv2d layer, fully connected layer and const layer
* Update convolution_layer.cpp
* Add OpenCV.js DNN module WebNN Backend (both using webnn-polyfill and electron)
* Update dnn.cpp
* Fix Error in dnn.cpp
* Resolve duplication in conditions in convolution_layer.cpp
* Fixed the issues in the comments
* Fix building issue
* Update tutorial
* Fixed comments
* Address the comments
* Update CMakeLists.txt
* Offer more accurate perf test on native
* Add better perf tests for both native and web
* Modify per tests for better results
* Use more latest version of Electron
* Support latest WebNN Clamp op
* Add definition of HAVE_WEBNN macro
* Support group convolution
* Implement Scale_layer using WebNN
* Add Softmax option for native classification example
* Fix comments
* Fix comments
Optimization of DNN using native RISC-V vector intrinsics.
* Use RVV to optimize fastGEMM (FP32) in DNN.
* Use RVV to optimize fastGEMM1T in DNN.
* Use RVV to optimize fastConv in DNN.
* Use RVV to optimize fastDepthwiseConv in DNN.
* Vectorize tails using vl.
* Use "vl" instead of scalar to handle small block in fastConv.
* Fix memory access out of bound in "fastGEMM1T".
* Remove setvl.
* Remove useless initialization.
* Use loop unrolling to handle tail part instead of switch.
* Remove isIntel check from deep learning layers
* Remove fp16->fp32 fallbacks where it's not necessary
* Fix Kernel::run to prevent localsize > globalsize
* Remove a forward method in dnn::Layer
* Add a test
* Fix tests
* Mark multiple dnn::Layer::finalize methods as deprecated
* Replace back dnn's inputBlobs to vector of pointers
* Remove Layer::forward_fallback from CV_OCL_RUN scopes
* optimize ocl kernel enqueue in fc layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_LOG_INFO in convolution auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* update convolution IDLF kernel
extend parameter tuning range, also cleanup
ocl kernel implementation
Signed-off-by: Li Peng <peng.li@intel.com>
* update in-memory convolution cache config
fp16 and fp32 cache config are stored separately
Signed-off-by: Li Peng <peng.li@intel.com>