dnn: add layer normalization for vision transformers
* add layer norm onnx parser, impl and tests
* add onnx graph simplifier for layer norm expanded
* handle the case when constants are of type Initializer
* add test case for layer norm expanded with initializers
* use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16
* use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull
* template hasBias
* remove trailing whitespace
* use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt
* refactor implementation, optimize parallel_for
* disable layer norm expanded
* remove the removal of layer norm optional outputs
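For reference, the normalization described in the bullets above can be sketched as follows. This is a minimal standalone illustration, not the code added to the dnn module: the function name `layerNormRows` and its parameters are hypothetical, and it assumes a contiguous float buffer whose rows are normalized independently. It shows the hoisted divisions, the nullable bias pointer, and the std::max guard before std::sqrt.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: normalizes every row of `normSize` elements in place.
// `scale` is required; `bias` may be null, in which case no bias is added.
void layerNormRows(std::vector<float>& data, std::size_t normSize,
                   const float* scale, const float* bias, float epsilon = 1e-5f)
{
    assert(normSize > 0 && data.size() % normSize == 0);
    assert(scale != nullptr);

    // Computed once, outside the loops.
    const float invNormSize = 1.0f / static_cast<float>(normSize);
    const std::size_t rows = data.size() / normSize;

    for (std::size_t row = 0; row < rows; ++row)
    {
        float* x = data.data() + row * normSize;

        float mean = 0.f, meanSquare = 0.f;
        for (std::size_t i = 0; i < normSize; ++i)
        {
            mean += x[i];
            meanSquare += x[i] * x[i];
        }
        mean *= invNormSize;
        meanSquare *= invNormSize;

        // Rounding can make E[x^2] - E[x]^2 slightly negative; clamp it to zero.
        const float variance = std::max(0.f, meanSquare - mean * mean);
        const float invStd = 1.0f / std::sqrt(variance + epsilon);

        for (std::size_t i = 0; i < normSize; ++i)
        {
            float y = (x[i] - mean) * invStd * scale[i];
            if (bias != nullptr)
                y += bias[i];
            x[i] = y;
        }
    }
}
```

Passing `nullptr` for `bias` corresponds to the case without a bias input; the actual implementation templates on hasBias instead of branching inside the loop.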
DNN: let Quant and Dequant of ONNX_importer support the Constant input.
* let Quant and Dequant support the Constant input.
* fix negative value of axis.
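A negative axis counts from the last dimension, so it has to be mapped into [0, ndims) before use. A minimal sketch of that mapping, with a hypothetical helper name (`wrapAxis`) that is not taken from the importer:

```cpp
#include <cassert>

// Hypothetical helper: maps an axis in [-ndims, ndims) to [0, ndims).
int wrapAxis(int axis, int ndims)
{
    assert(ndims > 0);
    assert(axis >= -ndims && axis < ndims);
    return axis < 0 ? axis + ndims : axis;
}
```

For a 4-dimensional tensor, axis -1 maps to 3 and axis 1 is returned unchanged.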
Reimplementation of Element-wise layers with broadcasting support
* init
* semi-working initial version
* add small_vector
* wip
* remove smallvec
* add nary function
* replace auto with Mat in lambda expr used in transform
* uncomment asserts
* autobuffer shape_buf & step_buf
* fix a missing bracket
* fixed a missing addLayer in parseElementWise
* solve one-dimensional broadcast
* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called
* one autobuffer for step & shape
* temporary fix for the missing original dimension information
* fix parseUnsqueeze when it gets a 1d tensor constant
* support sum/mean/min/max with only one input
* reuse old code to handle cases of two non-constant inputs
* add condition to handle div & mul of two non-constant inputs
* use || instead of or
* remove trailing spaces
* enlarge buf in binary_forward to contain other buffer
* use autobuffer in nary_forward
* generate data randomly and add more cases for perf
* add op and, or & xor
* update perf_dnn
* remove some comments
* remove legacy; add two ONNX conformance tests in filter
* move from cpu_denylist to all_denylist
* adjust parsing for inputs>=2
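The broadcasting rule these layers rely on can be sketched as below. This is an illustration of the usual NumPy/ONNX multidirectional broadcasting rule, assumed here rather than copied from the layer; the function name `broadcastShape` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: computes the broadcast output shape of two inputs.
// Shapes are aligned at the trailing dimension; a missing leading dimension
// is treated as 1, and a dimension of 1 stretches to match the other input.
std::vector<int> broadcastShape(const std::vector<int>& a, const std::vector<int>& b)
{
    const std::size_t n = std::max(a.size(), b.size());
    std::vector<int> out(n, 1);
    for (std::size_t i = 0; i < n; ++i)
    {
        const int da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const int db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1)
            throw std::invalid_argument("shapes cannot be broadcast together");
        out[n - 1 - i] = (da == 1) ? db : da;
    }
    return out;
}
```

With this rule, shapes {2, 3, 4} and {3, 1} broadcast to {2, 3, 4}, and the one-dimensional case {5} against {4, 1} gives {4, 5}.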
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
Add per_tensor_quantize to int8 quantize
* add per_tensor_quantize to dnn int8 module.
* change API flag from perTensor to perChannel, and recognize the quantize type in the ONNX importer.
* change the default to hpp
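The difference between the two modes can be sketched as below. This assumes symmetric int8 quantization with scale = max|w| / 127, which is an assumption made for illustration and not necessarily the formula used by the dnn int8 module; the function name `int8Scales` and the `perChannel` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns one scale per output channel.
// perChannel == true  -> each channel gets its own scale.
// perChannel == false -> every channel shares the scale of the whole tensor.
std::vector<float> int8Scales(const std::vector<std::vector<float> >& weights, bool perChannel)
{
    assert(!weights.empty());
    std::vector<float> maxAbs(weights.size(), 0.f);
    float globalMax = 0.f;
    for (std::size_t c = 0; c < weights.size(); ++c)
    {
        for (float w : weights[c])
            maxAbs[c] = std::max(maxAbs[c], std::fabs(w));
        globalMax = std::max(globalMax, maxAbs[c]);
    }

    std::vector<float> scales(weights.size());
    for (std::size_t c = 0; c < weights.size(); ++c)
    {
        const float m = perChannel ? maxAbs[c] : globalMax;
        // Fall back to 1 for an all-zero range so the scale is never zero.
        scales[c] = m > 0.f ? m / 127.f : 1.f;
    }
    return scales;
}
```

Per-channel scales follow each channel's own range, so a channel with small weights keeps more resolution; per-tensor scales are simpler but are dominated by the largest weight in the tensor.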
Fix issue 22015, let Clip layer support 1-3 inputs
* Fix issue 22015.
Let layer Clip support 1-3 inputs.
* Resolve other problems caused by modifications
* Update onnx_importer.cpp: add extra checks to min/max handling in Clip
* Add assertions to check the size of the input
* Add test for clip with min and max initializers
* Separate test for "clip_init_min_max". Change the check method for input_size to provide a clearer message in case of problem.
* Add tests for clip with min or max initializers
* Change the implementation of getting input
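In ONNX, Clip takes the data tensor plus optional min and max inputs, which is where the 1-3 input count comes from. A minimal sketch of the clamping with optional bounds, using a hypothetical function name (`clipValue`) and null pointers to stand in for omitted inputs; it is not the importer code:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper: clamps x to [min, max].
// A null pointer means the corresponding optional input was not provided,
// so that side falls back to the widest representable float bound.
float clipValue(float x, const float* minVal, const float* maxVal)
{
    const float lo = minVal ? *minVal : std::numeric_limits<float>::lowest();
    const float hi = maxVal ? *maxVal : std::numeric_limits<float>::max();
    assert(lo <= hi);
    return std::min(std::max(x, lo), hi);
}
```

With both pointers null the value passes through unchanged, which corresponds to the single-input case.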
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
Fix LSTM support in ONNX
* fix LSTM and add peephole support
* disable old tests
* turn lambdas into functions
* more hacks for c++98
* add assertions
* slice fixes
* backport of cuda-related fixes
* address review comments
Use YuNet of fixed input shape to fix not-supported-dynamic-zero-shape for FaceDetectorYN
* use yunet with input of fixed shape
* update yunet used in face recognition regression