DNN: reduce the memory used in convolution layer
* reduce the memory used by Winograd and disable the test when memory usage exceeds 2 GB.
* remove the VERY_LONG tag
[test data in opencv_extra](https://github.com/opencv/opencv_extra/pull/1016)
NanoTrack is an extremely lightweight and fast object-tracking model.
The total size is **1.1 MB**.
It runs at **150** FPS on an Apple M1 chip and at about **30** FPS on a Raspberry Pi 4 (Float32, CPU only).
With this model, users can run object tracking on edge devices.
The author of NanoTrack is @HonglinChu.
The original repo is https://github.com/HonglinChu/NanoTrack.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
The current implementation overwrites the resulting rotation and translation in every iteration.
If SOLVEPNP_ITERATIVE is run as a refinement, it starts from an incorrect initial
transformation, which degrades the final result.
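The failure mode can be illustrated with a toy model. The sketch below is hypothetical and does not use OpenCV. `refine` stands in for a local iterative solver such as SOLVEPNP_ITERATIVE. A one-element list plays the role of an in/out rotation/translation buffer that is shared across iterations. The cost function is an assumption, chosen because it has two minima, so the result depends on where the refinement starts.

```python
def refine(x, bias, steps=500, lr=0.01):
    """Toy local refinement: gradient descent on f(x) = (x**2 - 1)**2 - bias * x.

    With bias == 0 the cost has two minima (x = -1 and x = +1), so the
    outcome depends on the starting point, as with a local pose refinement.
    """
    for _ in range(steps):
        x -= lr * (4.0 * x * (x * x - 1.0) - bias)
    return x


def run_overwriting(guess, biases):
    """Problematic pattern: one buffer is both the initial guess and the output."""
    pose = [guess]  # shared in/out storage, overwritten on every iteration
    results = []
    for bias in biases:
        pose[0] = refine(pose[0], bias)  # the next iteration starts from this result
        results.append(pose[0])
    return results


def run_preserving(guess, biases):
    """Fixed pattern: every iteration starts from the caller's original guess."""
    results = []
    for bias in biases:
        start = guess  # per-iteration starting point; the caller's guess is untouched
        results.append(refine(start, bias))
    return results


# The middle "sample" (hypothetical) pulls the estimate into the wrong basin (x < 0).
samples = [0.0, -3.0, 0.0]
print(run_overwriting(0.9, samples))  # last hypothesis ends near -1.0
print(run_preserving(0.9, samples))   # last hypothesis ends near +1.0
```

Once the shared buffer has been dragged into the wrong basin, every later refinement inherits that bad start. Keeping the initial guess separate from the per-iteration output avoids this.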
Modify the SIMD loop in color_hsv.
* Modify the SIMD loops in color_hsv.
* Add floating-point support to the bit logic operations.
* Add temporary compatibility code.
* Use max_nlanes instead of vlanes for array declaration.
* Use "CV_SIMD || CV_SIMD_SCALABLE".
* Revert the modification of the Universal Intrinsic API.
* Fix warnings.
* Use v_select instead of bit manipulation.
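The last bullets concern how a SIMD loop picks a value per lane based on a condition. The pure-Python sketch below emulates a vector as a list of lanes and compares two ways of making that choice. The first is a bitwise blend `(mask & a) | (~mask & b)`, which for floating-point data requires reinterpreting the float bits as integers. The second is a `v_select`-style selection. This is a scalar illustration under assumptions, not OpenCV's kernel. The helper names are hypothetical, and the hue formula (degrees in [0, 360), with an FLT_EPSILON guard against division by zero) is a simplified RGB-to-HSV hue computation chosen for the example.

```python
import struct

ALL_ONES = 0xFFFFFFFF  # a "true" lane, as produced by a SIMD compare
FLT_EPSILON = 1.1920929e-07


def f32_to_bits(x):
    """Reinterpret a float (rounded to float32) as its 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(u):
    """Reinterpret a 32-bit integer pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def blend_bits(mask, a, b):
    """Bit-manipulation pattern: (mask & a) | (~mask & b) on the float bit patterns."""
    return [bits_to_f32((m & f32_to_bits(x)) | (~m & ALL_ONES & f32_to_bits(y)))
            for m, x, y in zip(mask, a, b)]


def select(mask, a, b):
    """v_select-style semantics: take a where the mask lane is set, else b."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def mask_where(pred, *lanes):
    """Build an all-ones / all-zeros mask per lane from a predicate."""
    return [ALL_ONES if pred(*vals) else 0 for vals in zip(*lanes)]


def hue_deg(r, g, b, pick=select):
    """Branch-free hue (degrees) for lanes of float RGB values in [0, 1]."""
    v = [max(x, y, z) for x, y, z in zip(r, g, b)]
    mn = [min(x, y, z) for x, y, z in zip(r, g, b)]
    scale = [60.0 / (hi - lo + FLT_EPSILON) for hi, lo in zip(v, mn)]
    # Compute every branch for every lane, then choose per lane with masks.
    h_r = [(y - z) * s for y, z, s in zip(g, b, scale)]          # max is R
    h_g = [(z - x) * s + 120.0 for x, z, s in zip(r, b, scale)]  # max is G
    h_b = [(x - y) * s + 240.0 for x, y, s in zip(r, g, scale)]  # max is B
    is_r = mask_where(lambda m, x: m == x, v, r)
    is_g = mask_where(lambda m, y: m == y, v, g)
    h = pick(is_r, h_r, pick(is_g, h_g, h_b))
    # Wrap negative hues into [0, 360).
    neg = mask_where(lambda x: x < 0.0, h)
    return pick(neg, [x + 360.0 for x in h], h)


r = [1.0, 0.0, 0.0, 1.0, 1.0, 0.5]
g = [0.0, 1.0, 0.0, 1.0, 0.0, 0.5]
b = [0.0, 0.0, 1.0, 0.0, 1.0, 0.5]
print(hue_deg(r, g, b))                   # red, green, blue, yellow, magenta, gray
print(hue_deg(r, g, b, pick=blend_bits))  # same hues, up to float32 rounding
```

Both strategies produce the same lanes. The select form states the intent directly and does not need the float-to-integer reinterpretation that the bitwise blend depends on.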