3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167
# Summary
### Previous context
From PR #24556:
>> * As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
>> Reference documentation is here:
>>
>> * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
>> * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
>> * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html
>> * Carotene library is turned on here: 8bbf08f0de/CMakeLists.txt (L906-L911)
> As a test outside of this PR, A 3rdparty component called ndsrvp is created, containing one of the non-dnn code (integral_SIMD), and it works very well.
> All the non-dnn code in this PR have been removed, currently this PR can be focused on dnn optinizations.
> This HAL mechanism is quite suitable for rvp optimizations, all the non-dnn code is expected to be moved into ndsrvp soon.
### Progress
#### Part 1 (This PR)
- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [x] Element-wise add and subtract
- [x] Element-wise minimum or maximum
- [x] Element-wise absolute difference
- [x] Bitwise logical operations
- [x] Element-wise compare
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
- [x] Integral
- [x] Threshold
- [x] WarpAffine
- [x] WarpPerspective
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
#### Part 2 (Next PR)
**Rough Estimate. Todo List May Change.**
- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
- smaller remap HAL interface
- AdaptiveThreshold
- BoxFilter
- Canny
- Convert
- Filter
- GaussianBlur
- MedianBlur
- Morph
- Pyrdown
- Resize
- Scharr
- SepFilter
- Sobel
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
- FAST
### Performance Tests
The optimization does not contain floating point opreations.
**Absolute Difference**
Geometric mean (ms)
|Name of Test|opencv perf core Absdiff|opencv perf core Absdiff|opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)|
|---|:-:|:-:|:-:|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01|
**Bitwise Operation**
Geometric mean (ms)
|Name of Test|opencv perf core Bit|opencv perf core Bit|opencv perf core Bit vs opencv perf core Bit (x-factor)|
|---|:-:|:-:|:-:|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55|
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
This folder contains libraries and headers of a few very popular still image codecs used by imgcodecs module. The libraries and headers are preferably to build Win32 and Win64 versions of OpenCV. On UNIX systems all the libraries are automatically detected by configure script. In order to use these versions of libraries instead of system ones on UNIX systems you should use BUILD_<library_name> CMake flags (for example, BUILD_PNG for the libpng library).
------------------------------------------------------------------------------------ libjpeg (deprecated) The Independent JPEG Group's JPEG software. Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding. See IGJ home page http://www.ijg.org for details and links to the source code
libjpeg-turbo libjpeg-turbo is covered by three compatible BSD-style open source licenses. Refer to [LICENSE.md](libjpeg-turbo/LICENSE.md) for a roll-up of license terms.
Site: https://github.com/libjpeg-turbo/libjpeg-turbo API is compatible with original libjpeg.
WITH_JPEG CMake option must be ON to add libjpeg or libjpeg-turbo support to imgcodecs. BUILD_JPEG=ON selects libjpeg-turbo by default (since OpenCV 3.4.2). Enable BUILD_JPEG_TURBO_DISABLE=ON to force using of libjpeg (this option is removed in OpenCV 4.0). SIMD instructions are enabled by default. Use ENABLE_LIBJPEG_TURBO_SIMD to control SIMD instructions. ------------------------------------------------------------------------------------ libpng Portable Network Graphics library. The license and copyright notes can be found in libpng/LICENSE. See libpng home page http://www.libpng.org for details and links to the source code
WITH_PNG CMake option must be ON to add libpng support to imgcodecs.
libspng Portable Network Graphics library. The license and copyright notes can be found in libspng/LICENSE. See libspng home page https://libspng.org for details and links to the source code
WITH_SPNG CMake option must be ON to add libspng support to imgcodecs ------------------------------------------------------------------------------------ libtiff Tag Image File Format (TIFF) Software Copyright (c) 1988-1997 Sam Leffler Copyright (c) 1991-1997 Silicon Graphics, Inc. See libtiff home page #1 http://www.simplesystems.org/libtiff/ #2 https://libtiff.gitlab.io/libtiff/ for details and links to the source code
WITH_TIFF CMake option must be ON to add libtiff & zlib support to imgcodecs. ------------------------------------------------------------------------------------ zlib General purpose LZ77 compression library Copyright (C) 1995-2022 Jean-loup Gailly and Mark Adler. See zlib home page http://www.zlib.net for details and links to the source code
zlib-ng zlib data compression library for the next generation systems (C) 1995-2013 Jean-loup Gailly and Mark Adler See zlib-ng official GitHub repository https://github.com/zlib-ng/zlib-ng.git for details and links to source code
WITH_ZLIB_NG CMake option must be ON to use zlib-ng as the zlib implementation. ------------------------------------------------------------------------------------ jasper JasPer is a collection of software (i.e., a library and application programs) for the coding and manipulation of images. This software can handle image data in a variety of formats. One such format supported by JasPer is the JPEG-2000 format defined in ISO/IEC 15444-1.
Copyright (c) 1999-2000 Image Power, Inc. Copyright (c) 1999-2000 The University of British Columbia Copyright (c) 2001-2003 Michael David Adams
See JasPer official GitHub repository https://github.com/jasper-software/jasper.git for details and links to source code ------------------------------------------------------------------------------------ openexr OpenEXR is a high dynamic-range (HDR) image file format developed by Industrial Light & Magic for use in computer imaging applications.
Copyright (c) 2006, Industrial Light & Magic, a division of Lucasfilm Entertainment Company Ltd. Portions contributed and copyright held by others as indicated. All rights reserved.
The project homepage: http://www.openexr.com ------------------------------------------------------------------------------------ ffmpeg FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video. It includes libavcodec - the leading audio/video codec library, and also libavformat, libavutils and other helper libraries that are used by OpenCV (in videoio module) to read and write video files.
Copyright (c) 2001 Fabrice Bellard
The project homepage: http://ffmpeg.org/.
* On Linux/OSX we link user-installed ffmpeg (or ffmpeg fork libav). * On Windows we use pre-built ffmpeg binaries, see opencv/3rdparty/ffmpeg/readme.txt for details and licensing information ------------------------------------------------------------------------------------