ht_dec.c: Improve MSVC arm64 popcount performance #24205
Use NEON instructions for ARM64 (implementation based on https://github.com/microsoft/STL/pull/2127, which is Apache licensed).
Godbolt output here: https://godbolt.org/z/q7GPTqT14
Related patch to openjpeg: https://github.com/uclouvain/openjpeg/pull/1479
### Pull Request Readiness Checklist
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
Merge pull request #24274 from vrabaud:webp_1.3.2
Bump libwebp to 1.3.2 #24274
This is version [c1ffd9a](c1ffd9ac75)
It is 1.3.2 with a few patches that were made right after to help compilation.
No need for patches on the OpenCV side!
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
dnn: cleanup of tengine backend #24122🚀 Cleanup for OpenCV 5.0. Tengine backend is added for convolution layer speedup on ARM CPUs, but it is not maintained and the convolution layer on our default backend has reached similar performance to that of Tengine.
Tengine backend related PRs:
- https://github.com/opencv/opencv/pull/16724
- https://github.com/opencv/opencv/pull/18323
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47342
The read overflow triggered by reading `src[j]` in
```cpp
for (j = 0; j < max; ++j) {
dst[j] = src[j];
}
```
The max is calculated as `new_comps[pcol].w * new_comps[pcol].h`, however the `src = old_comps[cmp].data;` which may have different `w` and `h` dimensions.