ht_dec.c: Improve MSVC arm64 popcount performance #24205
Use NEON instructions for ARM64 (implementation based on https://github.com/microsoft/STL/pull/2127, which is Apache licensed).
Godbolt output here: https://godbolt.org/z/q7GPTqT14
Related patch to openjpeg: https://github.com/uclouvain/openjpeg/pull/1479
### Pull Request Readiness Checklist
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47342
The read overflow triggered by reading `src[j]` in
```cpp
for (j = 0; j < max; ++j) {
dst[j] = src[j];
}
```
The max is calculated as `new_comps[pcol].w * new_comps[pcol].h`, however the `src = old_comps[cmp].data;` which may have different `w` and `h` dimensions.