Merge branch 'brisk' of https://github.com/cbalint13/opencv into brisk
commit 6b1d5e48b6
183 changed files with 9874 additions and 3754 deletions
@@ -0,0 +1,42 @@
The build script still needs to be fixed. Right now it assumes that 32-bit MinGW is in
the system path and 64-bit MinGW is installed to c:\Apps\MinGW64.

It is important that gcc is used, not g++! Otherwise the produced DLL will likely depend
on libgcc_s_dw2-1.dll or a similar DLL, whereas we want the DLLs to have minimal
dependencies: Win32 libraries + msvcrt.dll.

ffopencv.c is really a C++ source, hence -x c++ is used.

How to update opencv_ffmpeg.dll and opencv_ffmpeg_64.dll when a new version of FFMPEG is released?

1. Install 32-bit MinGW + MSYS from
   http://sourceforge.net/projects/mingw/files/Automated%20MinGW%20Installer/mingw-get-inst/
   Let's assume it is installed in C:\MSYS32.
2. Install 64-bit MinGW from http://mingw-w64.sourceforge.net/
   Let's assume it is installed in C:\MSYS64.
3. Copy C:\MSYS32\msys to C:\MSYS64\msys. Edit C:\MSYS64\msys\etc\fstab, changing C:\MSYS32 to C:\MSYS64.

4. Now you have working MSYS32 and MSYS64 environments.
   Launch, one by one, C:\MSYS32\msys\msys.bat and C:\MSYS64\msys\msys.bat to create your home directories.

5. Download ffmpeg-x.y.z.tar.gz (where x.y.z denotes the actual ffmpeg version).
   Copy it to the C:\MSYS{32|64}\msys\home\<loginname> directory.

6. To build the 32-bit ffmpeg libraries, run C:\MSYS32\msys\msys.bat and type the following commands:

   6.1. tar -xzf ffmpeg-x.y.z.tar.gz
   6.2. mkdir build
   6.3. cd build
   6.4. ../ffmpeg-x.y.z/configure --enable-w32threads
   6.5. make
   6.6. make install
   6.7. cd /local/lib
   6.8. strip -g *.a

7. Then repeat the same for the 64-bit case. The output libs (libavcodec.a etc.) need to be renamed to libavcodec64.a etc.

8. Then copy all those libs to <opencv>\3rdparty\lib\ and copy the headers to <opencv>\3rdparty\include\ffmpeg_.

9. Then go to <opencv>\3rdparty\ffmpeg, edit make.bat
   (change the paths to the actual paths of your msys32 and msys64 distributions) and run make.bat.
@@ -1,42 +1,32 @@
* On Linux and other Unix flavors OpenCV uses default or user-built ffmpeg/libav libraries.
If the user builds ffmpeg/libav from source and wants OpenCV to stay a BSD library, not GPL/LGPL,
they should use the --enable-shared configure flag and make sure that no GPL components are
enabled (some notable examples are x264 (H264 encoder) and libac3 (Dolby AC3 audio codec)).
See https://www.ffmpeg.org/legal.html for details.

If you want to play it very safe and do not want to use FFMPEG at all, regardless of whether it's installed on
your system or not, configure and build OpenCV using CMake with the WITH_FFMPEG=OFF flag. OpenCV will then use
AVFoundation (OSX), GStreamer (Linux) or other available backends supported by the opencv_videoio module.

There is also our self-contained motion JPEG codec, which you can use without any worries.
It handles CV_FOURCC('M', 'J', 'P', 'G') streams within an AVI container (".avi"), as in the sketch below.
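
A minimal illustrative sketch (not part of the original readme) of writing and then reading
such an MJPG-in-AVI stream through the opencv_videoio API; the file name, resolution and frame
content are arbitrary:

#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>

int main()
{
    cv::Size sz(640, 480);
    // Self-contained motion JPEG writer: fourcc 'M','J','P','G' inside an .avi container.
    cv::VideoWriter writer("sample.avi", cv::VideoWriter::fourcc('M','J','P','G'), 30.0, sz);
    if (!writer.isOpened())
        return 1;
    for (int i = 0; i < 100; i++)
    {
        cv::Mat frame(sz, CV_8UC3, cv::Scalar(i % 256, 128, 255 - i % 256));
        writer << frame;                 // encode one frame
    }
    writer.release();

    cv::VideoCapture cap("sample.avi");  // read it back
    cv::Mat frame;
    while (cap.read(frame))
    {
        // process frame ...
    }
    return 0;
}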

* On Windows OpenCV uses pre-built ffmpeg binaries, built with proper flags (without GPL components) and
wrapped with a simple, stable OpenCV-compatible API.
The binaries are opencv_ffmpeg.dll (the version for 32-bit Windows) and
opencv_ffmpeg_64.dll (the version for 64-bit Windows).

See build_win32.txt for the build instructions, if you want to rebuild opencv_ffmpeg*.dll from scratch.

The pre-built opencv_ffmpeg*.dll is:
* an LGPL library, not a BSD one;
* loaded at runtime by the opencv_videoio module.
  If loading succeeds, ffmpeg is used to decode/encode videos;
  otherwise, another API is used.

If LGPL/GPL software cannot be supplied with your OpenCV-based product, simply exclude
opencv_ffmpeg*.dll from your distribution; OpenCV will stay fully functional except for the ability to
decode/encode videos using FFMPEG (though it may still be able to do that using another API,
such as Video for Windows, Windows Media Foundation or our self-contained motion JPEG codec).

See license.txt for the FFMPEG copyright notice and the licensing terms.

File diff suppressed because it is too large
@@ -0,0 +1,12 @@
set(the_description "The Hardware Acceleration Layer (HAL) module")

set(OPENCV_MODULE_TYPE STATIC)
# set(OPENCV_MODULE_IS_PART_OF_WORLD FALSE)

if(UNIX)
  if(CMAKE_COMPILER_IS_GNUCXX OR CV_ICC)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
  endif()
endif()

ocv_define_module(hal)
@@ -0,0 +1,98 @@
||||
/*M///////////////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
|
||||
//
|
||||
// By downloading, copying, installing or using the software you agree to this license.
|
||||
// If you do not agree to this license, do not download, install,
|
||||
// copy or use the software.
|
||||
//
|
||||
//
|
||||
// License Agreement
|
||||
// For Open Source Computer Vision Library
|
||||
//
|
||||
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
|
||||
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
|
||||
// Copyright (C) 2013, OpenCV Foundation, all rights reserved.
|
||||
// Copyright (C) 2015, Itseez Inc., all rights reserved.
|
||||
// Third party copyrights are property of their respective owners.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without modification,
|
||||
// are permitted provided that the following conditions are met:
|
||||
//
|
||||
// * Redistribution's of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimer.
|
||||
//
|
||||
// * Redistribution's in binary form must reproduce the above copyright notice,
|
||||
// this list of conditions and the following disclaimer in the documentation
|
||||
// and/or other materials provided with the distribution.
|
||||
//
|
||||
// * The name of the copyright holders may not be used to endorse or promote products
|
||||
// derived from this software without specific prior written permission.
|
||||
//
|
||||
// This software is provided by the copyright holders and contributors "as is" and
|
||||
// any express or implied warranties, including, but not limited to, the implied
|
||||
// warranties of merchantability and fitness for a particular purpose are disclaimed.
|
||||
// In no event shall the Intel Corporation or contributors be liable for any direct,
|
||||
// indirect, incidental, special, exemplary, or consequential damages
|
||||
// (including, but not limited to, procurement of substitute goods or services;
|
||||
// loss of use, data, or profits; or business interruption) however caused
|
||||
// and on any theory of liability, whether in contract, strict liability,
|
||||
// or tort (including negligence or otherwise) arising in any way out of
|
||||
// the use of this software, even if advised of the possibility of such damage.
|
||||
//
|
||||
//M*/
|
||||

#ifndef __OPENCV_HAL_HPP__
#define __OPENCV_HAL_HPP__

#include "opencv2/hal/defs.h"

/**
@defgroup hal Hardware Acceleration Layer
*/

namespace cv { namespace hal {

namespace Error {

enum
{
    Ok = 0,
    Unknown = -1
};

}

int normHamming(const uchar* a, int n);
int normHamming(const uchar* a, const uchar* b, int n);

int normHamming(const uchar* a, int n, int cellSize);
int normHamming(const uchar* a, const uchar* b, int n, int cellSize);

//////////////////////////////// low-level functions ////////////////////////////////

int LU(float* A, size_t astep, int m, float* b, size_t bstep, int n);
int LU(double* A, size_t astep, int m, double* b, size_t bstep, int n);
bool Cholesky(float* A, size_t astep, int m, float* b, size_t bstep, int n);
bool Cholesky(double* A, size_t astep, int m, double* b, size_t bstep, int n);

int normL1_(const uchar* a, const uchar* b, int n);
float normL1_(const float* a, const float* b, int n);
float normL2Sqr_(const float* a, const float* b, int n);

void exp(const float* src, float* dst, int n);
void exp(const double* src, double* dst, int n);
void log(const float* src, float* dst, int n);
void log(const double* src, double* dst, int n);

void fastAtan2(const float* y, const float* x, float* dst, int n, bool angleInDegrees);
void magnitude(const float* x, const float* y, float* dst, int n);
void magnitude(const double* x, const double* y, double* dst, int n);
void sqrt(const float* src, float* dst, int len);
void sqrt(const double* src, double* dst, int len);
void invSqrt(const float* src, float* dst, int len);
void invSqrt(const double* src, double* dst, int len);

}} //cv::hal

#endif //__OPENCV_HAL_HPP__
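
A minimal usage sketch for the declarations above (illustrative only; it assumes the header is
installed as opencv2/hal.hpp, as its include guard suggests, and simply calls the free functions
in cv::hal on raw buffers, which is how the rest of OpenCV is expected to use them):

#include "opencv2/hal.hpp"
#include <vector>
#include <cstdio>

int main()
{
    // Hamming distance between two 32-byte (256-bit) binary descriptors,
    // e.g. BRISK/ORB descriptors stored as raw bytes.
    uchar d1[32], d2[32];
    for (int i = 0; i < 32; i++) { d1[i] = (uchar)i; d2[i] = (uchar)(i ^ 5); }
    int dist = cv::hal::normHamming(d1, d2, 32);

    // Element-wise exponent over a plain float buffer.
    std::vector<float> src(16, 1.f), dst(16);
    cv::hal::exp(&src[0], &dst[0], (int)src.size());

    std::printf("hamming=%d exp(1)=%f\n", dist, dst[0]);
    return 0;
}
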
@@ -0,0 +1,675 @@
||||
/*M///////////////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
|
||||
//
|
||||
// By downloading, copying, installing or using the software you agree to this license.
|
||||
// If you do not agree to this license, do not download, install,
|
||||
// copy or use the software.
|
||||
//
|
||||
//
|
||||
// License Agreement
|
||||
// For Open Source Computer Vision Library
|
||||
//
|
||||
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
|
||||
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
|
||||
// Copyright (C) 2013, OpenCV Foundation, all rights reserved.
|
||||
// Copyright (C) 2015, Itseez Inc., all rights reserved.
|
||||
// Third party copyrights are property of their respective owners.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without modification,
|
||||
// are permitted provided that the following conditions are met:
|
||||
//
|
||||
// * Redistribution's of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimer.
|
||||
//
|
||||
// * Redistribution's in binary form must reproduce the above copyright notice,
|
||||
// this list of conditions and the following disclaimer in the documentation
|
||||
// and/or other materials provided with the distribution.
|
||||
//
|
||||
// * The name of the copyright holders may not be used to endorse or promote products
|
||||
// derived from this software without specific prior written permission.
|
||||
//
|
||||
// This software is provided by the copyright holders and contributors "as is" and
|
||||
// any express or implied warranties, including, but not limited to, the implied
|
||||
// warranties of merchantability and fitness for a particular purpose are disclaimed.
|
||||
// In no event shall the Intel Corporation or contributors be liable for any direct,
|
||||
// indirect, incidental, special, exemplary, or consequential damages
|
||||
// (including, but not limited to, procurement of substitute goods or services;
|
||||
// loss of use, data, or profits; or business interruption) however caused
|
||||
// and on any theory of liability, whether in contract, strict liability,
|
||||
// or tort (including negligence or otherwise) arising in any way out of
|
||||
// the use of this software, even if advised of the possibility of such damage.
|
||||
//
|
||||
//M*/
|
||||
|
||||
#ifndef __OPENCV_DEF_H__ |
||||
#define __OPENCV_DEF_H__ |
||||
|
||||
#if !defined _CRT_SECURE_NO_DEPRECATE && defined _MSC_VER && _MSC_VER > 1300 |
||||
# define _CRT_SECURE_NO_DEPRECATE /* to avoid multiple Visual Studio warnings */ |
||||
#endif |
||||
|
||||
#include <limits.h> |
||||
|
||||
#if defined __ICL |
||||
# define CV_ICC __ICL |
||||
#elif defined __ICC |
||||
# define CV_ICC __ICC |
||||
#elif defined __ECL |
||||
# define CV_ICC __ECL |
||||
#elif defined __ECC |
||||
# define CV_ICC __ECC |
||||
#elif defined __INTEL_COMPILER |
||||
# define CV_ICC __INTEL_COMPILER |
||||
#endif |
||||
|
||||
#ifndef CV_INLINE |
||||
# if defined __cplusplus |
||||
# define CV_INLINE static inline |
||||
# elif defined _MSC_VER |
||||
# define CV_INLINE __inline |
||||
# else |
||||
# define CV_INLINE static |
||||
# endif |
||||
#endif |
||||
|
||||
#if defined CV_ICC && !defined CV_ENABLE_UNROLLED |
||||
# define CV_ENABLE_UNROLLED 0 |
||||
#else |
||||
# define CV_ENABLE_UNROLLED 1 |
||||
#endif |
||||
|
||||
#ifdef __GNUC__ |
||||
# define CV_DECL_ALIGNED(x) __attribute__ ((aligned (x))) |
||||
#elif defined _MSC_VER |
||||
# define CV_DECL_ALIGNED(x) __declspec(align(x)) |
||||
#else |
||||
# define CV_DECL_ALIGNED(x) |
||||
#endif |
||||
|
||||
/* CPU features and intrinsics support */ |
||||
#define CV_CPU_NONE 0 |
||||
#define CV_CPU_MMX 1 |
||||
#define CV_CPU_SSE 2 |
||||
#define CV_CPU_SSE2 3 |
||||
#define CV_CPU_SSE3 4 |
||||
#define CV_CPU_SSSE3 5 |
||||
#define CV_CPU_SSE4_1 6 |
||||
#define CV_CPU_SSE4_2 7 |
||||
#define CV_CPU_POPCNT 8 |
||||
|
||||
#define CV_CPU_AVX 10 |
||||
#define CV_CPU_AVX2 11 |
||||
#define CV_CPU_FMA3 12 |
||||
|
||||
#define CV_CPU_AVX_512F 13 |
||||
#define CV_CPU_AVX_512BW 14 |
||||
#define CV_CPU_AVX_512CD 15 |
||||
#define CV_CPU_AVX_512DQ 16 |
||||
#define CV_CPU_AVX_512ER 17 |
||||
#define CV_CPU_AVX_512IFMA512 18 |
||||
#define CV_CPU_AVX_512PF 19 |
||||
#define CV_CPU_AVX_512VBMI 20 |
||||
#define CV_CPU_AVX_512VL 21 |
||||
|
||||
#define CV_CPU_NEON 100 |
||||
|
||||
// when adding to this list remember to update the enum in core/utility.cpp
|
||||
#define CV_HARDWARE_MAX_FEATURE 255 |
||||
|
||||
// do not include SSE/AVX/NEON headers for NVCC compiler
|
||||
#ifndef __CUDACC__ |
||||
|
||||
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2) |
||||
# include <emmintrin.h> |
||||
# define CV_MMX 1 |
||||
# define CV_SSE 1 |
||||
# define CV_SSE2 1 |
||||
# if defined __SSE3__ || (defined _MSC_VER && _MSC_VER >= 1500) |
||||
# include <pmmintrin.h> |
||||
# define CV_SSE3 1 |
||||
# endif |
||||
# if defined __SSSE3__ || (defined _MSC_VER && _MSC_VER >= 1500) |
||||
# include <tmmintrin.h> |
||||
# define CV_SSSE3 1 |
||||
# endif |
||||
# if defined __SSE4_1__ || (defined _MSC_VER && _MSC_VER >= 1500) |
||||
# include <smmintrin.h> |
||||
# define CV_SSE4_1 1 |
||||
# endif |
||||
# if defined __SSE4_2__ || (defined _MSC_VER && _MSC_VER >= 1500) |
||||
# include <nmmintrin.h> |
||||
# define CV_SSE4_2 1 |
||||
# endif |
||||
# if defined __POPCNT__ || (defined _MSC_VER && _MSC_VER >= 1500) |
||||
# ifdef _MSC_VER |
||||
# include <nmmintrin.h> |
||||
# else |
||||
# include <popcntintrin.h> |
||||
# endif |
||||
# define CV_POPCNT 1 |
||||
# endif |
||||
# if defined __AVX__ || (defined _MSC_VER && _MSC_VER >= 1600 && 0) |
||||
// MS Visual Studio 2010 (2012?) has no macro pre-defined to identify the use of /arch:AVX
|
||||
// See: http://connect.microsoft.com/VisualStudio/feedback/details/605858/arch-avx-should-define-a-predefined-macro-in-x64-and-set-a-unique-value-for-m-ix86-fp-in-win32
|
||||
# include <immintrin.h> |
||||
# define CV_AVX 1 |
||||
# if defined(_XCR_XFEATURE_ENABLED_MASK) |
||||
# define __xgetbv() _xgetbv(_XCR_XFEATURE_ENABLED_MASK) |
||||
# else |
||||
# define __xgetbv() 0 |
||||
# endif |
||||
# endif |
||||
# if defined __AVX2__ || (defined _MSC_VER && _MSC_VER >= 1800 && 0) |
||||
# include <immintrin.h> |
||||
# define CV_AVX2 1 |
||||
# if defined __FMA__ |
||||
# define CV_FMA3 1 |
||||
# endif |
||||
# endif |
||||
#endif |
||||
|
||||
#if (defined WIN32 || defined _WIN32) && defined(_M_ARM) |
||||
# include <Intrin.h> |
||||
# include "arm_neon.h" |
||||
# define CV_NEON 1 |
||||
# define CPU_HAS_NEON_FEATURE (true) |
||||
#elif defined(__ARM_NEON__) || (defined (__ARM_NEON) && defined(__aarch64__)) |
||||
# include <arm_neon.h> |
||||
# define CV_NEON 1 |
||||
#endif |
||||
|
||||
#if defined __GNUC__ && defined __arm__ && (defined __ARM_PCS_VFP || defined __ARM_VFPV3__) |
||||
# define CV_VFP 1 |
||||
#endif |
||||
|
||||
#endif // __CUDACC__
|
||||
|
||||
#ifndef CV_POPCNT |
||||
#define CV_POPCNT 0 |
||||
#endif |
||||
#ifndef CV_MMX |
||||
# define CV_MMX 0 |
||||
#endif |
||||
#ifndef CV_SSE |
||||
# define CV_SSE 0 |
||||
#endif |
||||
#ifndef CV_SSE2 |
||||
# define CV_SSE2 0 |
||||
#endif |
||||
#ifndef CV_SSE3 |
||||
# define CV_SSE3 0 |
||||
#endif |
||||
#ifndef CV_SSSE3 |
||||
# define CV_SSSE3 0 |
||||
#endif |
||||
#ifndef CV_SSE4_1 |
||||
# define CV_SSE4_1 0 |
||||
#endif |
||||
#ifndef CV_SSE4_2 |
||||
# define CV_SSE4_2 0 |
||||
#endif |
||||
#ifndef CV_AVX |
||||
# define CV_AVX 0 |
||||
#endif |
||||
#ifndef CV_AVX2 |
||||
# define CV_AVX2 0 |
||||
#endif |
||||
#ifndef CV_FMA3 |
||||
# define CV_FMA3 0 |
||||
#endif |
||||
#ifndef CV_AVX_512F |
||||
# define CV_AVX_512F 0 |
||||
#endif |
||||
#ifndef CV_AVX_512BW |
||||
# define CV_AVX_512BW 0 |
||||
#endif |
||||
#ifndef CV_AVX_512CD |
||||
# define CV_AVX_512CD 0 |
||||
#endif |
||||
#ifndef CV_AVX_512DQ |
||||
# define CV_AVX_512DQ 0 |
||||
#endif |
||||
#ifndef CV_AVX_512ER |
||||
# define CV_AVX_512ER 0 |
||||
#endif |
||||
#ifndef CV_AVX_512IFMA512 |
||||
# define CV_AVX_512IFMA512 0 |
||||
#endif |
||||
#ifndef CV_AVX_512PF |
||||
# define CV_AVX_512PF 0 |
||||
#endif |
||||
#ifndef CV_AVX_512VBMI |
||||
# define CV_AVX_512VBMI 0 |
||||
#endif |
||||
#ifndef CV_AVX_512VL |
||||
# define CV_AVX_512VL 0 |
||||
#endif |
||||
|
||||
#ifndef CV_NEON |
||||
# define CV_NEON 0 |
||||
#endif |
||||
|
||||
#ifndef CV_VFP |
||||
# define CV_VFP 0 |
||||
#endif |
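// Illustrative sketch (not part of the original header): after the block above, CV_SSE2,
// CV_AVX, CV_NEON etc. are always defined to 0 or 1, so downstream code can branch at
// compile time without further #ifdef checks. The helper name cv_example_add4 is hypothetical.
static inline void cv_example_add4(const float* a, const float* b, float* c)
{
#if CV_SSE2
    // SSE path: the intrinsics headers were already pulled in above when CV_SSE2 == 1.
    _mm_storeu_ps(c, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    // Portable fallback.
    for (int i = 0; i < 4; i++) c[i] = a[i] + b[i];
#endif
}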
||||
|
||||
/* primitive types */ |
||||
/*
|
||||
schar - signed 1 byte integer |
||||
uchar - unsigned 1 byte integer |
||||
short - signed 2 byte integer |
||||
ushort - unsigned 2 byte integer |
||||
int - signed 4 byte integer |
||||
uint - unsigned 4 byte integer |
||||
int64 - signed 8 byte integer |
||||
uint64 - unsigned 8 byte integer |
||||
*/ |
||||
|
||||
#if !defined _MSC_VER && !defined __BORLANDC__ |
||||
# if defined __cplusplus && __cplusplus >= 201103L |
||||
# include <cstdint> |
||||
typedef std::uint32_t uint; |
||||
# else |
||||
# include <stdint.h> |
||||
typedef uint32_t uint; |
||||
# endif |
||||
#else |
||||
typedef unsigned uint; |
||||
#endif |
||||
|
||||
typedef signed char schar; |
||||
|
||||
#ifndef __IPL_H__ |
||||
typedef unsigned char uchar; |
||||
typedef unsigned short ushort; |
||||
#endif |
||||
|
||||
#if defined _MSC_VER || defined __BORLANDC__ |
||||
typedef __int64 int64; |
||||
typedef unsigned __int64 uint64; |
||||
# define CV_BIG_INT(n) n##I64 |
||||
# define CV_BIG_UINT(n) n##UI64 |
||||
#else |
||||
typedef int64_t int64; |
||||
typedef uint64_t uint64; |
||||
# define CV_BIG_INT(n) n##LL |
||||
# define CV_BIG_UINT(n) n##ULL |
||||
#endif |
||||
|
||||
/* fundamental constants */ |
||||
#define CV_PI 3.1415926535897932384626433832795 |
||||
#define CV_2PI 6.283185307179586476925286766559 |
||||
#define CV_LOG2 0.69314718055994530941723212145818 |
||||
|
||||
typedef union Cv32suf |
||||
{ |
||||
int i; |
||||
unsigned u; |
||||
float f; |
||||
} |
||||
Cv32suf; |
||||
|
||||
typedef union Cv64suf |
||||
{ |
||||
int64 i; |
||||
uint64 u; |
||||
double f; |
||||
} |
||||
Cv64suf; |
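// Illustrative sketch (not part of the original header): the unions above give bit-level
// access to IEEE754 values, which is exactly what cvIsNaN/cvIsInf below rely on.
// The helper name cv_example_float_bits is hypothetical.
static inline unsigned cv_example_float_bits(float x)
{
    Cv32suf u;
    u.f = x;
    return u.u;   // raw bit pattern, e.g. 0x3f800000 for 1.0f
}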
||||
|
||||
|
||||
/****************************************************************************************\
|
||||
* fast math * |
||||
\****************************************************************************************/ |
||||
|
||||
#if defined __BORLANDC__ |
||||
# include <fastmath.h> |
||||
#elif defined __cplusplus |
||||
# include <cmath> |
||||
#else |
||||
# include <math.h> |
||||
#endif |
||||
|
||||
#ifdef HAVE_TEGRA_OPTIMIZATION |
||||
# include "tegra_round.hpp" |
||||
#endif |
||||
|
||||
//! @addtogroup core_utils
|
||||
//! @{
|
||||
|
||||
#if CV_VFP |
||||
// 1. general scheme
|
||||
#define ARM_ROUND(_value, _asm_string) \ |
||||
int res; \
|
||||
float temp; \
|
||||
asm(_asm_string : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (_value)); \
|
||||
return res |
||||
// 2. version for double
|
||||
#ifdef __clang__ |
||||
#define ARM_ROUND_DBL(value) ARM_ROUND(value, "vcvtr.s32.f64 %[temp], %[value] \n vmov %[res], %[temp]") |
||||
#else |
||||
#define ARM_ROUND_DBL(value) ARM_ROUND(value, "vcvtr.s32.f64 %[temp], %P[value] \n vmov %[res], %[temp]") |
||||
#endif |
||||
// 3. version for float
|
||||
#define ARM_ROUND_FLT(value) ARM_ROUND(value, "vcvtr.s32.f32 %[temp], %[value]\n vmov %[res], %[temp]") |
||||
#endif // CV_VFP
|
||||
|
||||
/** @brief Rounds floating-point number to the nearest integer
|
||||
|
||||
@param value floating-point number. If the value is outside of INT_MIN ... INT_MAX range, the |
||||
result is not defined. |
||||
*/ |
||||
CV_INLINE int |
||||
cvRound( double value ) |
||||
{ |
||||
#if ((defined _MSC_VER && defined _M_X64) || (defined __GNUC__ && defined __x86_64__ \ |
||||
&& defined __SSE2__ && !defined __APPLE__)) && !defined(__CUDACC__) |
||||
__m128d t = _mm_set_sd( value ); |
||||
return _mm_cvtsd_si32(t); |
||||
#elif defined _MSC_VER && defined _M_IX86 |
||||
int t; |
||||
__asm |
||||
{ |
||||
fld value; |
||||
fistp t; |
||||
} |
||||
return t; |
||||
#elif ((defined _MSC_VER && defined _M_ARM) || defined CV_ICC || \ |
||||
defined __GNUC__) && defined HAVE_TEGRA_OPTIMIZATION |
||||
TEGRA_ROUND_DBL(value); |
||||
#elif defined CV_ICC || defined __GNUC__ |
||||
# if CV_VFP |
||||
ARM_ROUND_DBL(value); |
||||
# else |
||||
return (int)lrint(value); |
||||
# endif |
||||
#else |
||||
/* it's ok if round does not comply with IEEE754 standard;
|
||||
the tests should allow +/-1 difference when the tested functions use round */ |
||||
return (int)(value + (value >= 0 ? 0.5 : -0.5)); |
||||
#endif |
||||
} |
||||
|
||||
|
||||
/** @brief Rounds floating-point number to the nearest integer not larger than the original.
|
||||
|
||||
The function computes an integer i such that: |
||||
\f[i \le \texttt{value} < i+1\f] |
||||
@param value floating-point number. If the value is outside of INT_MIN ... INT_MAX range, the |
||||
result is not defined. |
||||
*/ |
||||
CV_INLINE int cvFloor( double value ) |
||||
{ |
||||
#if (defined _MSC_VER && defined _M_X64 || (defined __GNUC__ && defined __SSE2__ && !defined __APPLE__)) && !defined(__CUDACC__) |
||||
__m128d t = _mm_set_sd( value ); |
||||
int i = _mm_cvtsd_si32(t); |
||||
return i - _mm_movemask_pd(_mm_cmplt_sd(t, _mm_cvtsi32_sd(t,i))); |
||||
#elif defined __GNUC__ |
||||
int i = (int)value; |
||||
return i - (i > value); |
||||
#else |
||||
int i = cvRound(value); |
||||
float diff = (float)(value - i); |
||||
return i - (diff < 0); |
||||
#endif |
||||
} |
||||
|
||||
/** @brief Rounds floating-point number to the nearest integer not smaller than the original.

The function computes an integer i such that:
\f[i-1 < \texttt{value} \le i\f]
@param value floating-point number. If the value is outside of INT_MIN ... INT_MAX range, the
result is not defined.
*/
||||
CV_INLINE int cvCeil( double value ) |
||||
{ |
||||
#if (defined _MSC_VER && defined _M_X64 || (defined __GNUC__ && defined __SSE2__&& !defined __APPLE__)) && !defined(__CUDACC__) |
||||
__m128d t = _mm_set_sd( value ); |
||||
int i = _mm_cvtsd_si32(t); |
||||
return i + _mm_movemask_pd(_mm_cmplt_sd(_mm_cvtsi32_sd(t,i), t)); |
||||
#elif defined __GNUC__ |
||||
int i = (int)value; |
||||
return i + (i < value); |
||||
#else |
||||
int i = cvRound(value); |
||||
float diff = (float)(i - value); |
||||
return i + (diff < 0); |
||||
#endif |
||||
} |
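// Illustrative sketch (not part of the original header) of the rounding helpers defined above,
// e.g. cvFloor(-1.2) == -2, cvCeil(-1.2) == -1, cvRound(-1.2) == -1.
// The helper name cv_example_rounding is hypothetical.
static inline void cv_example_rounding(double v, int* r, int* f, int* c)
{
    *r = cvRound(v);   // nearest integer
    *f = cvFloor(v);   // largest integer not greater than v
    *c = cvCeil(v);    // smallest integer not less than v
}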
||||
|
||||
/** @brief Determines if the argument is Not A Number.
|
||||
|
||||
@param value The input floating-point value |
||||
|
||||
The function returns 1 if the argument is Not A Number (as defined by IEEE754 standard), 0 |
||||
otherwise. */ |
||||
CV_INLINE int cvIsNaN( double value ) |
||||
{ |
||||
Cv64suf ieee754; |
||||
ieee754.f = value; |
||||
return ((unsigned)(ieee754.u >> 32) & 0x7fffffff) + |
||||
((unsigned)ieee754.u != 0) > 0x7ff00000; |
||||
} |
||||
|
||||
/** @brief Determines if the argument is Infinity.
|
||||
|
||||
@param value The input floating-point value |
||||
|
||||
The function returns 1 if the argument is a plus or minus infinity (as defined by IEEE754 standard) |
||||
and 0 otherwise. */ |
||||
CV_INLINE int cvIsInf( double value ) |
||||
{ |
||||
Cv64suf ieee754; |
||||
ieee754.f = value; |
||||
return ((unsigned)(ieee754.u >> 32) & 0x7fffffff) == 0x7ff00000 && |
||||
(unsigned)ieee754.u == 0; |
||||
} |
||||
|
||||
#ifdef __cplusplus |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvRound(float value) |
||||
{ |
||||
#if ((defined _MSC_VER && defined _M_X64) || (defined __GNUC__ && defined __x86_64__ && \ |
||||
defined __SSE2__ && !defined __APPLE__)) && !defined(__CUDACC__) |
||||
__m128 t = _mm_set_ss( value ); |
||||
return _mm_cvtss_si32(t); |
||||
#elif defined _MSC_VER && defined _M_IX86 |
||||
int t; |
||||
__asm |
||||
{ |
||||
fld value; |
||||
fistp t; |
||||
} |
||||
return t; |
||||
#elif ((defined _MSC_VER && defined _M_ARM) || defined CV_ICC || \ |
||||
defined __GNUC__) && defined HAVE_TEGRA_OPTIMIZATION |
||||
TEGRA_ROUND_FLT(value); |
||||
#elif defined CV_ICC || defined __GNUC__ |
||||
# if CV_VFP |
||||
ARM_ROUND_FLT(value); |
||||
# else |
||||
return (int)lrintf(value); |
||||
# endif |
||||
#else |
||||
/* it's ok if round does not comply with IEEE754 standard;
|
||||
the tests should allow +/-1 difference when the tested functions use round */ |
||||
return (int)(value + (value >= 0 ? 0.5f : -0.5f)); |
||||
#endif |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvRound( int value ) |
||||
{ |
||||
return value; |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvFloor( float value ) |
||||
{ |
||||
#if (defined _MSC_VER && defined _M_X64 || (defined __GNUC__ && defined __SSE2__ && !defined __APPLE__)) && !defined(__CUDACC__) |
||||
__m128 t = _mm_set_ss( value ); |
||||
int i = _mm_cvtss_si32(t); |
||||
return i - _mm_movemask_ps(_mm_cmplt_ss(t, _mm_cvtsi32_ss(t,i))); |
||||
#elif defined __GNUC__ |
||||
int i = (int)value; |
||||
return i - (i > value); |
||||
#else |
||||
int i = cvRound(value); |
||||
float diff = (float)(value - i); |
||||
return i - (diff < 0); |
||||
#endif |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvFloor( int value ) |
||||
{ |
||||
return value; |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvCeil( float value ) |
||||
{ |
||||
#if (defined _MSC_VER && defined _M_X64 || (defined __GNUC__ && defined __SSE2__&& !defined __APPLE__)) && !defined(__CUDACC__) |
||||
__m128 t = _mm_set_ss( value ); |
||||
int i = _mm_cvtss_si32(t); |
||||
return i + _mm_movemask_ps(_mm_cmplt_ss(_mm_cvtsi32_ss(t,i), t)); |
||||
#elif defined __GNUC__ |
||||
int i = (int)value; |
||||
return i + (i < value); |
||||
#else |
||||
int i = cvRound(value); |
||||
float diff = (float)(i - value); |
||||
return i + (diff < 0); |
||||
#endif |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvCeil( int value ) |
||||
{ |
||||
return value; |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvIsNaN( float value ) |
||||
{ |
||||
Cv32suf ieee754; |
||||
ieee754.f = value; |
||||
return (ieee754.u & 0x7fffffff) > 0x7f800000; |
||||
} |
||||
|
||||
/** @overload */ |
||||
CV_INLINE int cvIsInf( float value ) |
||||
{ |
||||
Cv32suf ieee754; |
||||
ieee754.f = value; |
||||
return (ieee754.u & 0x7fffffff) == 0x7f800000; |
||||
} |
||||
|
||||
#include <algorithm> |
||||
|
||||
namespace cv |
||||
{ |
||||
|
||||
/////////////// saturate_cast (used in image & signal processing) ///////////////////
|
||||
|
||||
/**
Template function for accurate conversion from one primitive type to another.

The functions saturate_cast resemble the standard C++ cast operations, such as static_cast\<T\>()
and others. They perform an efficient and accurate conversion from one primitive type to another
(see the introduction chapter). "saturate" in the name means that when the input value v is out of the
range of the target type, the result is not formed just by taking the low bits of the input; instead
the value is clipped. For example:
@code
uchar a = saturate_cast<uchar>(-100); // a = 0 (UCHAR_MIN)
short b = saturate_cast<short>(33333.33333); // b = 32767 (SHRT_MAX)
@endcode
Such clipping is done when the target type is unsigned char, signed char, unsigned short or
signed short. For 32-bit integers, no clipping is done.

When the parameter is a floating-point value and the target type is an integer (8-, 16- or 32-bit),
the floating-point value is first rounded to the nearest integer and then clipped if needed (when
the target type is 8- or 16-bit).

This operation is used in many image processing functions in OpenCV, from the simplest to the most complex.

@param v Function parameter.
@sa add, subtract, multiply, divide, Mat::convertTo
*/
||||
template<typename _Tp> static inline _Tp saturate_cast(uchar v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(schar v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(ushort v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(short v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(unsigned v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(int v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(float v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(double v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(int64 v) { return _Tp(v); } |
||||
/** @overload */ |
||||
template<typename _Tp> static inline _Tp saturate_cast(uint64 v) { return _Tp(v); } |
||||
|
||||
//! @cond IGNORED
|
||||
|
||||
template<> inline uchar saturate_cast<uchar>(schar v) { return (uchar)std::max((int)v, 0); } |
||||
template<> inline uchar saturate_cast<uchar>(ushort v) { return (uchar)std::min((unsigned)v, (unsigned)UCHAR_MAX); } |
||||
template<> inline uchar saturate_cast<uchar>(int v) { return (uchar)((unsigned)v <= UCHAR_MAX ? v : v > 0 ? UCHAR_MAX : 0); } |
||||
template<> inline uchar saturate_cast<uchar>(short v) { return saturate_cast<uchar>((int)v); } |
||||
template<> inline uchar saturate_cast<uchar>(unsigned v) { return (uchar)std::min(v, (unsigned)UCHAR_MAX); } |
||||
template<> inline uchar saturate_cast<uchar>(float v) { int iv = cvRound(v); return saturate_cast<uchar>(iv); } |
||||
template<> inline uchar saturate_cast<uchar>(double v) { int iv = cvRound(v); return saturate_cast<uchar>(iv); } |
||||
template<> inline uchar saturate_cast<uchar>(int64 v) { return (uchar)((uint64)v <= (uint64)UCHAR_MAX ? v : v > 0 ? UCHAR_MAX : 0); } |
||||
template<> inline uchar saturate_cast<uchar>(uint64 v) { return (uchar)std::min(v, (uint64)UCHAR_MAX); } |
||||
|
||||
template<> inline schar saturate_cast<schar>(uchar v) { return (schar)std::min((int)v, SCHAR_MAX); } |
||||
template<> inline schar saturate_cast<schar>(ushort v) { return (schar)std::min((unsigned)v, (unsigned)SCHAR_MAX); } |
||||
template<> inline schar saturate_cast<schar>(int v) { return (schar)((unsigned)(v-SCHAR_MIN) <= (unsigned)UCHAR_MAX ? v : v > 0 ? SCHAR_MAX : SCHAR_MIN); } |
||||
template<> inline schar saturate_cast<schar>(short v) { return saturate_cast<schar>((int)v); } |
||||
template<> inline schar saturate_cast<schar>(unsigned v) { return (schar)std::min(v, (unsigned)SCHAR_MAX); } |
||||
template<> inline schar saturate_cast<schar>(float v) { int iv = cvRound(v); return saturate_cast<schar>(iv); } |
||||
template<> inline schar saturate_cast<schar>(double v) { int iv = cvRound(v); return saturate_cast<schar>(iv); } |
||||
template<> inline schar saturate_cast<schar>(int64 v) { return (schar)((uint64)((int64)v-SCHAR_MIN) <= (uint64)UCHAR_MAX ? v : v > 0 ? SCHAR_MAX : SCHAR_MIN); } |
||||
template<> inline schar saturate_cast<schar>(uint64 v) { return (schar)std::min(v, (uint64)SCHAR_MAX); } |
||||
|
||||
template<> inline ushort saturate_cast<ushort>(schar v) { return (ushort)std::max((int)v, 0); } |
||||
template<> inline ushort saturate_cast<ushort>(short v) { return (ushort)std::max((int)v, 0); } |
||||
template<> inline ushort saturate_cast<ushort>(int v) { return (ushort)((unsigned)v <= (unsigned)USHRT_MAX ? v : v > 0 ? USHRT_MAX : 0); } |
||||
template<> inline ushort saturate_cast<ushort>(unsigned v) { return (ushort)std::min(v, (unsigned)USHRT_MAX); } |
||||
template<> inline ushort saturate_cast<ushort>(float v) { int iv = cvRound(v); return saturate_cast<ushort>(iv); } |
||||
template<> inline ushort saturate_cast<ushort>(double v) { int iv = cvRound(v); return saturate_cast<ushort>(iv); } |
||||
template<> inline ushort saturate_cast<ushort>(int64 v) { return (ushort)((uint64)v <= (uint64)USHRT_MAX ? v : v > 0 ? USHRT_MAX : 0); } |
||||
template<> inline ushort saturate_cast<ushort>(uint64 v) { return (ushort)std::min(v, (uint64)USHRT_MAX); } |
||||
|
||||
template<> inline short saturate_cast<short>(ushort v) { return (short)std::min((int)v, SHRT_MAX); } |
||||
template<> inline short saturate_cast<short>(int v) { return (short)((unsigned)(v - SHRT_MIN) <= (unsigned)USHRT_MAX ? v : v > 0 ? SHRT_MAX : SHRT_MIN); } |
||||
template<> inline short saturate_cast<short>(unsigned v) { return (short)std::min(v, (unsigned)SHRT_MAX); } |
||||
template<> inline short saturate_cast<short>(float v) { int iv = cvRound(v); return saturate_cast<short>(iv); } |
||||
template<> inline short saturate_cast<short>(double v) { int iv = cvRound(v); return saturate_cast<short>(iv); } |
||||
template<> inline short saturate_cast<short>(int64 v) { return (short)((uint64)((int64)v - SHRT_MIN) <= (uint64)USHRT_MAX ? v : v > 0 ? SHRT_MAX : SHRT_MIN); } |
||||
template<> inline short saturate_cast<short>(uint64 v) { return (short)std::min(v, (uint64)SHRT_MAX); } |
||||
|
||||
template<> inline int saturate_cast<int>(float v) { return cvRound(v); } |
||||
template<> inline int saturate_cast<int>(double v) { return cvRound(v); } |
||||
|
||||
// we intentionally do not clip negative numbers, to make -1 become 0xffffffff etc.
|
||||
template<> inline unsigned saturate_cast<unsigned>(float v) { return cvRound(v); } |
||||
template<> inline unsigned saturate_cast<unsigned>(double v) { return cvRound(v); } |
||||
|
||||
//! @endcond
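// Illustrative sketch (not part of the original header), using the specializations above:
// out-of-range inputs are clipped and floating-point inputs are rounded first.
// The helper name cv_example_saturate is hypothetical.
static inline void cv_example_saturate()
{
    uchar a = saturate_cast<uchar>(-100);      // 0   (clipped to UCHAR_MIN)
    uchar b = saturate_cast<uchar>(300);       // 255 (clipped to UCHAR_MAX)
    short c = saturate_cast<short>(33333.333); // 32767 (rounded, then clipped)
    (void)a; (void)b; (void)c;
}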
|
||||
|
||||
} |
||||
|
||||
#endif // __cplusplus
|
||||
|
||||
//! @} core_utils
|
||||
|
||||
#endif //__OPENCV_DEF_H__
|
@@ -0,0 +1,292 @@
||||
/*M///////////////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
|
||||
//
|
||||
// By downloading, copying, installing or using the software you agree to this license.
|
||||
// If you do not agree to this license, do not download, install,
|
||||
// copy or use the software.
|
||||
//
|
||||
//
|
||||
// License Agreement
|
||||
// For Open Source Computer Vision Library
|
||||
//
|
||||
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
|
||||
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
|
||||
// Copyright (C) 2013, OpenCV Foundation, all rights reserved.
|
||||
// Copyright (C) 2015, Itseez Inc., all rights reserved.
|
||||
// Third party copyrights are property of their respective owners.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without modification,
|
||||
// are permitted provided that the following conditions are met:
|
||||
//
|
||||
// * Redistribution's of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimer.
|
||||
//
|
||||
// * Redistribution's in binary form must reproduce the above copyright notice,
|
||||
// this list of conditions and the following disclaimer in the documentation
|
||||
// and/or other materials provided with the distribution.
|
||||
//
|
||||
// * The name of the copyright holders may not be used to endorse or promote products
|
||||
// derived from this software without specific prior written permission.
|
||||
//
|
||||
// This software is provided by the copyright holders and contributors "as is" and
|
||||
// any express or implied warranties, including, but not limited to, the implied
|
||||
// warranties of merchantability and fitness for a particular purpose are disclaimed.
|
||||
// In no event shall the Intel Corporation or contributors be liable for any direct,
|
||||
// indirect, incidental, special, exemplary, or consequential damages
|
||||
// (including, but not limited to, procurement of substitute goods or services;
|
||||
// loss of use, data, or profits; or business interruption) however caused
|
||||
// and on any theory of liability, whether in contract, strict liability,
|
||||
// or tort (including negligence or otherwise) arising in any way out of
|
||||
// the use of this software, even if advised of the possibility of such damage.
|
||||
//
|
||||
//M*/
|
||||
|
||||
#ifndef __OPENCV_HAL_INTRIN_HPP__ |
||||
#define __OPENCV_HAL_INTRIN_HPP__ |
||||
|
||||
#include <cmath> |
||||
#include <float.h> |
||||
#include <stdlib.h> |
||||
|
||||
#define OPENCV_HAL_ADD(a, b) ((a) + (b)) |
||||
#define OPENCV_HAL_AND(a, b) ((a) & (b)) |
||||
#define OPENCV_HAL_NOP(a) (a) |
||||
#define OPENCV_HAL_1ST(a, b) (a) |
||||
|
||||
// Unlike the HAL API, which lives in cv::hal,
// we put the intrinsics into the cv namespace itself to make them
// easier to use from within OpenCV code.
||||
namespace cv { |
||||
|
||||
template<typename _Tp> struct V_TypeTraits |
||||
{ |
||||
typedef _Tp int_type; |
||||
typedef _Tp uint_type; |
||||
typedef _Tp abs_type; |
||||
typedef _Tp sum_type; |
||||
|
||||
enum { delta = 0, shift = 0 }; |
||||
|
||||
static int_type reinterpret_int(_Tp x) { return x; } |
||||
    static uint_type reinterpret_uint(_Tp x) { return x; }
||||
static _Tp reinterpret_from_int(int_type x) { return (_Tp)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<uchar> |
||||
{ |
||||
typedef uchar value_type; |
||||
typedef schar int_type; |
||||
typedef uchar uint_type; |
||||
typedef uchar abs_type; |
||||
typedef int sum_type; |
||||
|
||||
typedef ushort w_type; |
||||
|
||||
enum { delta = 128, shift = 8 }; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<schar> |
||||
{ |
||||
typedef schar value_type; |
||||
typedef schar int_type; |
||||
typedef uchar uint_type; |
||||
typedef uchar abs_type; |
||||
typedef int sum_type; |
||||
|
||||
typedef short w_type; |
||||
|
||||
enum { delta = 128, shift = 8 }; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<ushort> |
||||
{ |
||||
typedef ushort value_type; |
||||
typedef short int_type; |
||||
typedef ushort uint_type; |
||||
typedef ushort abs_type; |
||||
typedef int sum_type; |
||||
|
||||
typedef unsigned w_type; |
||||
typedef uchar nu_type; |
||||
|
||||
enum { delta = 32768, shift = 16 }; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<short> |
||||
{ |
||||
typedef short value_type; |
||||
typedef short int_type; |
||||
typedef ushort uint_type; |
||||
typedef ushort abs_type; |
||||
typedef int sum_type; |
||||
|
||||
typedef int w_type; |
||||
typedef uchar nu_type; |
||||
typedef schar n_type; |
||||
|
||||
enum { delta = 128, shift = 8 }; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<unsigned> |
||||
{ |
||||
typedef unsigned value_type; |
||||
typedef int int_type; |
||||
typedef unsigned uint_type; |
||||
typedef unsigned abs_type; |
||||
typedef unsigned sum_type; |
||||
|
||||
typedef uint64 w_type; |
||||
typedef ushort nu_type; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<int> |
||||
{ |
||||
typedef int value_type; |
||||
typedef int int_type; |
||||
typedef unsigned uint_type; |
||||
typedef unsigned abs_type; |
||||
typedef int sum_type; |
||||
|
||||
typedef int64 w_type; |
||||
typedef short n_type; |
||||
typedef ushort nu_type; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<uint64> |
||||
{ |
||||
typedef uint64 value_type; |
||||
typedef int64 int_type; |
||||
typedef uint64 uint_type; |
||||
typedef uint64 abs_type; |
||||
typedef uint64 sum_type; |
||||
|
||||
typedef unsigned nu_type; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<int64> |
||||
{ |
||||
typedef int64 value_type; |
||||
typedef int64 int_type; |
||||
typedef uint64 uint_type; |
||||
typedef uint64 abs_type; |
||||
typedef int64 sum_type; |
||||
|
||||
typedef int nu_type; |
||||
|
||||
static int_type reinterpret_int(value_type x) { return (int_type)x; } |
||||
static uint_type reinterpret_uint(value_type x) { return (uint_type)x; } |
||||
static value_type reinterpret_from_int(int_type x) { return (value_type)x; } |
||||
}; |
||||
|
||||
|
||||
template<> struct V_TypeTraits<float> |
||||
{ |
||||
typedef float value_type; |
||||
typedef int int_type; |
||||
typedef unsigned uint_type; |
||||
typedef float abs_type; |
||||
typedef float sum_type; |
||||
|
||||
typedef double w_type; |
||||
|
||||
static int_type reinterpret_int(value_type x) |
||||
{ |
||||
Cv32suf u; |
||||
u.f = x; |
||||
return u.i; |
||||
} |
||||
    static uint_type reinterpret_uint(value_type x)
||||
{ |
||||
Cv32suf u; |
||||
u.f = x; |
||||
return u.u; |
||||
} |
||||
static value_type reinterpret_from_int(int_type x) |
||||
{ |
||||
Cv32suf u; |
||||
u.i = x; |
||||
return u.f; |
||||
} |
||||
}; |
||||
|
||||
template<> struct V_TypeTraits<double> |
||||
{ |
||||
typedef double value_type; |
||||
typedef int64 int_type; |
||||
typedef uint64 uint_type; |
||||
typedef double abs_type; |
||||
typedef double sum_type; |
||||
static int_type reinterpret_int(value_type x) |
||||
{ |
||||
Cv64suf u; |
||||
u.f = x; |
||||
return u.i; |
||||
} |
||||
    static uint_type reinterpret_uint(value_type x)
||||
{ |
||||
Cv64suf u; |
||||
u.f = x; |
||||
return u.u; |
||||
} |
||||
static value_type reinterpret_from_int(int_type x) |
||||
{ |
||||
Cv64suf u; |
||||
u.i = x; |
||||
return u.f; |
||||
} |
||||
}; |
||||
|
||||
} |
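// Illustrative sketch (not part of the original header): the traits above map each lane
// type to its widened / reinterpreted counterparts, which the universal intrinsics rely on,
// e.g. cv::V_TypeTraits<uchar>::w_type is ushort and cv::V_TypeTraits<float>::int_type is int.
// The helper name cv_example_float_bits_via_traits is hypothetical.
static inline int cv_example_float_bits_via_traits(float x)
{
    return cv::V_TypeTraits<float>::reinterpret_int(x); // raw IEEE754 bits viewed as int
}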
||||
|
||||
#if CV_SSE2 |
||||
|
||||
#include "opencv2/hal/intrin_sse.hpp" |
||||
|
||||
#elif CV_NEON |
||||
|
||||
#include "opencv2/hal/intrin_neon.hpp" |
||||
|
||||
#else |
||||
|
||||
#include "opencv2/hal/intrin_cpp.hpp" |
||||
|
||||
#endif |
||||
|
||||
#ifndef CV_SIMD128 |
||||
#define CV_SIMD128 0 |
||||
#endif |
||||
|
||||
#ifndef CV_SIMD128_64F |
||||
#define CV_SIMD128_64F 0 |
||||
#endif |
||||
|
||||
#endif |
@@ -0,0 +1,811 @@
||||
/*M///////////////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
|
||||
//
|
||||
// By downloading, copying, installing or using the software you agree to this license.
|
||||
// If you do not agree to this license, do not download, install,
|
||||
// copy or use the software.
|
||||
//
|
||||
//
|
||||
// License Agreement
|
||||
// For Open Source Computer Vision Library
|
||||
//
|
||||
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
|
||||
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
|
||||
// Copyright (C) 2013, OpenCV Foundation, all rights reserved.
|
||||
// Copyright (C) 2015, Itseez Inc., all rights reserved.
|
||||
// Third party copyrights are property of their respective owners.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without modification,
|
||||
// are permitted provided that the following conditions are met:
|
||||
//
|
||||
// * Redistribution's of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimer.
|
||||
//
|
||||
// * Redistribution's in binary form must reproduce the above copyright notice,
|
||||
// this list of conditions and the following disclaimer in the documentation
|
||||
// and/or other materials provided with the distribution.
|
||||
//
|
||||
// * The name of the copyright holders may not be used to endorse or promote products
|
||||
// derived from this software without specific prior written permission.
|
||||
//
|
||||
// This software is provided by the copyright holders and contributors "as is" and
|
||||
// any express or implied warranties, including, but not limited to, the implied
|
||||
// warranties of merchantability and fitness for a particular purpose are disclaimed.
|
||||
// In no event shall the Intel Corporation or contributors be liable for any direct,
|
||||
// indirect, incidental, special, exemplary, or consequential damages
|
||||
// (including, but not limited to, procurement of substitute goods or services;
|
||||
// loss of use, data, or profits; or business interruption) however caused
|
||||
// and on any theory of liability, whether in contract, strict liability,
|
||||
// or tort (including negligence or otherwise) arising in any way out of
|
||||
// the use of this software, even if advised of the possibility of such damage.
|
||||
//
|
||||
//M*/
|
||||
|
||||
#ifndef __OPENCV_HAL_INTRIN_CPP_HPP__ |
||||
#define __OPENCV_HAL_INTRIN_CPP_HPP__ |
||||
|
||||
namespace cv |
||||
{ |
||||
|
||||
template<typename _Tp, int n> struct v_reg |
||||
{ |
||||
typedef _Tp lane_type; |
||||
typedef v_reg<typename V_TypeTraits<_Tp>::int_type, n> int_vec; |
||||
typedef v_reg<typename V_TypeTraits<_Tp>::abs_type, n> abs_vec; |
||||
enum { nlanes = n }; |
||||
|
||||
explicit v_reg(const _Tp* ptr) { for( int i = 0; i < n; i++ ) s[i] = ptr[i]; } |
||||
v_reg(_Tp s0, _Tp s1) { s[0] = s0; s[1] = s1; } |
||||
v_reg(_Tp s0, _Tp s1, _Tp s2, _Tp s3) { s[0] = s0; s[1] = s1; s[2] = s2; s[3] = s3; } |
||||
v_reg(_Tp s0, _Tp s1, _Tp s2, _Tp s3, |
||||
_Tp s4, _Tp s5, _Tp s6, _Tp s7) |
||||
{ |
||||
s[0] = s0; s[1] = s1; s[2] = s2; s[3] = s3; |
||||
s[4] = s4; s[5] = s5; s[6] = s6; s[7] = s7; |
||||
} |
||||
v_reg(_Tp s0, _Tp s1, _Tp s2, _Tp s3, |
||||
_Tp s4, _Tp s5, _Tp s6, _Tp s7, |
||||
_Tp s8, _Tp s9, _Tp s10, _Tp s11, |
||||
_Tp s12, _Tp s13, _Tp s14, _Tp s15) |
||||
{ |
||||
s[0] = s0; s[1] = s1; s[2] = s2; s[3] = s3; |
||||
s[4] = s4; s[5] = s5; s[6] = s6; s[7] = s7; |
||||
s[8] = s8; s[9] = s9; s[10] = s10; s[11] = s11; |
||||
s[12] = s12; s[13] = s13; s[14] = s14; s[15] = s15; |
||||
} |
||||
|
||||
v_reg() {} |
||||
v_reg(const v_reg<_Tp, n> & r) |
||||
{ |
||||
for( int i = 0; i < n; i++ ) |
||||
s[i] = r.s[i]; |
||||
} |
||||
|
||||
_Tp get(const int i) const { return s[i]; } |
||||
_Tp get0() const { return s[0]; } |
||||
v_reg<_Tp, n> high() const |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
int i; |
||||
for( i = 0; i < n/2; i++ ) |
||||
{ |
||||
c.s[i] = s[i+(n/2)]; |
||||
c.s[i+(n/2)] = 0; |
||||
} |
||||
return c; |
||||
} |
||||
|
||||
static v_reg<_Tp, n> zero() |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = (_Tp)0; |
||||
return c; |
||||
} |
||||
|
||||
static v_reg<_Tp, n> all(_Tp s) |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = s; |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp2, int n2> v_reg<_Tp2, n2> reinterpret_as() const |
||||
{ |
||||
size_t bytes = std::min(sizeof(_Tp2)*n2, sizeof(_Tp)*n); |
||||
v_reg<_Tp2, n2> c; |
||||
memcpy(&c.s[0], &s[0], bytes); |
||||
return c; |
||||
} |
||||
|
||||
_Tp s[n]; |
||||
}; |
||||
|
||||
#define OPENCV_HAL_IMPL_BIN_OP(bin_op) \ |
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> \
|
||||
operator bin_op (const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
v_reg<_Tp, n> c; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = saturate_cast<_Tp>(a.s[i] bin_op b.s[i]); \
|
||||
return c; \
|
||||
} \
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n>& \
|
||||
operator bin_op##= (v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
a.s[i] = saturate_cast<_Tp>(a.s[i] bin_op b.s[i]); \
|
||||
return a; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_BIN_OP(+) |
||||
OPENCV_HAL_IMPL_BIN_OP(-) |
||||
OPENCV_HAL_IMPL_BIN_OP(*) |
||||
OPENCV_HAL_IMPL_BIN_OP(/) |
||||
|
||||
#define OPENCV_HAL_IMPL_BIT_OP(bit_op) \ |
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> operator bit_op \
|
||||
(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
v_reg<_Tp, n> c; \
|
||||
typedef typename V_TypeTraits<_Tp>::int_type itype; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = V_TypeTraits<_Tp>::reinterpret_from_int((itype)(V_TypeTraits<_Tp>::reinterpret_int(a.s[i]) bit_op \
|
||||
V_TypeTraits<_Tp>::reinterpret_int(b.s[i]))); \
|
||||
return c; \
|
||||
} \
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n>& operator \
|
||||
bit_op##= (v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
typedef typename V_TypeTraits<_Tp>::int_type itype; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
a.s[i] = V_TypeTraits<_Tp>::reinterpret_from_int((itype)(V_TypeTraits<_Tp>::reinterpret_int(a.s[i]) bit_op \
|
||||
V_TypeTraits<_Tp>::reinterpret_int(b.s[i]))); \
|
||||
return a; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_BIT_OP(&) |
||||
OPENCV_HAL_IMPL_BIT_OP(|) |
||||
OPENCV_HAL_IMPL_BIT_OP(^) |
||||
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> operator ~ (const v_reg<_Tp, n>& a) |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = V_TypeTraits<_Tp>::reinterpret_from_int(~V_TypeTraits<_Tp>::reinterpret_int(a.s[i])); |
||||
return c; |
||||
} |
||||
|
||||
#define OPENCV_HAL_IMPL_MATH_FUNC(func, cfunc, _Tp2) \ |
||||
template<typename _Tp, int n> inline v_reg<_Tp2, n> func(const v_reg<_Tp, n>& a) \
|
||||
{ \
|
||||
v_reg<_Tp2, n> c; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = cfunc(a.s[i]); \
|
||||
return c; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_sqrt, std::sqrt, _Tp) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_sin, std::sin, _Tp) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_cos, std::cos, _Tp) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_exp, std::exp, _Tp) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_log, std::log, _Tp) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_abs, (typename V_TypeTraits<_Tp>::abs_type)std::abs, |
||||
typename V_TypeTraits<_Tp>::abs_type) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_round, cvRound, int) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_floor, cvFloor, int) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_ceil, cvCeil, int) |
||||
OPENCV_HAL_IMPL_MATH_FUNC(v_trunc, int, int) |
||||
|
||||
#define OPENCV_HAL_IMPL_MINMAX_FUNC(func, hfunc, cfunc) \ |
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> func(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
v_reg<_Tp, n> c; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = cfunc(a.s[i], b.s[i]); \
|
||||
return c; \
|
||||
} \
|
||||
template<typename _Tp, int n> inline _Tp hfunc(const v_reg<_Tp, n>& a) \
|
||||
{ \
|
||||
_Tp c = a.s[0]; \
|
||||
for( int i = 1; i < n; i++ ) \
|
||||
c = cfunc(c, a.s[i]); \
|
||||
return c; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_MINMAX_FUNC(v_min, v_reduce_min, std::min) |
||||
OPENCV_HAL_IMPL_MINMAX_FUNC(v_max, v_reduce_max, std::max) |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_minmax( const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b, |
||||
v_reg<_Tp, n>& minval, v_reg<_Tp, n>& maxval ) |
||||
{ |
||||
for( int i = 0; i < n; i++ ) |
||||
{ |
||||
minval.s[i] = std::min(a.s[i], b.s[i]); |
||||
maxval.s[i] = std::max(a.s[i], b.s[i]); |
||||
} |
||||
} |
||||
|
||||
|
||||
#define OPENCV_HAL_IMPL_CMP_OP(cmp_op) \ |
||||
template<typename _Tp, int n> \
|
||||
inline v_reg<_Tp, n> operator cmp_op(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
typedef typename V_TypeTraits<_Tp>::int_type itype; \
|
||||
v_reg<_Tp, n> c; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = V_TypeTraits<_Tp>::reinterpret_from_int((itype)-(int)(a.s[i] cmp_op b.s[i])); \
|
||||
return c; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_CMP_OP(<) |
||||
OPENCV_HAL_IMPL_CMP_OP(>) |
||||
OPENCV_HAL_IMPL_CMP_OP(<=) |
||||
OPENCV_HAL_IMPL_CMP_OP(>=) |
||||
OPENCV_HAL_IMPL_CMP_OP(==) |
||||
OPENCV_HAL_IMPL_CMP_OP(!=) |
||||
|
||||
#define OPENCV_HAL_IMPL_ADD_SUB_OP(func, bin_op, cast_op, _Tp2) \ |
||||
template<typename _Tp, int n> \
|
||||
inline v_reg<_Tp2, n> func(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) \
|
||||
{ \
|
||||
typedef _Tp2 rtype; \
|
||||
v_reg<rtype, n> c; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = cast_op(a.s[i] bin_op b.s[i]); \
|
||||
return c; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_ADD_SUB_OP(v_add_wrap, +, (_Tp), _Tp) |
||||
OPENCV_HAL_IMPL_ADD_SUB_OP(v_sub_wrap, -, (_Tp), _Tp) |
||||
OPENCV_HAL_IMPL_ADD_SUB_OP(v_absdiff, -, (rtype)std::abs, typename V_TypeTraits<_Tp>::abs_type) |
||||
|
||||
template<typename _Tp, int n> |
||||
inline v_reg<_Tp, n> v_invsqrt(const v_reg<_Tp, n>& a) |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = 1.f/std::sqrt(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline v_reg<_Tp, n> v_magnitude(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = std::sqrt(a.s[i]*a.s[i] + b.s[i]*b.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
|
||||
template<typename _Tp, int n> |
||||
inline v_reg<_Tp, n> v_sqr_magnitude(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) |
||||
{ |
||||
v_reg<_Tp, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = a.s[i]*a.s[i] + b.s[i]*b.s[i]; |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline v_reg<_Tp, n> v_muladd(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b, |
||||
const v_reg<_Tp, n>& c) |
||||
{ |
||||
v_reg<_Tp, n> d; |
||||
for( int i = 0; i < n; i++ ) |
||||
d.s[i] = a.s[i]*b.s[i] + c.s[i]; |
||||
return d; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<typename V_TypeTraits<_Tp>::w_type, n/2> |
||||
v_dotprod(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b) |
||||
{ |
||||
typedef typename V_TypeTraits<_Tp>::w_type w_type; |
||||
v_reg<w_type, n/2> c; |
||||
for( int i = 0; i < (n/2); i++ ) |
||||
c.s[i] = (w_type)a.s[i*2]*b.s[i*2] + (w_type)a.s[i*2+1]*b.s[i*2+1]; |
||||
return c; |
||||
} |
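// Illustrative example (added for clarity, not part of the original header),
// assuming V_TypeTraits<short>::w_type is int and using the pointer-load
// constructor that v_load() below relies on:
//
//     short a[8] = {1,2,3,4,5,6,7,8}, b[8] = {1,1,1,1,1,1,1,1};
//     v_reg<int, 4> dp = v_dotprod(v_reg<short, 8>(a), v_reg<short, 8>(b));
//
// dp lanes are {1*1+2*1, 3*1+4*1, 5*1+6*1, 7*1+8*1} = {3, 7, 11, 15}:
// adjacent pairs are multiplied and accumulated into twice-wider lanes.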
||||
|
||||
template<typename _Tp, int n> inline void v_mul_expand(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b,
                                                        v_reg<typename V_TypeTraits<_Tp>::w_type, n/2>& c,
                                                        v_reg<typename V_TypeTraits<_Tp>::w_type, n/2>& d)
{
    typedef typename V_TypeTraits<_Tp>::w_type w_type;
    for( int i = 0; i < (n/2); i++ )
    {
        // widening multiply: products of the low half go to c, of the high half to d
        c.s[i] = (w_type)a.s[i]*b.s[i];
        d.s[i] = (w_type)a.s[i+(n/2)]*b.s[i+(n/2)];
    }
}
||||
|
||||
template<typename _Tp, int n> inline void v_hsum(const v_reg<_Tp, n>& a, |
||||
v_reg<typename V_TypeTraits<_Tp>::w_type, n/2>& c) |
||||
{ |
||||
typedef typename V_TypeTraits<_Tp>::w_type w_type; |
||||
for( int i = 0; i < (n/2); i++ ) |
||||
{ |
||||
c.s[i] = (w_type)a.s[i*2] + a.s[i*2+1]; |
||||
} |
||||
} |
||||
|
||||
#define OPENCV_HAL_IMPL_SHIFT_OP(shift_op) \ |
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> operator shift_op(const v_reg<_Tp, n>& a, int imm) \
|
||||
{ \
|
||||
v_reg<_Tp, n> c; \
|
||||
for( int i = 0; i < n; i++ ) \
|
||||
c.s[i] = (_Tp)(a.s[i] shift_op imm); \
|
||||
return c; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_SHIFT_OP(<<) |
||||
OPENCV_HAL_IMPL_SHIFT_OP(>>) |
||||
|
||||
template<typename _Tp, int n> inline typename V_TypeTraits<_Tp>::sum_type v_reduce_sum(const v_reg<_Tp, n>& a) |
||||
{ |
||||
typename V_TypeTraits<_Tp>::sum_type c = a.s[0]; |
||||
for( int i = 1; i < n; i++ ) |
||||
c += a.s[i]; |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline int v_signmask(const v_reg<_Tp, n>& a) |
||||
{ |
||||
int mask = 0; |
||||
for( int i = 0; i < n; i++ ) |
||||
mask |= (V_TypeTraits<_Tp>::reinterpret_int(a.s[i]) < 0) << i; |
||||
return mask; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline bool v_check_all(const v_reg<_Tp, n>& a) |
||||
{ |
||||
for( int i = 0; i < n; i++ ) |
||||
if( V_TypeTraits<_Tp>::reinterpret_int(a.s[i]) >= 0 ) |
||||
return false; |
||||
return true; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline bool v_check_any(const v_reg<_Tp, n>& a) |
||||
{ |
||||
for( int i = 0; i < n; i++ ) |
||||
if( V_TypeTraits<_Tp>::reinterpret_int(a.s[i]) < 0 ) |
||||
return true; |
||||
return false; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> v_select(const v_reg<_Tp, n>& mask,
                                                           const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b)
{
    v_reg<_Tp, n> c;
    for( int i = 0; i < n; i++ )
        // a lane whose mask bits are all set reads as a negative integer: take a there, b elsewhere
        // (this matches the vbslq-based NEON v_select below)
        c.s[i] = V_TypeTraits<_Tp>::reinterpret_int(mask.s[i]) < 0 ? a.s[i] : b.s[i];
    return c;
}
||||
|
||||
template<typename _Tp, int n> inline void v_expand(const v_reg<_Tp, n>& a, |
||||
v_reg<typename V_TypeTraits<_Tp>::w_type, n/2>& b0, |
||||
v_reg<typename V_TypeTraits<_Tp>::w_type, n/2>& b1) |
||||
{ |
||||
for( int i = 0; i < (n/2); i++ ) |
||||
{ |
||||
b0.s[i] = a.s[i]; |
||||
b1.s[i] = a.s[i+(n/2)]; |
||||
} |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<typename V_TypeTraits<_Tp>::int_type, n> |
||||
v_reinterpret_as_int(const v_reg<_Tp, n>& a) |
||||
{ |
||||
v_reg<typename V_TypeTraits<_Tp>::int_type, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = V_TypeTraits<_Tp>::reinterpret_int(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<typename V_TypeTraits<_Tp>::uint_type, n> |
||||
v_reinterpret_as_uint(const v_reg<_Tp, n>& a) |
||||
{ |
||||
v_reg<typename V_TypeTraits<_Tp>::uint_type, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = V_TypeTraits<_Tp>::reinterpret_uint(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline void v_zip( const v_reg<_Tp, n>& a0, const v_reg<_Tp, n>& a1, |
||||
v_reg<_Tp, n>& b0, v_reg<_Tp, n>& b1 ) |
||||
{ |
||||
int i; |
||||
for( i = 0; i < n/2; i++ ) |
||||
{ |
||||
b0.s[i*2] = a0.s[i]; |
||||
b0.s[i*2+1] = a1.s[i]; |
||||
} |
||||
for( ; i < n; i++ ) |
||||
{ |
||||
b1.s[i*2-n] = a0.s[i]; |
||||
b1.s[i*2-n+1] = a1.s[i]; |
||||
} |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> v_load(const _Tp* ptr) |
||||
{ |
||||
return v_reg<_Tp, n>(ptr); |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> v_load_aligned(const _Tp* ptr) |
||||
{ |
||||
return v_reg<_Tp, n>(ptr); |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<_Tp, n> v_load_halves(const _Tp* loptr, const _Tp* hiptr)
{
    v_reg<_Tp, n> c;
    for( int i = 0; i < n/2; i++ )
    {
        c.s[i] = loptr[i];
        c.s[i+n/2] = hiptr[i];
    }
    return c;
}
||||
|
||||
template<typename _Tp, int n> inline v_reg<typename V_TypeTraits<_Tp>::w_type, n> v_load_expand(const _Tp* ptr) |
||||
{ |
||||
typedef typename V_TypeTraits<_Tp>::w_type w_type; |
||||
v_reg<w_type, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
{ |
||||
c.s[i] = ptr[i]; |
||||
} |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline v_reg<typename |
||||
V_TypeTraits<typename V_TypeTraits<_Tp>::w_type>::w_type, n> v_load_expand_q(const _Tp* ptr) |
||||
{ |
||||
typedef typename V_TypeTraits<typename V_TypeTraits<_Tp>::w_type>::w_type w_type; |
||||
v_reg<w_type, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
{ |
||||
c.s[i] = ptr[i]; |
||||
} |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline void v_load_deinterleave(const _Tp* ptr, v_reg<_Tp, n>& a, |
||||
v_reg<_Tp, n>& b, v_reg<_Tp, n>& c) |
||||
{ |
||||
int i, i3; |
||||
for( i = i3 = 0; i < n; i++, i3 += 3 ) |
||||
{ |
||||
a.s[i] = ptr[i3]; |
||||
b.s[i] = ptr[i3+1]; |
||||
c.s[i] = ptr[i3+2]; |
||||
} |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_load_deinterleave(const _Tp* ptr, v_reg<_Tp, n>& a, |
||||
v_reg<_Tp, n>& b, v_reg<_Tp, n>& c, |
||||
v_reg<_Tp, n>& d) |
||||
{ |
||||
int i, i4; |
||||
for( i = i4 = 0; i < n; i++, i4 += 4 ) |
||||
{ |
||||
a.s[i] = ptr[i4]; |
||||
b.s[i] = ptr[i4+1]; |
||||
c.s[i] = ptr[i4+2]; |
||||
d.s[i] = ptr[i4+3]; |
||||
} |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_store_interleave( _Tp* ptr, const v_reg<_Tp, n>& a, |
||||
const v_reg<_Tp, n>& b, const v_reg<_Tp, n>& c) |
||||
{ |
||||
int i, i3; |
||||
for( i = i3 = 0; i < n; i++, i3 += 3 ) |
||||
{ |
||||
ptr[i3] = a.s[i]; |
||||
ptr[i3+1] = b.s[i]; |
||||
ptr[i3+2] = c.s[i]; |
||||
} |
||||
} |
||||
|
||||
template<typename _Tp, int n> inline void v_store_interleave( _Tp* ptr, const v_reg<_Tp, n>& a, |
||||
const v_reg<_Tp, n>& b, const v_reg<_Tp, n>& c, |
||||
const v_reg<_Tp, n>& d) |
||||
{ |
||||
int i, i4; |
||||
for( i = i4 = 0; i < n; i++, i4 += 4 ) |
||||
{ |
||||
ptr[i4] = a.s[i]; |
||||
ptr[i4+1] = b.s[i]; |
||||
ptr[i4+2] = c.s[i]; |
||||
ptr[i4+3] = d.s[i]; |
||||
} |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_store(_Tp* ptr, const v_reg<_Tp, n>& a) |
||||
{ |
||||
for( int i = 0; i < n; i++ ) |
||||
ptr[i] = a.s[i]; |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_store_low(_Tp* ptr, const v_reg<_Tp, n>& a) |
||||
{ |
||||
for( int i = 0; i < (n/2); i++ ) |
||||
ptr[i] = a.s[i]; |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_store_high(_Tp* ptr, const v_reg<_Tp, n>& a) |
||||
{ |
||||
for( int i = 0; i < (n/2); i++ ) |
||||
ptr[i] = a.s[i+(n/2)]; |
||||
} |
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_store_aligned(_Tp* ptr, const v_reg<_Tp, n>& a) |
||||
{ |
||||
for( int i = 0; i < n; i++ ) |
||||
ptr[i] = a.s[i]; |
||||
} |
||||
|
||||
template<typename _Tp, int n>
inline v_reg<_Tp, n> v_combine_low(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b)
{
    v_reg<_Tp, n> c;
    for( int i = 0; i < (n/2); i++ )
    {
        c.s[i] = a.s[i];
        c.s[i+(n/2)] = b.s[i];
    }
    return c;
}

template<typename _Tp, int n>
inline v_reg<_Tp, n> v_combine_high(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b)
{
    v_reg<_Tp, n> c;
    for( int i = 0; i < (n/2); i++ )
    {
        c.s[i] = a.s[i+(n/2)];
        c.s[i+(n/2)] = b.s[i+(n/2)];
    }
    return c;
}
||||
|
||||
template<typename _Tp, int n> |
||||
inline void v_recombine(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b, |
||||
v_reg<_Tp, n>& low, v_reg<_Tp, n>& high) |
||||
{ |
||||
for( int i = 0; i < (n/2); i++ ) |
||||
{ |
||||
low.s[i] = a.s[i]; |
||||
low.s[i+(n/2)] = b.s[i]; |
||||
high.s[i] = a.s[i+(n/2)]; |
||||
high.s[i+(n/2)] = b.s[i+(n/2)]; |
||||
} |
||||
} |
||||
|
||||
template<int n> inline v_reg<int, n> v_round(const v_reg<float, n>& a) |
||||
{ |
||||
v_reg<int, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = cvRound(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<int, n> v_floor(const v_reg<float, n>& a) |
||||
{ |
||||
v_reg<int, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = cvFloor(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<int, n> v_ceil(const v_reg<float, n>& a) |
||||
{ |
||||
v_reg<int, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = cvCeil(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<int, n> v_trunc(const v_reg<float, n>& a) |
||||
{ |
||||
v_reg<int, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = (int)(a.s[i]); |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<int, n*2> v_round(const v_reg<double, n>& a) |
||||
{ |
||||
v_reg<int, n*2> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
{ |
||||
c.s[i] = cvRound(a.s[i]); |
||||
c.s[i+n] = 0; |
||||
} |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<int, n*2> v_floor(const v_reg<double, n>& a)
{
    v_reg<int, n*2> c;
    for( int i = 0; i < n; i++ )
    {
        c.s[i] = cvFloor(a.s[i]);
        c.s[i+n] = 0;
    }
    return c;
}

template<int n> inline v_reg<int, n*2> v_ceil(const v_reg<double, n>& a)
{
    v_reg<int, n*2> c;
    for( int i = 0; i < n; i++ )
    {
        c.s[i] = cvCeil(a.s[i]);
        c.s[i+n] = 0;
    }
    return c;
}

template<int n> inline v_reg<int, n*2> v_trunc(const v_reg<double, n>& a)
{
    v_reg<int, n*2> c;
    for( int i = 0; i < n; i++ )
    {
        c.s[i] = (int)a.s[i];
        c.s[i+n] = 0;
    }
    return c;
}
||||
|
||||
template<int n> inline v_reg<float, n> v_cvt_f32(const v_reg<int, n>& a) |
||||
{ |
||||
v_reg<float, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = (float)a.s[i]; |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<double, n> v_cvt_f64(const v_reg<int, n*2>& a) |
||||
{ |
||||
v_reg<double, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = (double)a.s[i]; |
||||
return c; |
||||
} |
||||
|
||||
template<int n> inline v_reg<double, n> v_cvt_f64(const v_reg<float, n*2>& a) |
||||
{ |
||||
v_reg<double, n> c; |
||||
for( int i = 0; i < n; i++ ) |
||||
c.s[i] = (double)a.s[i]; |
||||
return c; |
||||
} |
||||
|
||||
template<typename _Tp> |
||||
inline void v_transpose4x4( const v_reg<_Tp, 4>& a0, const v_reg<_Tp, 4>& a1,
||||
const v_reg<_Tp, 4>& a2, const v_reg<_Tp, 4>& a3, |
||||
v_reg<_Tp, 4>& b0, v_reg<_Tp, 4>& b1, |
||||
v_reg<_Tp, 4>& b2, v_reg<_Tp, 4>& b3 ) |
||||
{ |
||||
b0 = v_reg<_Tp, 4>(a0.s[0], a1.s[0], a2.s[0], a3.s[0]); |
||||
b1 = v_reg<_Tp, 4>(a0.s[1], a1.s[1], a2.s[1], a3.s[1]); |
||||
b2 = v_reg<_Tp, 4>(a0.s[2], a1.s[2], a2.s[2], a3.s[2]); |
||||
b3 = v_reg<_Tp, 4>(a0.s[3], a1.s[3], a2.s[3], a3.s[3]); |
||||
} |
||||
|
||||
typedef v_reg<uchar, 16> v_uint8x16; |
||||
typedef v_reg<schar, 16> v_int8x16; |
||||
typedef v_reg<ushort, 8> v_uint16x8; |
||||
typedef v_reg<short, 8> v_int16x8; |
||||
typedef v_reg<unsigned, 4> v_uint32x4; |
||||
typedef v_reg<int, 4> v_int32x4; |
||||
typedef v_reg<float, 4> v_float32x4; |
||||
typedef v_reg<float, 8> v_float32x8; |
||||
typedef v_reg<double, 2> v_float64x2; |
||||
typedef v_reg<uint64, 2> v_uint64x2; |
||||
typedef v_reg<int64, 2> v_int64x2; |
||||
|
||||
#define OPENCV_HAL_IMPL_C_INIT(_Tpvec, _Tp, suffix) \ |
||||
inline _Tpvec v_setzero_##suffix() { return _Tpvec::zero(); } \
|
||||
inline _Tpvec v_setall_##suffix(_Tp val) { return _Tpvec::all(val); } \
|
||||
template<typename _Tp0, int n0> inline _Tpvec \
|
||||
v_reinterpret_as_##suffix(const v_reg<_Tp0, n0>& a) \
|
||||
{ return a.template reinterpret_as<_Tp, _Tpvec::nlanes>(a); } |
||||
|
||||
OPENCV_HAL_IMPL_C_INIT(v_uint8x16, uchar, u8) |
||||
OPENCV_HAL_IMPL_C_INIT(v_int8x16, schar, s8) |
||||
OPENCV_HAL_IMPL_C_INIT(v_uint16x8, ushort, u16) |
||||
OPENCV_HAL_IMPL_C_INIT(v_int16x8, short, s16) |
||||
OPENCV_HAL_IMPL_C_INIT(v_uint32x4, unsigned, u32) |
||||
OPENCV_HAL_IMPL_C_INIT(v_int32x4, int, s32) |
||||
OPENCV_HAL_IMPL_C_INIT(v_float32x4, float, f32) |
||||
OPENCV_HAL_IMPL_C_INIT(v_float64x2, double, f64) |
||||
OPENCV_HAL_IMPL_C_INIT(v_uint64x2, uint64, u64) |
||||
OPENCV_HAL_IMPL_C_INIT(v_int64x2, int64, s64)
||||
|
||||
#define OPENCV_HAL_IMPL_C_SHIFT(_Tpvec, _Tp) \ |
||||
template<int n> inline _Tpvec v_shl(const _Tpvec& a) \
|
||||
{ return a << n; } \
|
||||
template<int n> inline _Tpvec v_shr(const _Tpvec& a) \
|
||||
{ return a >> n; } \
|
||||
template<int n> inline _Tpvec v_rshr(const _Tpvec& a) \
|
||||
{ \
|
||||
_Tpvec c; \
|
||||
for( int i = 0; i < _Tpvec::nlanes; i++ ) \
|
||||
c.s[i] = (_Tp)((a.s[i] + ((_Tp)1 << (n - 1))) >> n); \
|
||||
return c; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_C_SHIFT(v_uint16x8, ushort) |
||||
OPENCV_HAL_IMPL_C_SHIFT(v_int16x8, short) |
||||
OPENCV_HAL_IMPL_C_SHIFT(v_uint32x4, unsigned) |
||||
OPENCV_HAL_IMPL_C_SHIFT(v_int32x4, int) |
||||
OPENCV_HAL_IMPL_C_SHIFT(v_uint64x2, uint64) |
||||
OPENCV_HAL_IMPL_C_SHIFT(v_int64x2, int64) |
||||
|
||||
|
||||
#define OPENCV_HAL_IMPL_C_PACK(_Tpvec, _Tp, _Tpnvec, _Tpn, pack_suffix) \ |
||||
inline _Tpnvec v_##pack_suffix(const _Tpvec& a, const _Tpvec& b) \
|
||||
{ \
|
||||
_Tpnvec c; \
|
||||
for( int i = 0; i < _Tpvec::nlanes; i++ ) \
|
||||
{ \
|
||||
c.s[i] = saturate_cast<_Tpn>(a.s[i]); \
|
||||
c.s[i+_Tpvec::nlanes] = saturate_cast<_Tpn>(b.s[i]); \
|
||||
} \
|
||||
return c; \
|
||||
} \
|
||||
template<int n> inline _Tpnvec v_rshr_##pack_suffix(const _Tpvec& a, const _Tpvec& b) \
|
||||
{ \
|
||||
_Tpnvec c; \
|
||||
for( int i = 0; i < _Tpvec::nlanes; i++ ) \
|
||||
{ \
|
||||
c.s[i] = saturate_cast<_Tpn>((a.s[i] + ((_Tp)1 << (n - 1))) >> n); \
|
||||
c.s[i+_Tpvec::nlanes] = saturate_cast<_Tpn>((b.s[i] + ((_Tp)1 << (n - 1))) >> n); \
|
||||
} \
|
||||
return c; \
|
||||
} \
|
||||
inline void v_##pack_suffix##_store(_Tpn* ptr, const _Tpvec& a) \
|
||||
{ \
|
||||
for( int i = 0; i < _Tpvec::nlanes; i++ ) \
|
||||
ptr[i] = saturate_cast<_Tpn>(a.s[i]); \
|
||||
} \
|
||||
template<int n> inline void v_rshr_##pack_suffix##_store(_Tpn* ptr, const _Tpvec& a) \
|
||||
{ \
|
||||
for( int i = 0; i < _Tpvec::nlanes; i++ ) \
|
||||
ptr[i] = saturate_cast<_Tpn>((a.s[i] + ((_Tp)1 << (n - 1))) >> n); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_C_PACK(v_uint16x8, ushort, v_uint8x16, uchar, pack) |
||||
OPENCV_HAL_IMPL_C_PACK(v_int16x8, short, v_int8x16, schar, pack) |
||||
OPENCV_HAL_IMPL_C_PACK(v_int16x8, short, v_uint8x16, uchar, pack_u) |
||||
OPENCV_HAL_IMPL_C_PACK(v_uint32x4, unsigned, v_uint16x8, ushort, pack) |
||||
OPENCV_HAL_IMPL_C_PACK(v_int32x4, int, v_int16x8, short, pack) |
||||
OPENCV_HAL_IMPL_C_PACK(v_int32x4, int, v_uint16x8, ushort, pack_u) |
||||
OPENCV_HAL_IMPL_C_PACK(v_uint64x2, uint64, v_uint32x4, unsigned, pack) |
||||
OPENCV_HAL_IMPL_C_PACK(v_int64x2, int64, v_int32x4, int, pack) |
||||
|
||||
inline v_float32x4 v_matmul(const v_float32x4& v, const v_float32x4& m0, |
||||
const v_float32x4& m1, const v_float32x4& m2, |
||||
const v_float32x4& m3) |
||||
{ |
||||
return v_float32x4(v.s[0]*m0.s[0] + v.s[1]*m1.s[0] + v.s[2]*m2.s[0] + v.s[3]*m3.s[0], |
||||
v.s[0]*m0.s[1] + v.s[1]*m1.s[1] + v.s[2]*m2.s[1] + v.s[3]*m3.s[1], |
||||
v.s[0]*m0.s[2] + v.s[1]*m1.s[2] + v.s[2]*m2.s[2] + v.s[3]*m3.s[2], |
||||
v.s[0]*m0.s[3] + v.s[1]*m1.s[3] + v.s[2]*m2.s[3] + v.s[3]*m3.s[3]); |
||||
} |
||||
|
||||
} |
||||
|
||||
#endif |
@ -0,0 +1,823 @@ |
||||
/*M///////////////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
|
||||
//
|
||||
// By downloading, copying, installing or using the software you agree to this license.
|
||||
// If you do not agree to this license, do not download, install,
|
||||
// copy or use the software.
|
||||
//
|
||||
//
|
||||
// License Agreement
|
||||
// For Open Source Computer Vision Library
|
||||
//
|
||||
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
|
||||
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
|
||||
// Copyright (C) 2013, OpenCV Foundation, all rights reserved.
|
||||
// Copyright (C) 2015, Itseez Inc., all rights reserved.
|
||||
// Third party copyrights are property of their respective owners.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without modification,
|
||||
// are permitted provided that the following conditions are met:
|
||||
//
|
||||
// * Redistribution's of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimer.
|
||||
//
|
||||
// * Redistribution's in binary form must reproduce the above copyright notice,
|
||||
// this list of conditions and the following disclaimer in the documentation
|
||||
// and/or other materials provided with the distribution.
|
||||
//
|
||||
// * The name of the copyright holders may not be used to endorse or promote products
|
||||
// derived from this software without specific prior written permission.
|
||||
//
|
||||
// This software is provided by the copyright holders and contributors "as is" and
|
||||
// any express or implied warranties, including, but not limited to, the implied
|
||||
// warranties of merchantability and fitness for a particular purpose are disclaimed.
|
||||
// In no event shall the Intel Corporation or contributors be liable for any direct,
|
||||
// indirect, incidental, special, exemplary, or consequential damages
|
||||
// (including, but not limited to, procurement of substitute goods or services;
|
||||
// loss of use, data, or profits; or business interruption) however caused
|
||||
// and on any theory of liability, whether in contract, strict liability,
|
||||
// or tort (including negligence or otherwise) arising in any way out of
|
||||
// the use of this software, even if advised of the possibility of such damage.
|
||||
//
|
||||
//M*/
|
||||
|
||||
#ifndef __OPENCV_HAL_INTRIN_NEON_HPP__ |
||||
#define __OPENCV_HAL_INTRIN_NEON_HPP__ |
||||
|
||||
namespace cv |
||||
{ |
||||
|
||||
#define CV_SIMD128 1 |
||||
|
||||
struct v_uint8x16 |
||||
{ |
||||
typedef uchar lane_type; |
||||
enum { nlanes = 16 }; |
||||
|
||||
v_uint8x16() {} |
||||
explicit v_uint8x16(uint8x16_t v) : val(v) {} |
||||
v_uint8x16(uchar v0, uchar v1, uchar v2, uchar v3, uchar v4, uchar v5, uchar v6, uchar v7, |
||||
uchar v8, uchar v9, uchar v10, uchar v11, uchar v12, uchar v13, uchar v14, uchar v15) |
||||
{ |
||||
uchar v[] = {v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15}; |
||||
val = vld1q_u8(v); |
||||
} |
||||
uchar get0() const |
||||
{ |
||||
return vgetq_lane_u8(val, 0); |
||||
} |
||||
|
||||
uint8x16_t val; |
||||
}; |
||||
|
||||
struct v_int8x16 |
||||
{ |
||||
typedef schar lane_type; |
||||
enum { nlanes = 16 }; |
||||
|
||||
v_int8x16() {} |
||||
explicit v_int8x16(int8x16_t v) : val(v) {} |
||||
v_int8x16(schar v0, schar v1, schar v2, schar v3, schar v4, schar v5, schar v6, schar v7, |
||||
schar v8, schar v9, schar v10, schar v11, schar v12, schar v13, schar v14, schar v15) |
||||
{ |
||||
schar v[] = {v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15}; |
||||
val = vld1q_s8(v); |
||||
} |
||||
schar get0() const |
||||
{ |
||||
return vgetq_lane_s8(val, 0); |
||||
} |
||||
|
||||
int8x16_t val; |
||||
}; |
||||
|
||||
struct v_uint16x8 |
||||
{ |
||||
typedef ushort lane_type; |
||||
enum { nlanes = 8 }; |
||||
|
||||
v_uint16x8() {} |
||||
explicit v_uint16x8(uint16x8_t v) : val(v) {} |
||||
v_uint16x8(ushort v0, ushort v1, ushort v2, ushort v3, ushort v4, ushort v5, ushort v6, ushort v7) |
||||
{ |
||||
ushort v[] = {v0, v1, v2, v3, v4, v5, v6, v7}; |
||||
val = vld1q_u16(v); |
||||
} |
||||
ushort get0() const |
||||
{ |
||||
return vgetq_lane_u16(val, 0); |
||||
} |
||||
|
||||
uint16x8_t val; |
||||
}; |
||||
|
||||
struct v_int16x8 |
||||
{ |
||||
typedef short lane_type; |
||||
enum { nlanes = 8 }; |
||||
|
||||
v_int16x8() {} |
||||
explicit v_int16x8(int16x8_t v) : val(v) {} |
||||
v_int16x8(short v0, short v1, short v2, short v3, short v4, short v5, short v6, short v7) |
||||
{ |
||||
short v[] = {v0, v1, v2, v3, v4, v5, v6, v7}; |
||||
val = vld1q_s16(v); |
||||
} |
||||
short get0() const |
||||
{ |
||||
return vgetq_lane_s16(val, 0); |
||||
} |
||||
|
||||
int16x8_t val; |
||||
}; |
||||
|
||||
struct v_uint32x4 |
||||
{ |
||||
typedef unsigned lane_type; |
||||
enum { nlanes = 4 }; |
||||
|
||||
v_uint32x4() {} |
||||
explicit v_uint32x4(uint32x4_t v) : val(v) {} |
||||
v_uint32x4(unsigned v0, unsigned v1, unsigned v2, unsigned v3) |
||||
{ |
||||
unsigned v[] = {v0, v1, v2, v3}; |
||||
val = vld1q_u32(v); |
||||
} |
||||
unsigned get0() const |
||||
{ |
||||
return vgetq_lane_u32(val, 0); |
||||
} |
||||
|
||||
uint32x4_t val; |
||||
}; |
||||
|
||||
struct v_int32x4 |
||||
{ |
||||
typedef int lane_type; |
||||
enum { nlanes = 4 }; |
||||
|
||||
v_int32x4() {} |
||||
explicit v_int32x4(int32x4_t v) : val(v) {} |
||||
v_int32x4(int v0, int v1, int v2, int v3) |
||||
{ |
||||
int v[] = {v0, v1, v2, v3}; |
||||
val = vld1q_s32(v); |
||||
} |
||||
int get0() const |
||||
{ |
||||
return vgetq_lane_s32(val, 0); |
||||
} |
||||
int32x4_t val; |
||||
}; |
||||
|
||||
struct v_float32x4 |
||||
{ |
||||
typedef float lane_type; |
||||
enum { nlanes = 4 }; |
||||
|
||||
v_float32x4() {} |
||||
explicit v_float32x4(float32x4_t v) : val(v) {} |
||||
v_float32x4(float v0, float v1, float v2, float v3) |
||||
{ |
||||
float v[] = {v0, v1, v2, v3}; |
||||
val = vld1q_f32(v); |
||||
} |
||||
float get0() const |
||||
{ |
||||
return vgetq_lane_f32(val, 0); |
||||
} |
||||
float32x4_t val; |
||||
}; |
||||
|
||||
struct v_uint64x2 |
||||
{ |
||||
typedef uint64 lane_type; |
||||
enum { nlanes = 2 }; |
||||
|
||||
v_uint64x2() {} |
||||
explicit v_uint64x2(uint64x2_t v) : val(v) {} |
||||
v_uint64x2(uint64 v0, uint64 v1)
||||
{ |
||||
uint64 v[] = {v0, v1}; |
||||
val = vld1q_u64(v); |
||||
} |
||||
uint64 get0() const |
||||
{ |
||||
return vgetq_lane_u64(val, 0); |
||||
} |
||||
uint64x2_t val; |
||||
}; |
||||
|
||||
struct v_int64x2 |
||||
{ |
||||
typedef int64 lane_type; |
||||
enum { nlanes = 2 }; |
||||
|
||||
v_int64x2() {} |
||||
explicit v_int64x2(int64x2_t v) : val(v) {} |
||||
v_int64x2(int64 v0, int64 v1)
||||
{ |
||||
int64 v[] = {v0, v1}; |
||||
val = vld1q_s64(v); |
||||
} |
||||
int64 get0() const |
||||
{ |
||||
return vgetq_lane_s64(val, 0); |
||||
} |
||||
int64x2_t val; |
||||
}; |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_INIT(_Tpv, _Tp, suffix) \ |
||||
inline v_##_Tpv v_setzero_##suffix() { return v_##_Tpv(vdupq_n_##suffix((_Tp)0)); } \
|
||||
inline v_##_Tpv v_setall_##suffix(_Tp v) { return v_##_Tpv(vdupq_n_##suffix(v)); } \
|
||||
inline _Tpv##_t vreinterpretq_##suffix##_##suffix(_Tpv##_t v) { return v; } \
|
||||
inline v_uint8x16 v_reinterpret_as_u8(const v_##_Tpv& v) { return v_uint8x16(vreinterpretq_u8_##suffix(v.val)); } \
|
||||
inline v_int8x16 v_reinterpret_as_s8(const v_##_Tpv& v) { return v_int8x16(vreinterpretq_s8_##suffix(v.val)); } \
|
||||
inline v_uint16x8 v_reinterpret_as_u16(const v_##_Tpv& v) { return v_uint16x8(vreinterpretq_u16_##suffix(v.val)); } \
|
||||
inline v_int16x8 v_reinterpret_as_s16(const v_##_Tpv& v) { return v_int16x8(vreinterpretq_s16_##suffix(v.val)); } \
|
||||
inline v_uint32x4 v_reinterpret_as_u32(const v_##_Tpv& v) { return v_uint32x4(vreinterpretq_u32_##suffix(v.val)); } \
|
||||
inline v_int32x4 v_reinterpret_as_s32(const v_##_Tpv& v) { return v_int32x4(vreinterpretq_s32_##suffix(v.val)); } \
|
||||
inline v_uint64x2 v_reinterpret_as_u64(const v_##_Tpv& v) { return v_uint64x2(vreinterpretq_u64_##suffix(v.val)); } \
|
||||
inline v_int64x2 v_reinterpret_as_s64(const v_##_Tpv& v) { return v_int64x2(vreinterpretq_s64_##suffix(v.val)); } \
|
||||
inline v_float32x4 v_reinterpret_as_f32(const v_##_Tpv& v) { return v_float32x4(vreinterpretq_f32_##suffix(v.val)); } |
||||
|
||||
OPENCV_HAL_IMPL_NEON_INIT(uint8x16, uchar, u8) |
||||
OPENCV_HAL_IMPL_NEON_INIT(int8x16, schar, s8) |
||||
OPENCV_HAL_IMPL_NEON_INIT(uint16x8, ushort, u16) |
||||
OPENCV_HAL_IMPL_NEON_INIT(int16x8, short, s16) |
||||
OPENCV_HAL_IMPL_NEON_INIT(uint32x4, unsigned, u32) |
||||
OPENCV_HAL_IMPL_NEON_INIT(int32x4, int, s32) |
||||
OPENCV_HAL_IMPL_NEON_INIT(uint64x2, uint64, u64) |
||||
OPENCV_HAL_IMPL_NEON_INIT(int64x2, int64, s64) |
||||
OPENCV_HAL_IMPL_NEON_INIT(float32x4, float, f32) |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_PACK(_Tpvec, _Tp, hreg, suffix, _Tpwvec, wsuffix, pack, op) \ |
||||
inline _Tpvec v_##pack(const _Tpwvec& a, const _Tpwvec& b) \
|
||||
{ \
|
||||
hreg a1 = vqmov##op##_##wsuffix(a.val), b1 = vqmov##op##_##wsuffix(b.val); \
|
||||
return _Tpvec(vcombine_##suffix(a1, b1)); \
|
||||
} \
|
||||
inline void v_##pack##_store(_Tp* ptr, const _Tpwvec& a) \
|
||||
{ \
|
||||
hreg a1 = vqmov##op##_##wsuffix(a.val); \
|
||||
vst1_##suffix(ptr, a1); \
|
||||
} \
|
||||
template<int n> inline \
|
||||
_Tpvec v_rshr_##pack(const _Tpwvec& a, const _Tpwvec& b) \
|
||||
{ \
|
||||
hreg a1 = vqrshr##op##_n_##wsuffix(a.val, n); \
|
||||
hreg b1 = vqrshr##op##_n_##wsuffix(b.val, n); \
|
||||
return _Tpvec(vcombine_##suffix(a1, b1)); \
|
||||
} \
|
||||
template<int n> inline \
|
||||
void v_rshr_##pack##_store(_Tp* ptr, const _Tpwvec& a) \
|
||||
{ \
|
||||
hreg a1 = vqrshr##op##_n_##wsuffix(a.val, n); \
|
||||
vst1_##suffix(ptr, a1); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_PACK(v_uint8x16, uchar, uint8x8_t, u8, v_uint16x8, u16, pack, n) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_uint8x16, uchar, uint8x8_t, u8, v_int16x8, s16, pack_u, un) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_int8x16, schar, int8x8_t, s8, v_int16x8, s16, pack, n) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_uint16x8, ushort, uint16x4_t, u16, v_uint32x4, u32, pack, n) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_uint16x8, ushort, uint16x4_t, u16, v_int32x4, s32, pack_u, un) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_int16x8, short, int16x4_t, s16, v_int32x4, s32, pack, n) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_uint32x4, unsigned, uint32x2_t, u32, v_uint64x2, u64, pack, n) |
||||
OPENCV_HAL_IMPL_NEON_PACK(v_int32x4, int, int32x2_t, s32, v_int64x2, s64, pack, n) |
||||
|
||||
inline v_float32x4 v_matmul(const v_float32x4& v, const v_float32x4& m0, |
||||
const v_float32x4& m1, const v_float32x4& m2, |
||||
const v_float32x4& m3) |
||||
{ |
||||
float32x2_t vl = vget_low_f32(v.val), vh = vget_high_f32(v.val); |
||||
float32x4_t res = vmulq_lane_f32(m0.val, vl, 0); |
||||
res = vmlaq_lane_f32(res, m1.val, vl, 1); |
||||
res = vmlaq_lane_f32(res, m2.val, vh, 0); |
||||
res = vmlaq_lane_f32(res, m3.val, vh, 1); |
||||
return v_float32x4(res); |
||||
} |
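// v_matmul above forms res = v.s[0]*m0 + v.s[1]*m1 + v.s[2]*m2 + v.s[3]*m3,
// i.e. the product of the 4x4 matrix whose columns are m0..m3 with the vector v,
// using one vmulq_lane_f32 and three vmlaq_lane_f32 multiply-accumulates
// (the same result the scalar fallback v_matmul computes element by element).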
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_BIN_OP(bin_op, _Tpvec, intrin) \ |
||||
inline _Tpvec operator bin_op (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ \
|
||||
return _Tpvec(intrin(a.val, b.val)); \
|
||||
} \
|
||||
inline _Tpvec& operator bin_op##= (_Tpvec& a, const _Tpvec& b) \
|
||||
{ \
|
||||
a.val = intrin(a.val, b.val); \
|
||||
return a; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_uint8x16, vqaddq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_uint8x16, vqsubq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_int8x16, vqaddq_s8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_int8x16, vqsubq_s8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_uint16x8, vqaddq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_uint16x8, vqsubq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(*, v_uint16x8, vmulq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_int16x8, vqaddq_s16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_int16x8, vqsubq_s16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(*, v_int16x8, vmulq_s16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_int32x4, vaddq_s32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_int32x4, vsubq_s32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(*, v_int32x4, vmulq_s32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_float32x4, vaddq_f32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_float32x4, vsubq_f32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(*, v_float32x4, vmulq_f32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_int64x2, vaddq_s64) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_int64x2, vsubq_s64) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(+, v_uint64x2, vaddq_u64) |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(-, v_uint64x2, vsubq_u64) |
||||
|
||||
inline v_float32x4 operator / (const v_float32x4& a, const v_float32x4& b)
{
    // NEON (ARMv7) has no vector divide: take the reciprocal estimate of b and
    // refine it with two Newton-Raphson steps (vrecpsq) before multiplying by a
    float32x4_t reciprocal = vrecpeq_f32(b.val);
    reciprocal = vmulq_f32(vrecpsq_f32(b.val, reciprocal), reciprocal);
    reciprocal = vmulq_f32(vrecpsq_f32(b.val, reciprocal), reciprocal);
    return v_float32x4(vmulq_f32(a.val, reciprocal));
}
inline v_float32x4& operator /= (v_float32x4& a, const v_float32x4& b)
{
    // same reciprocal refinement as operator /, applied in place
    float32x4_t reciprocal = vrecpeq_f32(b.val);
    reciprocal = vmulq_f32(vrecpsq_f32(b.val, reciprocal), reciprocal);
    reciprocal = vmulq_f32(vrecpsq_f32(b.val, reciprocal), reciprocal);
    a.val = vmulq_f32(a.val, reciprocal);
    return a;
}
||||
|
||||
inline void v_mul_expand(const v_int16x8& a, const v_int16x8& b, |
||||
v_int32x4& c, v_int32x4& d) |
||||
{ |
||||
c.val = vmull_s16(vget_low_s16(a.val), vget_low_s16(b.val)); |
||||
d.val = vmull_s16(vget_high_s16(a.val), vget_high_s16(b.val)); |
||||
} |
||||
|
||||
inline void v_mul_expand(const v_uint16x8& a, const v_uint16x8& b, |
||||
v_uint32x4& c, v_uint32x4& d) |
||||
{ |
||||
c.val = vmull_u16(vget_low_u16(a.val), vget_low_u16(b.val)); |
||||
d.val = vmull_u16(vget_high_u16(a.val), vget_high_u16(b.val)); |
||||
} |
||||
|
||||
inline void v_mul_expand(const v_uint32x4& a, const v_uint32x4& b, |
||||
v_uint64x2& c, v_uint64x2& d) |
||||
{ |
||||
c.val = vmull_u32(vget_low_u32(a.val), vget_low_u32(b.val)); |
||||
d.val = vmull_u32(vget_high_u32(a.val), vget_high_u32(b.val)); |
||||
} |
||||
|
||||
inline v_int32x4 v_dotprod(const v_int16x8& a, const v_int16x8& b) |
||||
{ |
||||
int32x4_t c = vmull_s16(vget_low_s16(a.val), vget_low_s16(b.val)); |
||||
int32x4_t d = vmull_s16(vget_high_s16(a.val), vget_high_s16(b.val)); |
||||
int32x4x2_t cd = vtrnq_s32(c, d); |
||||
return v_int32x4(vaddq_s32(cd.val[0], cd.val[1])); |
||||
} |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_LOGIC_OP(_Tpvec, suffix) \ |
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(&, _Tpvec, vandq_##suffix) \
|
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(|, _Tpvec, vorrq_##suffix) \
|
||||
OPENCV_HAL_IMPL_NEON_BIN_OP(^, _Tpvec, veorq_##suffix) \
|
||||
inline _Tpvec operator ~ (const _Tpvec& a) \
|
||||
{ \
|
||||
return _Tpvec(vreinterpretq_##suffix##_u8(vmvnq_u8(vreinterpretq_u8_##suffix(a.val)))); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_uint8x16, u8) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_int8x16, s8) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_uint16x8, u16) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_int16x8, s16) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_uint32x4, u32) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_int32x4, s32) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_uint64x2, u64) |
||||
OPENCV_HAL_IMPL_NEON_LOGIC_OP(v_int64x2, s64) |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_FLT_BIT_OP(bin_op, intrin) \ |
||||
inline v_float32x4 operator bin_op (const v_float32x4& a, const v_float32x4& b) \
|
||||
{ \
|
||||
return v_float32x4(vreinterpretq_f32_s32(intrin(vreinterpretq_s32_f32(a.val), vreinterpretq_s32_f32(b.val)))); \
|
||||
} \
|
||||
inline v_float32x4& operator bin_op##= (v_float32x4& a, const v_float32x4& b) \
|
||||
{ \
|
||||
a.val = vreinterpretq_f32_s32(intrin(vreinterpretq_s32_f32(a.val), vreinterpretq_s32_f32(b.val))); \
|
||||
return a; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_FLT_BIT_OP(&, vandq_s32) |
||||
OPENCV_HAL_IMPL_NEON_FLT_BIT_OP(|, vorrq_s32) |
||||
OPENCV_HAL_IMPL_NEON_FLT_BIT_OP(^, veorq_s32) |
||||
|
||||
inline v_float32x4 operator ~ (const v_float32x4& a) |
||||
{ |
||||
return v_float32x4(vreinterpretq_f32_s32(vmvnq_s32(vreinterpretq_s32_f32(a.val)))); |
||||
} |
||||
|
||||
inline v_float32x4 v_sqrt(const v_float32x4& x) |
||||
{ |
||||
float32x4_t x1 = vmaxq_f32(x.val, vdupq_n_f32(FLT_MIN)); |
||||
float32x4_t e = vrsqrteq_f32(x1); |
||||
e = vmulq_f32(vrsqrtsq_f32(vmulq_f32(x1, e), e), e); |
||||
e = vmulq_f32(vrsqrtsq_f32(vmulq_f32(x1, e), e), e); |
||||
return v_float32x4(vmulq_f32(x.val, e)); |
||||
} |
||||
|
||||
inline v_float32x4 v_invsqrt(const v_float32x4& x) |
||||
{ |
||||
float32x4_t e = vrsqrteq_f32(x.val); |
||||
e = vmulq_f32(vrsqrtsq_f32(vmulq_f32(x.val, e), e), e); |
||||
e = vmulq_f32(vrsqrtsq_f32(vmulq_f32(x.val, e), e), e); |
||||
return v_float32x4(e); |
||||
} |
||||
|
||||
inline v_float32x4 v_abs(v_float32x4 x) |
||||
{ return v_float32x4(vabsq_f32(x.val)); } |
||||
|
||||
// TODO: exp, log, sin, cos
|
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_BIN_FUNC(_Tpvec, func, intrin) \ |
||||
inline _Tpvec func(const _Tpvec& a, const _Tpvec& b) \
|
||||
{ \
|
||||
return _Tpvec(intrin(a.val, b.val)); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint8x16, v_min, vminq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint8x16, v_max, vmaxq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int8x16, v_min, vminq_s8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int8x16, v_max, vmaxq_s8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint16x8, v_min, vminq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint16x8, v_max, vmaxq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int16x8, v_min, vminq_s16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int16x8, v_max, vmaxq_s16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint32x4, v_min, vminq_u32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint32x4, v_max, vmaxq_u32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int32x4, v_min, vminq_s32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int32x4, v_max, vmaxq_s32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_float32x4, v_min, vminq_f32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_float32x4, v_max, vmaxq_f32) |
||||
|
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_INT_CMP_OP(_Tpvec, cast, suffix, not_suffix) \ |
||||
inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ return _Tpvec(cast(vceqq_##suffix(a.val, b.val))); } \
|
||||
inline _Tpvec operator != (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ return _Tpvec(cast(vmvnq_##not_suffix(vceqq_##suffix(a.val, b.val)))); } \
|
||||
inline _Tpvec operator < (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ return _Tpvec(cast(vcltq_##suffix(a.val, b.val))); } \
|
||||
inline _Tpvec operator > (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ return _Tpvec(cast(vcgtq_##suffix(a.val, b.val))); } \
|
||||
inline _Tpvec operator <= (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ return _Tpvec(cast(vcleq_##suffix(a.val, b.val))); } \
|
||||
inline _Tpvec operator >= (const _Tpvec& a, const _Tpvec& b) \
|
||||
{ return _Tpvec(cast(vcgeq_##suffix(a.val, b.val))); } |
||||
|
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_uint8x16, OPENCV_HAL_NOP, u8, u8) |
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_int8x16, vreinterpretq_s8_u8, s8, u8) |
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_uint16x8, OPENCV_HAL_NOP, u16, u16) |
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_int16x8, vreinterpretq_s16_u16, s16, u16) |
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_uint32x4, OPENCV_HAL_NOP, u32, u32) |
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_int32x4, vreinterpretq_s32_u32, s32, u32) |
||||
OPENCV_HAL_IMPL_NEON_INT_CMP_OP(v_float32x4, vreinterpretq_f32_u32, f32, u32) |
||||
|
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint8x16, v_add_wrap, vaddq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int8x16, v_add_wrap, vaddq_s8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint16x8, v_add_wrap, vaddq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int16x8, v_add_wrap, vaddq_s16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint8x16, v_sub_wrap, vsubq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int8x16, v_sub_wrap, vsubq_s8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint16x8, v_sub_wrap, vsubq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_int16x8, v_sub_wrap, vsubq_s16) |
||||
|
||||
// TODO: absdiff for signed integers
|
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint8x16, v_absdiff, vabdq_u8) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint16x8, v_absdiff, vabdq_u16) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_uint32x4, v_absdiff, vabdq_u32) |
||||
OPENCV_HAL_IMPL_NEON_BIN_FUNC(v_float32x4, v_absdiff, vabdq_f32) |
||||
|
||||
inline v_float32x4 v_magnitude(const v_float32x4& a, const v_float32x4& b) |
||||
{ |
||||
v_float32x4 x(vmlaq_f32(vmulq_f32(a.val, a.val), b.val, b.val)); |
||||
return v_sqrt(x); |
||||
} |
||||
|
||||
inline v_float32x4 v_sqr_magnitude(const v_float32x4& a, const v_float32x4& b) |
||||
{ |
||||
return v_float32x4(vmlaq_f32(vmulq_f32(a.val, a.val), b.val, b.val)); |
||||
} |
||||
|
||||
inline v_float32x4 v_muladd(const v_float32x4& a, const v_float32x4& b, const v_float32x4& c) |
||||
{ |
||||
return v_float32x4(vmlaq_f32(c.val, a.val, b.val)); |
||||
} |
||||
|
||||
// trade efficiency for convenience
|
||||
#define OPENCV_HAL_IMPL_NEON_SHIFT_OP(_Tpvec, suffix, _Tps, ssuffix) \ |
||||
inline _Tpvec operator << (const _Tpvec& a, int n) \
|
||||
{ return _Tpvec(vshlq_##suffix(a.val, vdupq_n_##ssuffix((_Tps)n))); } \
|
||||
inline _Tpvec operator >> (const _Tpvec& a, int n) \
|
||||
{ return _Tpvec(vshlq_##suffix(a.val, vdupq_n_##ssuffix((_Tps)-n))); } \
|
||||
template<int n> inline _Tpvec v_shl(const _Tpvec& a) \
|
||||
{ return _Tpvec(vshlq_n_##suffix(a.val, n)); } \
|
||||
template<int n> inline _Tpvec v_shr(const _Tpvec& a) \
|
||||
{ return _Tpvec(vshrq_n_##suffix(a.val, n)); } \
|
||||
template<int n> inline _Tpvec v_rshr(const _Tpvec& a) \
|
||||
{ return _Tpvec(vrshrq_n_##suffix(a.val, n)); } |
||||
|
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_uint8x16, u8, schar, s8) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_int8x16, s8, schar, s8) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_uint16x8, u16, short, s16) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_int16x8, s16, short, s16) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_uint32x4, u32, int, s32) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_int32x4, s32, int, s32) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_uint64x2, u64, int64, s64) |
||||
OPENCV_HAL_IMPL_NEON_SHIFT_OP(v_int64x2, s64, int64, s64) |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(_Tpvec, _Tp, suffix) \ |
||||
inline _Tpvec v_load(const _Tp* ptr) \
|
||||
{ return _Tpvec(vld1q_##suffix(ptr)); } \
|
||||
inline _Tpvec v_load_aligned(const _Tp* ptr) \
|
||||
{ return _Tpvec(vld1q_##suffix(ptr)); } \
|
||||
inline _Tpvec v_load_halves(const _Tp* ptr0, const _Tp* ptr1) \
|
||||
{ return _Tpvec(vcombine_##suffix(vld1_##suffix(ptr0), vld1_##suffix(ptr1))); } \
|
||||
inline void v_store(_Tp* ptr, const _Tpvec& a) \
|
||||
{ vst1q_##suffix(ptr, a.val); } \
|
||||
inline void v_store_aligned(_Tp* ptr, const _Tpvec& a) \
|
||||
{ vst1q_##suffix(ptr, a.val); } \
|
||||
inline void v_store_low(_Tp* ptr, const _Tpvec& a) \
|
||||
{ vst1_##suffix(ptr, vget_low_##suffix(a.val)); } \
|
||||
inline void v_store_high(_Tp* ptr, const _Tpvec& a) \
|
||||
{ vst1_##suffix(ptr, vget_high_##suffix(a.val)); } |
||||
|
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_uint8x16, uchar, u8) |
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_int8x16, schar, s8) |
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_uint16x8, ushort, u16) |
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_int16x8, short, s16) |
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_uint32x4, unsigned, u32) |
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_int32x4, int, s32) |
||||
OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_float32x4, float, f32) |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(_Tpvec, scalartype, func, scalar_func) \ |
||||
inline scalartype v_reduce_##func(const _Tpvec& a) \
|
||||
{ \
|
||||
scalartype CV_DECL_ALIGNED(16) buf[4]; \
|
||||
v_store_aligned(buf, a); \
|
||||
scalartype s0 = scalar_func(buf[0], buf[1]); \
|
||||
scalartype s1 = scalar_func(buf[2], buf[3]); \
|
||||
return scalar_func(s0, s1); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_uint32x4, unsigned, sum, OPENCV_HAL_ADD) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_uint32x4, unsigned, max, std::max) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_uint32x4, unsigned, min, std::min) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_int32x4, int, sum, OPENCV_HAL_ADD) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_int32x4, int, max, std::max) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_int32x4, int, min, std::min) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_float32x4, float, sum, OPENCV_HAL_ADD) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_float32x4, float, max, std::max) |
||||
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_float32x4, float, min, std::min) |
||||
|
||||
inline int v_signmask(const v_uint8x16& a) |
||||
{ |
||||
int8x8_t m0 = vcreate_s8(CV_BIG_UINT(0x0706050403020100)); |
||||
uint8x16_t v0 = vshlq_u8(vshrq_n_u8(a.val, 7), vcombine_s8(m0, m0)); |
||||
uint64x2_t v1 = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(v0))); |
||||
return (int)vgetq_lane_u64(v1, 0) + ((int)vgetq_lane_u64(v1, 1) << 8); |
||||
} |
||||
inline int v_signmask(const v_int8x16& a) |
||||
{ return v_signmask(v_reinterpret_as_u8(a)); } |
||||
|
||||
inline int v_signmask(const v_uint16x8& a) |
||||
{ |
||||
int16x4_t m0 = vcreate_s16(CV_BIG_UINT(0x0003000200010000)); |
||||
uint16x8_t v0 = vshlq_u16(vshrq_n_u16(a.val, 15), vcombine_s16(m0, m0)); |
||||
uint64x2_t v1 = vpaddlq_u32(vpaddlq_u16(v0)); |
||||
return (int)vgetq_lane_u64(v1, 0) + ((int)vgetq_lane_u64(v1, 1) << 4); |
||||
} |
||||
inline int v_signmask(const v_int16x8& a) |
||||
{ return v_signmask(v_reinterpret_as_u16(a)); } |
||||
|
||||
inline int v_signmask(const v_uint32x4& a) |
||||
{ |
||||
int32x2_t m0 = vcreate_s32(CV_BIG_UINT(0x0000000100000000)); |
||||
uint32x4_t v0 = vshlq_u32(vshrq_n_u32(a.val, 31), vcombine_s32(m0, m0)); |
||||
uint64x2_t v1 = vpaddlq_u32(v0); |
||||
return (int)vgetq_lane_u64(v1, 0) + ((int)vgetq_lane_u64(v1, 1) << 2); |
||||
} |
||||
inline int v_signmask(const v_int32x4& a) |
||||
{ return v_signmask(v_reinterpret_as_u32(a)); } |
||||
inline int v_signmask(const v_float32x4& a) |
||||
{ return v_signmask(v_reinterpret_as_u32(a)); } |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(_Tpvec, suffix, shift) \ |
||||
inline bool v_check_all(const v_##_Tpvec& a) \
|
||||
{ \
|
||||
_Tpvec##_t v0 = vshrq_n_##suffix(vmvnq_##suffix(a.val), shift); \
|
||||
uint64x2_t v1 = vreinterpretq_u64_##suffix(v0); \
|
||||
return (vgetq_lane_u64(v1, 0) | vgetq_lane_u64(v1, 1)) == 0; \
|
||||
} \
|
||||
inline bool v_check_any(const v_##_Tpvec& a) \
|
||||
{ \
|
||||
_Tpvec##_t v0 = vshrq_n_##suffix(a.val, shift); \
|
||||
uint64x2_t v1 = vreinterpretq_u64_##suffix(v0); \
|
||||
return (vgetq_lane_u64(v1, 0) | vgetq_lane_u64(v1, 1)) != 0; \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(uint8x16, u8, 7) |
||||
OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(uint16x8, u16, 15) |
||||
OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(uint32x4, u32, 31) |
||||
|
||||
inline bool v_check_all(const v_int8x16& a) |
||||
{ return v_check_all(v_reinterpret_as_u8(a)); } |
||||
inline bool v_check_all(const v_int16x8& a) |
||||
{ return v_check_all(v_reinterpret_as_u16(a)); } |
||||
inline bool v_check_all(const v_int32x4& a) |
||||
{ return v_check_all(v_reinterpret_as_u32(a)); } |
||||
inline bool v_check_all(const v_float32x4& a) |
||||
{ return v_check_all(v_reinterpret_as_u32(a)); } |
||||
|
||||
inline bool v_check_any(const v_int8x16& a)
{ return v_check_any(v_reinterpret_as_u8(a)); }
inline bool v_check_any(const v_int16x8& a)
{ return v_check_any(v_reinterpret_as_u16(a)); }
inline bool v_check_any(const v_int32x4& a)
{ return v_check_any(v_reinterpret_as_u32(a)); }
inline bool v_check_any(const v_float32x4& a)
{ return v_check_any(v_reinterpret_as_u32(a)); }
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_SELECT(_Tpvec, suffix, usuffix) \ |
||||
inline _Tpvec v_select(const _Tpvec& mask, const _Tpvec& a, const _Tpvec& b) \
|
||||
{ \
|
||||
return _Tpvec(vbslq_##suffix(vreinterpretq_##usuffix##_##suffix(mask.val), a.val, b.val)); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_uint8x16, u8, u8) |
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_int8x16, s8, u8) |
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_uint16x8, u16, u16) |
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_int16x8, s16, u16) |
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_uint32x4, u32, u32) |
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_int32x4, s32, u32) |
||||
OPENCV_HAL_IMPL_NEON_SELECT(v_float32x4, f32, u32) |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_EXPAND(_Tpvec, _Tpwvec, _Tp, suffix) \ |
||||
inline void v_expand(const _Tpvec& a, _Tpwvec& b0, _Tpwvec& b1) \
|
||||
{ \
|
||||
b0.val = vmovl_##suffix(vget_low_##suffix(a.val)); \
|
||||
b1.val = vmovl_##suffix(vget_high_##suffix(a.val)); \
|
||||
} \
|
||||
inline _Tpwvec v_load_expand(const _Tp* ptr) \
|
||||
{ \
|
||||
return _Tpwvec(vmovl_##suffix(vld1_##suffix(ptr))); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_EXPAND(v_uint8x16, v_uint16x8, uchar, u8) |
||||
OPENCV_HAL_IMPL_NEON_EXPAND(v_int8x16, v_int16x8, schar, s8) |
||||
OPENCV_HAL_IMPL_NEON_EXPAND(v_uint16x8, v_uint32x4, ushort, u16) |
||||
OPENCV_HAL_IMPL_NEON_EXPAND(v_int16x8, v_int32x4, short, s16) |
||||
|
||||
inline v_uint32x4 v_load_expand_q(const uchar* ptr) |
||||
{ |
||||
uint8x8_t v0 = vcreate_u8(*(unsigned*)ptr); |
||||
uint16x4_t v1 = vget_low_u16(vmovl_u8(v0)); |
||||
return v_uint32x4(vmovl_u16(v1)); |
||||
} |
||||
|
||||
inline v_int32x4 v_load_expand_q(const schar* ptr) |
||||
{ |
||||
int8x8_t v0 = vcreate_s8(*(unsigned*)ptr); |
||||
int16x4_t v1 = vget_low_s16(vmovl_s8(v0)); |
||||
return v_int32x4(vmovl_s16(v1)); |
||||
} |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_UNPACKS(_Tpvec, suffix) \ |
||||
inline void v_zip(const v_##_Tpvec& a0, const v_##_Tpvec& a1, v_##_Tpvec& b0, v_##_Tpvec& b1) \
|
||||
{ \
|
||||
_Tpvec##x2_t p = vzipq_##suffix(a0.val, a1.val); \
|
||||
b0.val = p.val[0]; \
|
||||
b1.val = p.val[1]; \
|
||||
} \
|
||||
inline v_##_Tpvec v_combine_low(const v_##_Tpvec& a, const v_##_Tpvec& b) \
|
||||
{ \
|
||||
return v_##_Tpvec(vcombine_##suffix(vget_low_##suffix(a.val), vget_low_##suffix(b.val))); \
|
||||
} \
|
||||
inline v_##_Tpvec v_combine_high(const v_##_Tpvec& a, const v_##_Tpvec& b) \
|
||||
{ \
|
||||
return v_##_Tpvec(vcombine_##suffix(vget_high_##suffix(a.val), vget_high_##suffix(b.val))); \
|
||||
} \
|
||||
inline void v_recombine(const v_##_Tpvec& a, const v_##_Tpvec& b, v_##_Tpvec& c, v_##_Tpvec& d) \
|
||||
{ \
|
||||
c.val = vcombine_##suffix(vget_low_##suffix(a.val), vget_low_##suffix(b.val)); \
|
||||
d.val = vcombine_##suffix(vget_high_##suffix(a.val), vget_high_##suffix(b.val)); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(uint8x16, u8) |
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(int8x16, s8) |
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(uint16x8, u16) |
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(int16x8, s16) |
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(uint32x4, u32) |
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(int32x4, s32) |
||||
OPENCV_HAL_IMPL_NEON_UNPACKS(float32x4, f32) |
||||
|
||||
inline v_int32x4 v_round(const v_float32x4& a)
{
    // round half away from zero: add 0.5 with the sign bit copied from a,
    // then rely on vcvtq_s32_f32 truncating towards zero
    static const int32x4_t v_sign = vdupq_n_s32(1 << 31),
        v_05 = vreinterpretq_s32_f32(vdupq_n_f32(0.5f));

    int32x4_t v_addition = vorrq_s32(v_05, vandq_s32(v_sign, vreinterpretq_s32_f32(a.val)));
    return v_int32x4(vcvtq_s32_f32(vaddq_f32(a.val, vreinterpretq_f32_s32(v_addition))));
}
||||
|
||||
inline v_int32x4 v_floor(const v_float32x4& a) |
||||
{ |
||||
int32x4_t a1 = vcvtq_s32_f32(a.val); |
||||
uint32x4_t mask = vcgtq_f32(vcvtq_f32_s32(a1), a.val); |
||||
return v_int32x4(vaddq_s32(a1, vreinterpretq_s32_u32(mask))); |
||||
} |
||||
|
||||
inline v_int32x4 v_ceil(const v_float32x4& a) |
||||
{ |
||||
int32x4_t a1 = vcvtq_s32_f32(a.val); |
||||
uint32x4_t mask = vcgtq_f32(a.val, vcvtq_f32_s32(a1)); |
||||
return v_int32x4(vsubq_s32(a1, vreinterpretq_s32_u32(mask))); |
||||
} |
||||
|
||||
inline v_int32x4 v_trunc(const v_float32x4& a) |
||||
{ return v_int32x4(vcvtq_s32_f32(a.val)); } |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(_Tpvec, suffix) \ |
||||
inline void v_transpose4x4(const v_##_Tpvec& a0, const v_##_Tpvec& a1, \
|
||||
const v_##_Tpvec& a2, const v_##_Tpvec& a3, \
|
||||
v_##_Tpvec& b0, v_##_Tpvec& b1, \
|
||||
v_##_Tpvec& b2, v_##_Tpvec& b3) \
|
||||
{ \
|
||||
/* m00 m01 m02 m03 */ \
|
||||
/* m10 m11 m12 m13 */ \
|
||||
/* m20 m21 m22 m23 */ \
|
||||
/* m30 m31 m32 m33 */ \
|
||||
_Tpvec##x2_t t0 = vtrnq_##suffix(a0.val, a1.val); \
|
||||
_Tpvec##x2_t t1 = vtrnq_##suffix(a2.val, a3.val); \
|
||||
/* m00 m10 m02 m12 */ \
|
||||
/* m01 m11 m03 m13 */ \
|
||||
/* m20 m30 m22 m32 */ \
|
||||
/* m21 m31 m23 m33 */ \
|
||||
b0.val = vcombine_##suffix(vget_low_##suffix(t0.val[0]), vget_low_##suffix(t1.val[0])); \
|
||||
b1.val = vcombine_##suffix(vget_low_##suffix(t0.val[1]), vget_low_##suffix(t1.val[1])); \
|
||||
b2.val = vcombine_##suffix(vget_high_##suffix(t0.val[0]), vget_high_##suffix(t1.val[0])); \
|
||||
b3.val = vcombine_##suffix(vget_high_##suffix(t0.val[1]), vget_high_##suffix(t1.val[1])); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(uint32x4, u32) |
||||
OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(int32x4, s32) |
||||
OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(float32x4, f32) |
||||
|
||||
#define OPENCV_HAL_IMPL_NEON_INTERLEAVED(_Tpvec, _Tp, suffix) \ |
||||
inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec& a, v_##_Tpvec& b, v_##_Tpvec& c) \
|
||||
{ \
|
||||
_Tpvec##x3_t v = vld3q_##suffix(ptr); \
|
||||
a.val = v.val[0]; \
|
||||
b.val = v.val[1]; \
|
||||
c.val = v.val[2]; \
|
||||
} \
|
||||
inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec& a, v_##_Tpvec& b, \
|
||||
v_##_Tpvec& c, v_##_Tpvec& d) \
|
||||
{ \
|
||||
_Tpvec##x4_t v = vld4q_##suffix(ptr); \
|
||||
a.val = v.val[0]; \
|
||||
b.val = v.val[1]; \
|
||||
c.val = v.val[2]; \
|
||||
d.val = v.val[3]; \
|
||||
} \
|
||||
inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec& a, const v_##_Tpvec& b, const v_##_Tpvec& c) \
|
||||
{ \
|
||||
_Tpvec##x3_t v; \
|
||||
v.val[0] = a.val; \
|
||||
v.val[1] = b.val; \
|
||||
v.val[2] = c.val; \
|
||||
vst3q_##suffix(ptr, v); \
|
||||
} \
|
||||
inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec& a, const v_##_Tpvec& b, \
|
||||
const v_##_Tpvec& c, const v_##_Tpvec& d) \
|
||||
{ \
|
||||
_Tpvec##x4_t v; \
|
||||
v.val[0] = a.val; \
|
||||
v.val[1] = b.val; \
|
||||
v.val[2] = c.val; \
|
||||
v.val[3] = d.val; \
|
||||
vst4q_##suffix(ptr, v); \
|
||||
} |
||||
|
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(uint8x16, uchar, u8) |
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(int8x16, schar, s8) |
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(uint16x8, ushort, u16) |
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(int16x8, short, s16) |
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(uint32x4, unsigned, u32) |
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(int32x4, int, s32) |
||||
OPENCV_HAL_IMPL_NEON_INTERLEAVED(float32x4, float, f32) |
||||
|
||||
inline v_float32x4 v_cvt_f32(const v_int32x4& a) |
||||
{ |
||||
return v_float32x4(vcvtq_f32_s32(a.val)); |
||||
} |
||||
|
||||
} |
||||
|
||||
#endif |
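// Usage sketch (illustrative addition, not part of the header). The NEON-backed
// types above expose the same operations as the scalar v_reg<> fallback, so a
// kernel can be written once against this interface, e.g.:
//
//     void add_f32(const float* a, const float* b, float* dst, int len)
//     {
//         int i = 0;
//         for( ; i <= len - 4; i += 4 )
//             cv::v_store(dst + i, cv::v_load(a + i) + cv::v_load(b + i));  // 4 lanes at a time
//         for( ; i < len; i++ )
//             dst[i] = a[i] + b[i];                                         // scalar tail
//     }
//
// v_load, v_store and operator+ here resolve to the v_float32x4 overloads defined above.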
File diff suppressed because it is too large
@ -0,0 +1,47 @@ |
||||
/*M///////////////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
|
||||
//
|
||||
// By downloading, copying, installing or using the software you agree to this license.
|
||||
// If you do not agree to this license, do not download, install,
|
||||
// copy or use the software.
|
||||
//
|
||||
//
|
||||
// License Agreement
|
||||
// For Open Source Computer Vision Library
|
||||
//
|
||||
// Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
|
||||
// Copyright (C) 2009-2011, Willow Garage Inc., all rights reserved.
|
||||
// Third party copyrights are property of their respective owners.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without modification,
|
||||
// are permitted provided that the following conditions are met:
|
||||
//
|
||||
// * Redistribution's of source code must retain the above copyright notice,
|
||||
// this list of conditions and the following disclaimer.
|
||||
//
|
||||
// * Redistribution's in binary form must reproduce the above copyright notice,
|
||||
// this list of conditions and the following disclaimer in the documentation
|
||||
// and/or other materials provided with the distribution.
|
||||
//
|
||||
// * The name of the copyright holders may not be used to endorse or promote products
|
||||
// derived from this software without specific prior written permission.
|
||||
//
|
||||
// This software is provided by the copyright holders and contributors "as is" and
|
||||
// any express or implied warranties, including, but not limited to, the implied
|
||||
// warranties of merchantability and fitness for a particular purpose are disclaimed.
|
||||
// In no event shall the Intel Corporation or contributors be liable for any direct,
|
||||
// indirect, incidental, special, exemplary, or consequential damages
|
||||
// (including, but not limited to, procurement of substitute goods or services;
|
||||
// loss of use, data, or profits; or business interruption) however caused
|
||||
// and on any theory of liability, whether in contract, strict liability,
|
||||
// or tort (including negligence or otherwise) arising in any way out of
|
||||
// the use of this software, even if advised of the possibility of such damage.
|
||||
//
|
||||
//M*/
|
||||
|
||||
#include "precomp.hpp" |
||||
|
||||
namespace cv { namespace hal { |
||||
|
||||
}} |
@ -0,0 +1,47 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "precomp.hpp"

namespace cv { namespace hal {

}}
@ -0,0 +1,47 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "precomp.hpp"

namespace cv { namespace hal {

}}
File diff suppressed because it is too large
@ -0,0 +1,208 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "precomp.hpp"

namespace cv { namespace hal {

/****************************************************************************************\
*                     LU & Cholesky implementation for small matrices                    *
\****************************************************************************************/

template<typename _Tp> static inline int
LUImpl(_Tp* A, size_t astep, int m, _Tp* b, size_t bstep, int n)
{
    int i, j, k, p = 1;
    astep /= sizeof(A[0]);
    bstep /= sizeof(b[0]);

    for( i = 0; i < m; i++ )
    {
        k = i;

        for( j = i+1; j < m; j++ )
            if( std::abs(A[j*astep + i]) > std::abs(A[k*astep + i]) )
                k = j;

        if( std::abs(A[k*astep + i]) < std::numeric_limits<_Tp>::epsilon() )
            return 0;

        if( k != i )
        {
            for( j = i; j < m; j++ )
                std::swap(A[i*astep + j], A[k*astep + j]);
            if( b )
                for( j = 0; j < n; j++ )
                    std::swap(b[i*bstep + j], b[k*bstep + j]);
            p = -p;
        }

        _Tp d = -1/A[i*astep + i];

        for( j = i+1; j < m; j++ )
        {
            _Tp alpha = A[j*astep + i]*d;

            for( k = i+1; k < m; k++ )
                A[j*astep + k] += alpha*A[i*astep + k];

            if( b )
                for( k = 0; k < n; k++ )
                    b[j*bstep + k] += alpha*b[i*bstep + k];
        }

        A[i*astep + i] = -d;
    }

    if( b )
    {
        for( i = m-1; i >= 0; i-- )
            for( j = 0; j < n; j++ )
            {
                _Tp s = b[i*bstep + j];
                for( k = i+1; k < m; k++ )
                    s -= A[i*astep + k]*b[k*bstep + j];
                b[i*bstep + j] = s*A[i*astep + i];
            }
    }

    return p;
}


int LU(float* A, size_t astep, int m, float* b, size_t bstep, int n)
{
    return LUImpl(A, astep, m, b, bstep, n);
}


int LU(double* A, size_t astep, int m, double* b, size_t bstep, int n)
{
    return LUImpl(A, astep, m, b, bstep, n);
}


template<typename _Tp> static inline bool
CholImpl(_Tp* A, size_t astep, int m, _Tp* b, size_t bstep, int n)
{
    _Tp* L = A;
    int i, j, k;
    double s;
    astep /= sizeof(A[0]);
    bstep /= sizeof(b[0]);

    for( i = 0; i < m; i++ )
    {
        for( j = 0; j < i; j++ )
        {
            s = A[i*astep + j];
            for( k = 0; k < j; k++ )
                s -= L[i*astep + k]*L[j*astep + k];
            L[i*astep + j] = (_Tp)(s*L[j*astep + j]);
        }
        s = A[i*astep + i];
        for( k = 0; k < j; k++ )
        {
            double t = L[i*astep + k];
            s -= t*t;
        }
        if( s < std::numeric_limits<_Tp>::epsilon() )
            return false;
        L[i*astep + i] = (_Tp)(1./std::sqrt(s));
    }

    if( !b )
        return true;

    // LLt x = b
    // 1: L y = b
    // 2. Lt x = y

    /*
     [ L00                ]  y0   b0
     [ L10 L11            ]  y1 = b1
     [ L20 L21 L22        ]  y2   b2
     [ L30 L31 L32 L33    ]  y3   b3

     [ L00 L10 L20 L30    ]  x0   y0
     [     L11 L21 L31    ]  x1 = y1
     [         L22 L32    ]  x2   y2
     [             L33    ]  x3   y3
    */

    for( i = 0; i < m; i++ )
    {
        for( j = 0; j < n; j++ )
        {
            s = b[i*bstep + j];
            for( k = 0; k < i; k++ )
                s -= L[i*astep + k]*b[k*bstep + j];
            b[i*bstep + j] = (_Tp)(s*L[i*astep + i]);
        }
    }

    for( i = m-1; i >= 0; i-- )
    {
        for( j = 0; j < n; j++ )
        {
            s = b[i*bstep + j];
            for( k = m-1; k > i; k-- )
                s -= L[k*astep + i]*b[k*bstep + j];
            b[i*bstep + j] = (_Tp)(s*L[i*astep + i]);
        }
    }

    return true;
}


bool Cholesky(float* A, size_t astep, int m, float* b, size_t bstep, int n)
{
    return CholImpl(A, astep, m, b, bstep, n);
}

bool Cholesky(double* A, size_t astep, int m, double* b, size_t bstep, int n)
{
    return CholImpl(A, astep, m, b, bstep, n);
}

}}
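As a quick orientation for the API added in this file, here is a hedged usage sketch; it is not part of the commit, and it assumes cv::hal::LU and cv::hal::Cholesky are declared in the opencv2/hal.hpp header that the precomp.hpp below includes. LU factorizes A in place, overwrites b with the solution of A*x = b, and returns the sign of the row permutation (0 when a pivot falls below epsilon, i.e. A is treated as singular); Cholesky has the same calling convention for symmetric positive-definite A and returns false on failure.

    // Hypothetical sketch: solve a 3x3 linear system with the routines above.
    #include <cstdio>
    #include "opencv2/hal.hpp"   // assumed location of the LU/Cholesky declarations

    int main()
    {
        double A[3*3] = { 4, 2, 1,
                          2, 5, 3,
                          1, 3, 6 };
        double b[3]   = { 1, 2, 3 };

        // astep/bstep are row strides in bytes; LUImpl divides them by sizeof().
        int p = cv::hal::LU(A, 3*sizeof(double), 3, b, sizeof(double), 1);
        if (p == 0)
            std::printf("matrix is singular\n");
        else
            std::printf("x = [%g %g %g]\n", b[0], b[1], b[2]);

        // For a symmetric positive-definite A (on a fresh copy, since LU has already
        // overwritten A here), cv::hal::Cholesky takes the same arguments and
        // returns true on success.
        return 0;
    }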
@ -0,0 +1,49 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "opencv2/hal.hpp"
#include "opencv2/hal/intrin.hpp"
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <float.h>
@ -0,0 +1,47 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "precomp.hpp"

namespace cv { namespace hal {

}}
@ -0,0 +1,306 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "precomp.hpp"

namespace cv { namespace hal {

static const uchar popCountTable[] =
{
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};

static const uchar popCountTable2[] =
{
    0, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3,
    1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3,
    1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4,
    2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4,
    1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4,
    2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4,
    1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4,
    2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 2, 3, 3, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4
};

static const uchar popCountTable4[] =
{
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
};

int normHamming(const uchar* a, int n)
{
    int i = 0;
    int result = 0;
#if CV_NEON
    {
        uint32x4_t bits = vmovq_n_u32(0);
        for (; i <= n - 16; i += 16) {
            uint8x16_t A_vec = vld1q_u8 (a + i);
            uint8x16_t bitsSet = vcntq_u8 (A_vec);
            uint16x8_t bitSet8 = vpaddlq_u8 (bitsSet);
            uint32x4_t bitSet4 = vpaddlq_u16 (bitSet8);
            bits = vaddq_u32(bits, bitSet4);
        }
        uint64x2_t bitSet2 = vpaddlq_u32 (bits);
        result = vgetq_lane_s32 (vreinterpretq_s32_u64(bitSet2),0);
        result += vgetq_lane_s32 (vreinterpretq_s32_u64(bitSet2),2);
    }
#endif
    for( ; i <= n - 4; i += 4 )
        result += popCountTable[a[i]] + popCountTable[a[i+1]] +
                popCountTable[a[i+2]] + popCountTable[a[i+3]];
    for( ; i < n; i++ )
        result += popCountTable[a[i]];
    return result;
}

int normHamming(const uchar* a, const uchar* b, int n)
{
    int i = 0;
    int result = 0;
#if CV_NEON
    {
        uint32x4_t bits = vmovq_n_u32(0);
        for (; i <= n - 16; i += 16) {
            uint8x16_t A_vec = vld1q_u8 (a + i);
            uint8x16_t B_vec = vld1q_u8 (b + i);
            uint8x16_t AxorB = veorq_u8 (A_vec, B_vec);
            uint8x16_t bitsSet = vcntq_u8 (AxorB);
            uint16x8_t bitSet8 = vpaddlq_u8 (bitsSet);
            uint32x4_t bitSet4 = vpaddlq_u16 (bitSet8);
            bits = vaddq_u32(bits, bitSet4);
        }
        uint64x2_t bitSet2 = vpaddlq_u32 (bits);
        result = vgetq_lane_s32 (vreinterpretq_s32_u64(bitSet2),0);
        result += vgetq_lane_s32 (vreinterpretq_s32_u64(bitSet2),2);
    }
#endif
    for( ; i <= n - 4; i += 4 )
        result += popCountTable[a[i] ^ b[i]] + popCountTable[a[i+1] ^ b[i+1]] +
                popCountTable[a[i+2] ^ b[i+2]] + popCountTable[a[i+3] ^ b[i+3]];
    for( ; i < n; i++ )
        result += popCountTable[a[i] ^ b[i]];
    return result;
}

int normHamming(const uchar* a, int n, int cellSize)
{
    if( cellSize == 1 )
        return normHamming(a, n);
    const uchar* tab = 0;
    if( cellSize == 2 )
        tab = popCountTable2;
    else if( cellSize == 4 )
        tab = popCountTable4;
    else
        return -1;
    int i = 0;
    int result = 0;
#if CV_ENABLE_UNROLLED
    for( ; i <= n - 4; i += 4 )
        result += tab[a[i]] + tab[a[i+1]] + tab[a[i+2]] + tab[a[i+3]];
#endif
    for( ; i < n; i++ )
        result += tab[a[i]];
    return result;
}

int normHamming(const uchar* a, const uchar* b, int n, int cellSize)
{
    if( cellSize == 1 )
        return normHamming(a, b, n);
    const uchar* tab = 0;
    if( cellSize == 2 )
        tab = popCountTable2;
    else if( cellSize == 4 )
        tab = popCountTable4;
    else
        return -1;
    int i = 0;
    int result = 0;
#if CV_ENABLE_UNROLLED
    for( ; i <= n - 4; i += 4 )
        result += tab[a[i] ^ b[i]] + tab[a[i+1] ^ b[i+1]] +
                tab[a[i+2] ^ b[i+2]] + tab[a[i+3] ^ b[i+3]];
#endif
    for( ; i < n; i++ )
        result += tab[a[i] ^ b[i]];
    return result;
}
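A note on the three lookup tables above: popCountTable is a plain per-byte popcount, while popCountTable2 and popCountTable4 count how many 2-bit (respectively 4-bit) cells of the byte are non-zero, which is what the cellSize overloads of normHamming index into. The following generator sketch is not part of the commit; it is a hedged illustration that reproduces the table values.

    // Hypothetical sketch: regenerate the three per-byte lookup tables used above.
    #include <cstdio>

    static int cellCount(int byte, int cellBits)
    {
        int cells = 0;
        for (int shift = 0; shift < 8; shift += cellBits)
            if ((byte >> shift) & ((1 << cellBits) - 1))   // cell has at least one bit set
                cells++;
        return cells;
    }

    int main()
    {
        for (int v = 0; v < 256; v++)
            std::printf("%d,%d,%d\n",
                        cellCount(v, 1),    // popCountTable  (plain popcount)
                        cellCount(v, 2),    // popCountTable2 (non-zero 2-bit cells)
                        cellCount(v, 4));   // popCountTable4 (non-zero nibbles)
        return 0;
    }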

float normL2Sqr_(const float* a, const float* b, int n)
{
    int j = 0; float d = 0.f;
#if CV_SSE
    float CV_DECL_ALIGNED(16) buf[4];
    __m128 d0 = _mm_setzero_ps(), d1 = _mm_setzero_ps();

    for( ; j <= n - 8; j += 8 )
    {
        __m128 t0 = _mm_sub_ps(_mm_loadu_ps(a + j), _mm_loadu_ps(b + j));
        __m128 t1 = _mm_sub_ps(_mm_loadu_ps(a + j + 4), _mm_loadu_ps(b + j + 4));
        d0 = _mm_add_ps(d0, _mm_mul_ps(t0, t0));
        d1 = _mm_add_ps(d1, _mm_mul_ps(t1, t1));
    }
    _mm_store_ps(buf, _mm_add_ps(d0, d1));
    d = buf[0] + buf[1] + buf[2] + buf[3];
#endif
    {
        for( ; j <= n - 4; j += 4 )
        {
            float t0 = a[j] - b[j], t1 = a[j+1] - b[j+1], t2 = a[j+2] - b[j+2], t3 = a[j+3] - b[j+3];
            d += t0*t0 + t1*t1 + t2*t2 + t3*t3;
        }
    }

    for( ; j < n; j++ )
    {
        float t = a[j] - b[j];
        d += t*t;
    }
    return d;
}


float normL1_(const float* a, const float* b, int n)
{
    int j = 0; float d = 0.f;
#if CV_SSE
    float CV_DECL_ALIGNED(16) buf[4];
    static const int CV_DECL_ALIGNED(16) absbuf[4] = {0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff};
    __m128 d0 = _mm_setzero_ps(), d1 = _mm_setzero_ps();
    __m128 absmask = _mm_load_ps((const float*)absbuf);

    for( ; j <= n - 8; j += 8 )
    {
        __m128 t0 = _mm_sub_ps(_mm_loadu_ps(a + j), _mm_loadu_ps(b + j));
        __m128 t1 = _mm_sub_ps(_mm_loadu_ps(a + j + 4), _mm_loadu_ps(b + j + 4));
        d0 = _mm_add_ps(d0, _mm_and_ps(t0, absmask));
        d1 = _mm_add_ps(d1, _mm_and_ps(t1, absmask));
    }
    _mm_store_ps(buf, _mm_add_ps(d0, d1));
    d = buf[0] + buf[1] + buf[2] + buf[3];
#elif CV_NEON
    float32x4_t v_sum = vdupq_n_f32(0.0f);
    for ( ; j <= n - 4; j += 4)
        v_sum = vaddq_f32(v_sum, vabdq_f32(vld1q_f32(a + j), vld1q_f32(b + j)));

    float CV_DECL_ALIGNED(16) buf[4];
    vst1q_f32(buf, v_sum);
    d = buf[0] + buf[1] + buf[2] + buf[3];
#endif
    {
        for( ; j <= n - 4; j += 4 )
        {
            d += std::abs(a[j] - b[j]) + std::abs(a[j+1] - b[j+1]) +
                    std::abs(a[j+2] - b[j+2]) + std::abs(a[j+3] - b[j+3]);
        }
    }

    for( ; j < n; j++ )
        d += std::abs(a[j] - b[j]);
    return d;
}

int normL1_(const uchar* a, const uchar* b, int n)
{
    int j = 0, d = 0;
#if CV_SSE
    __m128i d0 = _mm_setzero_si128();

    for( ; j <= n - 16; j += 16 )
    {
        __m128i t0 = _mm_loadu_si128((const __m128i*)(a + j));
        __m128i t1 = _mm_loadu_si128((const __m128i*)(b + j));

        d0 = _mm_add_epi32(d0, _mm_sad_epu8(t0, t1));
    }

    for( ; j <= n - 4; j += 4 )
    {
        __m128i t0 = _mm_cvtsi32_si128(*(const int*)(a + j));
        __m128i t1 = _mm_cvtsi32_si128(*(const int*)(b + j));

        d0 = _mm_add_epi32(d0, _mm_sad_epu8(t0, t1));
    }
    d = _mm_cvtsi128_si32(_mm_add_epi32(d0, _mm_unpackhi_epi64(d0, d0)));
#elif CV_NEON
    uint32x4_t v_sum = vdupq_n_u32(0.0f);
    for ( ; j <= n - 16; j += 16)
    {
        uint8x16_t v_dst = vabdq_u8(vld1q_u8(a + j), vld1q_u8(b + j));
        uint16x8_t v_low = vmovl_u8(vget_low_u8(v_dst)), v_high = vmovl_u8(vget_high_u8(v_dst));
        v_sum = vaddq_u32(v_sum, vaddl_u16(vget_low_u16(v_low), vget_low_u16(v_high)));
        v_sum = vaddq_u32(v_sum, vaddl_u16(vget_high_u16(v_low), vget_high_u16(v_high)));
    }

    uint CV_DECL_ALIGNED(16) buf[4];
    vst1q_u32(buf, v_sum);
    d = buf[0] + buf[1] + buf[2] + buf[3];
#endif
    {
        for( ; j <= n - 4; j += 4 )
        {
            d += std::abs(a[j] - b[j]) + std::abs(a[j+1] - b[j+1]) +
                    std::abs(a[j+2] - b[j+2]) + std::abs(a[j+3] - b[j+3]);
        }
    }
    for( ; j < n; j++ )
        d += std::abs(a[j] - b[j]);
    return d;
}

}} //cv::hal
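For completeness, a hedged usage sketch of the distance helpers defined in this file; it is not part of the commit. It assumes the functions are declared in the opencv2/hal.hpp header used by precomp.hpp above, and that uchar is the usual unsigned char typedef.

    // Hypothetical sketch: Hamming and L1/L2 distances between small descriptors.
    #include <cstdio>
    #include "opencv2/hal.hpp"   // assumed location of the normHamming/normL1_/normL2Sqr_ declarations

    int main()
    {
        unsigned char d1[32], d2[32];
        float f1[8], f2[8];
        for (int i = 0; i < 32; i++) { d1[i] = (unsigned char)i; d2[i] = (unsigned char)(i ^ 0x0F); }
        for (int i = 0; i < 8; i++)  { f1[i] = (float)i; f2[i] = (float)(i * i); }

        int bitsSet  = cv::hal::normHamming(d1, 32);         // popcount of d1
        int hamming  = cv::hal::normHamming(d1, d2, 32);     // bitwise Hamming distance
        int hamming2 = cv::hal::normHamming(d1, d2, 32, 2);  // distance over 2-bit cells
        float l1     = cv::hal::normL1_(f1, f2, 8);          // sum of |f1[i] - f2[i]|
        float l2sqr  = cv::hal::normL2Sqr_(f1, f2, 8);       // sum of (f1[i] - f2[i])^2

        std::printf("%d %d %d %g %g\n", bitsSet, hamming, hamming2, l1, l2sqr);
        return 0;
    }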
@ -0,0 +1,47 @@
/* (standard OpenCV BSD-style license header, identical to the one reproduced in full above) */

#include "precomp.hpp"

namespace cv { namespace hal {

}}
Some files were not shown because too many files have changed in this diff