Unstable SIMD module
This module provides helper functionality to build code with SIMD instructions. Available since 0.42.0.
Note: this module is unstable. It is only provided as a technology preview. Its API may change in arbitrary ways between releases or it might be removed from Meson altogether.
Usage
This module is designed for the use case where you have an algorithm with one or more SIMD implementations and you choose which one to use at runtime.
The module provides one method, check, which is used like this:
simd = import('unstable-simd')
cc = meson.get_compiler('c')

rval = simd.check('mysimds',
  mmx : 'simd_mmx.c',
  sse : 'simd_sse.c',
  sse2 : 'simd_sse2.c',
  sse3 : 'simd_sse3.c',
  ssse3 : 'simd_ssse3.c',
  sse41 : 'simd_sse41.c',
  sse42 : 'simd_sse42.c',
  avx : 'simd_avx.c',
  avx2 : 'simd_avx2.c',
  neon : 'simd_neon.c',
  compiler : cc)
Here the individual files contain the accelerated versions of the functions in question. The compiler keyword argument takes the compiler you are going to use to compile them. The function returns an array with two values. The first value is a list of libraries that contain the compiled code. Any SIMD code that the compiler can't compile (for example, Neon instructions on an x86 machine) is ignored. You should pass this value to the desired target using link_with. The second value is a configuration_data object that contains an entry for every instruction set that was supported. For example, if the compiler supports sse2 instructions, the object has HAVE_SSE2 set to 1.
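As a rough illustration of how C code can consume these values once the configuration object has been written out as a header, the sketch below inlines what such a header might contain. The macro values shown (SSE2 and AVX2 enabled, Neon absent) and the helper enabled_simd_count are assumptions made up for this example, not output copied from Meson.

```c
/* Hypothetical contents of a generated configuration header on an x86
 * build where the compiler accepted SSE2 and AVX2 but not Neon. */
#define HAVE_SSE2 1
#define HAVE_AVX2 1
/* HAVE_NEON is left undefined in this sketch. */

/* Hypothetical helper: count the instruction sets enabled at build time.
 * Inside an #if expression an undefined macro evaluates to 0, so testing
 * HAVE_NEON is safe even though nothing defines it. */
int enabled_simd_count(void) {
    int count = 0;
#if HAVE_SSE2
    count++;
#endif
#if HAVE_AVX2
    count++;
#endif
#if HAVE_NEON
    count++;
#endif
    return count;
}
```

Because missing entries simply evaluate to 0, the same source can be built unchanged on machines where a different subset of instruction sets is supported.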
Generating code to detect the proper instruction set at runtime is straightforward. First you create a header with the configuration object and then a chooser function that looks like this:
void (*fptr)(type_of_function_here) = NULL;
#if HAVE_NEON
if(fptr == NULL && neon_available()) {
    fptr = neon_accelerated_function;
}
#endif
#if HAVE_AVX2
if(fptr == NULL && avx2_available()) {
    fptr = avx2_accelerated_function;
}
#endif
...
if(fptr == NULL) {
    fptr = default_function;
}
Each source file provides two functions: xxx_available, which queries whether the CPU currently in use supports the instruction set, and xxx_accelerated_function, which is the corresponding accelerated implementation.
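A minimal sketch of what one such source file could look like for SSE2 is shown below. The function signature (incrementing an array of doubles) is an assumption chosen for illustration, and the runtime check relies on the GCC/Clang builtin __builtin_cpu_supports. The #else branch exists only so that the sketch compiles on any machine; in a real build an unsupported file would simply be skipped.

```c
/* Sketch of a per-instruction-set source file such as simd_sse2.c. */
#include <stddef.h>

#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h> /* SSE2 intrinsics */

/* Ask the CPU we are running on whether it supports SSE2. */
int sse2_available(void) {
    return __builtin_cpu_supports("sse2") != 0;
}

/* Add 1.0 to every element, processing two doubles per iteration. */
void sse2_accelerated_function(double *data, size_t n) {
    const __m128d one = _mm_set1_pd(1.0);
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i); /* unaligned load */
        _mm_storeu_pd(data + i, _mm_add_pd(v, one));
    }
    for (; i < n; ++i) { /* leftover element when n is odd */
        data[i] += 1.0;
    }
}
#else
/* Illustration-only fallback so the sketch builds without SSE2:
 * report the instruction set as unavailable and compute in scalar code. */
int sse2_available(void) {
    return 0;
}

void sse2_accelerated_function(double *data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] += 1.0;
    }
}
#endif
```

Keeping the availability check next to the implementation means the chooser never has to know how a given instruction set is detected.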
At the end of the chooser function, the function pointer points to the fastest available implementation and can be invoked to do the computation.
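One possible way to package the chooser is to run it once and cache the result behind a plain function, as in the sketch below. The names increment_fn, choose_implementation, and increment are hypothetical, as is the double-array signature. HAVE_SSE2 is expected to come from the generated configuration header; here it is left undefined, so the sketch falls back to default_function.

```c
#include <stddef.h>

/* Hypothetical signature shared by all implementations. */
typedef void (*increment_fn)(double *data, size_t n);

/* Portable fallback, used when no accelerated version is usable. */
static void default_function(double *data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] += 1.0;
    }
}

#if HAVE_SSE2
/* Provided by the SSE2 source file when the compiler supports it. */
int sse2_available(void);
void sse2_accelerated_function(double *data, size_t n);
#endif

/* Pick the first implementation that is both compiled in and supported
 * by the CPU; list faster instruction sets before slower ones. */
static increment_fn choose_implementation(void) {
    increment_fn fptr = NULL;
#if HAVE_SSE2
    if (fptr == NULL && sse2_available()) {
        fptr = sse2_accelerated_function;
    }
#endif
    if (fptr == NULL) {
        fptr = default_function;
    }
    return fptr;
}

/* Public entry point: choose on first use, then reuse the cached pointer.
 * The cache is not synchronized, so a threaded program should make the
 * first call before starting other threads. */
void increment(double *data, size_t n) {
    static increment_fn impl = NULL;
    if (impl == NULL) {
        impl = choose_implementation();
    }
    impl(data, n);
}
```

Callers then use increment(values, count) without caring which implementation was selected.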