The Meson Build System
http://mesonbuild.com/
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
186 lines
6.7 KiB
186 lines
6.7 KiB
--- |
|
short-description: CUDA module |
|
authors: |
|
- name: Olexa Bilaniuk |
|
years: [2019] |
|
has-copyright: false |
|
... |
|
|
|
# Unstable CUDA Module |
|
_Since: 0.50.0_ |
|
|
|
This module provides helper functionality related to the CUDA Toolkit and |
|
building code using it. |
|
|
|
|
|
**Note**: this module is unstable. It is only provided as a technology preview. |
|
Its API may change in arbitrary ways between releases or it might be removed |
|
from Meson altogether. |
|
|
|
|
|
## Importing the module |
|
|
|
The module may be imported as follows: |
|
|
|
``` meson |
|
cuda = import('unstable-cuda') |
|
``` |
|
|
|
It offers several useful functions that are enumerated below. |
|
|
|
|
|
## Functions |
|
|
|
### `nvcc_arch_flags()` |
|
_Since: 0.50.0_ |
|
|
|
``` meson |
|
cuda.nvcc_arch_flags(cuda_version_string, ..., |
|
detected: string_or_array) |
|
``` |
|
|
|
Returns a list of `-gencode` flags that should be passed to `cuda_args:` in |
|
order to compile a "fat binary" for the architectures/compute capabilities |
|
enumerated in the positional argument(s). The flags shall be acceptable to |
|
an NVCC with CUDA Toolkit version string `cuda_version_string`. |
|
|
|
A set of architectures and/or compute capabilities may be specified by: |
|
|
|
- The single positional argument `'All'`, `'Common'` or `'Auto'` |
|
- As (an array of) |
|
- Architecture names (`'Kepler'`, `'Maxwell+Tegra'`, `'Turing'`) and/or |
|
- Compute capabilities (`'3.0'`, `'3.5'`, `'5.3'`, `'7.5'`) |
|
|
|
A suffix of `+PTX` requests PTX code generation for the given architecture. |
|
A compute capability given as `A.B(X.Y)` requests PTX generation for an older |
|
virtual architecture `X.Y` before binary generation for a newer architecture |
|
`A.B`. |
|
|
|
Multiple architectures and compute capabilities may be passed in using |
|
|
|
- Multiple positional arguments |
|
- Lists of strings |
|
- Space (` `), comma (`,`) or semicolon (`;`)-separated strings |
|
|
|
The single-word architectural sets `'All'`, `'Common'` or `'Auto'` |
|
cannot be mixed with architecture names or compute capabilities. Their |
|
interpretation is: |
|
|
|
| Name | Compute Capability | |
|
|-------------------|--------------------| |
|
| `'All'` | All CCs supported by given NVCC compiler. | |
|
| `'Common'` | Relatively common CCs supported by given NVCC compiler. Generally excludes Tegra and Tesla devices. | |
|
| `'Auto'` | The CCs provided by the `detected:` keyword, filtered for support by given NVCC compiler. | |
|
|
|
The supported architecture names and their corresponding compute capabilities |
|
are: |
|
|
|
| Name | Compute Capability | |
|
|-------------------|--------------------| |
|
| `'Fermi'` | 2.0, 2.1(2.0) | |
|
| `'Kepler'` | 3.0, 3.5 | |
|
| `'Kepler+Tegra'` | 3.2 | |
|
| `'Kepler+Tesla'` | 3.7 | |
|
| `'Maxwell'` | 5.0, 5.2 | |
|
| `'Maxwell+Tegra'` | 5.3 | |
|
| `'Pascal'` | 6.0, 6.1 | |
|
| `'Pascal+Tegra'` | 6.2 | |
|
| `'Volta'` | 7.0 | |
|
| `'Xavier'` | 7.2 | |
|
| `'Turing'` | 7.5 | |
|
| `'Ampere'` | 8.0, 8.6 | |
|
|
|
|
|
Examples: |
|
|
|
cuda.nvcc_arch_flags('10.0', '3.0', '3.5', '5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', ['3.0', '3.5', '5.0+PTX']) |
|
cuda.nvcc_arch_flags('10.0', [['3.0', '3.5'], '5.0+PTX']) |
|
cuda.nvcc_arch_flags('10.0', '3.0 3.5 5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', '3.0,3.5,5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', '3.0;3.5;5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', 'Kepler 5.0+PTX') |
|
# Returns ['-gencode', 'arch=compute_30,code=sm_30', |
|
# '-gencode', 'arch=compute_35,code=sm_35', |
|
# '-gencode', 'arch=compute_50,code=sm_50', |
|
# '-gencode', 'arch=compute_50,code=compute_50'] |
|
|
|
cuda.nvcc_arch_flags('10.0', '3.5(3.0)') |
|
# Returns ['-gencode', 'arch=compute_30,code=sm_35'] |
|
|
|
cuda.nvcc_arch_flags('8.0', 'Common') |
|
# Returns ['-gencode', 'arch=compute_30,code=sm_30', |
|
# '-gencode', 'arch=compute_35,code=sm_35', |
|
# '-gencode', 'arch=compute_50,code=sm_50', |
|
# '-gencode', 'arch=compute_52,code=sm_52', |
|
# '-gencode', 'arch=compute_60,code=sm_60', |
|
# '-gencode', 'arch=compute_61,code=sm_61', |
|
# '-gencode', 'arch=compute_61,code=compute_61'] |
|
|
|
cuda.nvcc_arch_flags('9.2', 'Auto', detected: '6.0 6.0 6.0 6.0') |
|
cuda.nvcc_arch_flags('9.2', 'Auto', detected: ['6.0', '6.0', '6.0', '6.0']) |
|
# Returns ['-gencode', 'arch=compute_60,code=sm_60'] |
|
|
|
cuda.nvcc_arch_flags(nvcc, 'All') |
|
# Returns ['-gencode', 'arch=compute_20,code=sm_20', |
|
# '-gencode', 'arch=compute_20,code=sm_21', |
|
# '-gencode', 'arch=compute_30,code=sm_30', |
|
# '-gencode', 'arch=compute_32,code=sm_32', |
|
# '-gencode', 'arch=compute_35,code=sm_35', |
|
# '-gencode', 'arch=compute_37,code=sm_37', |
|
# '-gencode', 'arch=compute_50,code=sm_50', # nvcc.version() < 7.0 |
|
# '-gencode', 'arch=compute_52,code=sm_52', |
|
# '-gencode', 'arch=compute_53,code=sm_53', # nvcc.version() >= 7.0 |
|
# '-gencode', 'arch=compute_60,code=sm_60', |
|
# '-gencode', 'arch=compute_61,code=sm_61', # nvcc.version() >= 8.0 |
|
# '-gencode', 'arch=compute_70,code=sm_70', |
|
# '-gencode', 'arch=compute_72,code=sm_72', # nvcc.version() >= 9.0 |
|
# '-gencode', 'arch=compute_75,code=sm_75'] # nvcc.version() >= 10.0 |
|
|
|
_Note:_ This function is intended to closely replicate CMake's FindCUDA module |
|
function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])` |
|
|
|
|
|
|
|
### `nvcc_arch_readable()` |
|
_Since: 0.50.0_ |
|
|
|
``` meson |
|
cuda.nvcc_arch_readable(cuda_version_string, ..., |
|
detected: string_or_array) |
|
``` |
|
|
|
Has precisely the same interface as [`nvcc_arch_flags()`](#nvcc_arch_flags), |
|
but rather than returning a list of flags, it returns a "readable" list of |
|
architectures that will be compiled for. The output of this function is solely |
|
intended for informative message printing. |
|
|
|
archs = '3.0 3.5 5.0+PTX' |
|
readable = cuda.nvcc_arch_readable('10.0', archs) |
|
message('Building for architectures ' + ' '.join(readable)) |
|
|
|
This will print |
|
|
|
Message: Building for architectures sm30 sm35 sm50 compute50 |
|
|
|
_Note:_ This function is intended to closely replicate CMake's |
|
FindCUDA module function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, |
|
[list of CUDA compute architectures])` |
|
|
|
|
|
|
|
### `min_driver_version()` |
|
_Since: 0.50.0_ |
|
|
|
``` meson |
|
cuda.min_driver_version(cuda_version_string) |
|
``` |
|
|
|
Returns the minimum NVIDIA proprietary driver version required, on the |
|
host system, by kernels compiled with a CUDA Toolkit with the given |
|
version string. |
|
|
|
The output of this function is generally intended for informative |
|
message printing, but could be used for assertions or to conditionally |
|
enable features known to exist within the minimum NVIDIA driver |
|
required.
|
|
|