The Meson Build System
http://mesonbuild.com/
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
191 lines
6.9 KiB
191 lines
6.9 KiB
--- |
|
short-description: CUDA module |
|
authors: |
|
- name: Olexa Bilaniuk |
|
years: [2019] |
|
has-copyright: false |
|
... |
|
|
|
# Unstable CUDA Module |
|
_Since: 0.50.0_ |
|
|
|
This module provides helper functionality related to the CUDA Toolkit and |
|
building code using it. |
|
|
|
|
|
**Note**: this module is unstable. It is only provided as a technology preview. |
|
Its API may change in arbitrary ways between releases or it might be removed |
|
from Meson altogether. |
|
|
|
|
|
## Importing the module |
|
|
|
The module may be imported as follows: |
|
|
|
``` meson |
|
cuda = import('unstable-cuda') |
|
``` |
|
|
|
It offers several useful functions that are enumerated below. |
|
|
|
|
|
## Functions |
|
|
|
### `nvcc_arch_flags()` |
|
_Since: 0.50.0_ |
|
|
|
``` meson |
|
cuda.nvcc_arch_flags(nvcc_or_version, ..., |
|
detected: string_or_array) |
|
``` |
|
|
|
Returns a list of `-gencode` flags that should be passed to `cuda_args:` in |
|
order to compile a "fat binary" for the architectures/compute capabilities |
|
enumerated in the positional argument(s). The flags shall be acceptable to |
|
the NVCC compiler object `nvcc_or_version`, or its version string. |
|
|
|
A set of architectures and/or compute capabilities may be specified by: |
|
|
|
- The single positional argument `'All'`, `'Common'` or `'Auto'` |
|
- As (an array of) |
|
- Architecture names (`'Kepler'`, `'Maxwell+Tegra'`, `'Turing'`) and/or |
|
- Compute capabilities (`'3.0'`, `'3.5'`, `'5.3'`, `'7.5'`) |
|
|
|
A suffix of `+PTX` requests PTX code generation for the given architecture. |
|
A compute capability given as `A.B(X.Y)` requests PTX generation for an older |
|
virtual architecture `X.Y` before binary generation for a newer architecture |
|
`A.B`. |
|
|
|
Multiple architectures and compute capabilities may be passed in using |
|
|
|
- Multiple positional arguments |
|
- Lists of strings |
|
- Space (` `), comma (`,`) or semicolon (`;`)-separated strings |
|
|
|
The single-word architectural sets `'All'`, `'Common'` or `'Auto'` cannot be |
|
mixed with architecture names or compute capabilities. Their interpretation is: |
|
|
|
| Name | Compute Capability | |
|
|-------------------|--------------------| |
|
| `'All'` | All CCs supported by given NVCC compiler. | |
|
| `'Common'` | Relatively common CCs supported by given NVCC compiler. Generally excludes Tegra and Tesla devices. | |
|
| `'Auto'` | The CCs provided by the `detected:` keyword, filtered for support by given NVCC compiler. | |
|
|
|
As a special case, when `nvcc_arch_flags()` is invoked with |
|
|
|
- an NVCC `compiler` object `nvcc`, |
|
- `'Auto'` mode and |
|
- no `detected:` keyword, |
|
|
|
Meson uses `nvcc`'s architecture auto-detection results. |
|
|
|
The supported architecture names and their corresponding compute capabilities |
|
are: |
|
|
|
| Name | Compute Capability | |
|
|-------------------|--------------------| |
|
| `'Fermi'` | 2.0, 2.1(2.0) | |
|
| `'Kepler'` | 3.0, 3.5 | |
|
| `'Kepler+Tegra'` | 3.2 | |
|
| `'Kepler+Tesla'` | 3.7 | |
|
| `'Maxwell'` | 5.0, 5.2 | |
|
| `'Maxwell+Tegra'` | 5.3 | |
|
| `'Pascal'` | 6.0, 6.1 | |
|
| `'Pascal+Tegra'` | 6.2 | |
|
| `'Volta'` | 7.0 | |
|
| `'Xavier'` | 7.2 | |
|
| `'Turing'` | 7.5 | |
|
|
|
|
|
Examples: |
|
|
|
cuda.nvcc_arch_flags('10.0', '3.0', '3.5', '5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', ['3.0', '3.5', '5.0+PTX']) |
|
cuda.nvcc_arch_flags('10.0', [['3.0', '3.5'], '5.0+PTX']) |
|
cuda.nvcc_arch_flags('10.0', '3.0 3.5 5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', '3.0,3.5,5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', '3.0;3.5;5.0+PTX') |
|
cuda.nvcc_arch_flags('10.0', 'Kepler 5.0+PTX') |
|
# Returns ['-gencode', 'arch=compute_30,code=sm_30', |
|
# '-gencode', 'arch=compute_35,code=sm_35', |
|
# '-gencode', 'arch=compute_50,code=sm_50', |
|
# '-gencode', 'arch=compute_50,code=compute_50'] |
|
|
|
cuda.nvcc_arch_flags('10.0', '3.5(3.0)') |
|
# Returns ['-gencode', 'arch=compute_30,code=sm_35'] |
|
|
|
cuda.nvcc_arch_flags('8.0', 'Common') |
|
# Returns ['-gencode', 'arch=compute_30,code=sm_30', |
|
# '-gencode', 'arch=compute_35,code=sm_35', |
|
# '-gencode', 'arch=compute_50,code=sm_50', |
|
# '-gencode', 'arch=compute_52,code=sm_52', |
|
# '-gencode', 'arch=compute_60,code=sm_60', |
|
# '-gencode', 'arch=compute_61,code=sm_61', |
|
# '-gencode', 'arch=compute_61,code=compute_61'] |
|
|
|
cuda.nvcc_arch_flags('9.2', 'Auto', detected: '6.0 6.0 6.0 6.0') |
|
cuda.nvcc_arch_flags('9.2', 'Auto', detected: ['6.0', '6.0', '6.0', '6.0']) |
|
# Returns ['-gencode', 'arch=compute_60,code=sm_60'] |
|
|
|
cuda.nvcc_arch_flags(nvcc, 'All') |
|
# Returns ['-gencode', 'arch=compute_20,code=sm_20', |
|
# '-gencode', 'arch=compute_20,code=sm_21', |
|
# '-gencode', 'arch=compute_30,code=sm_30', |
|
# '-gencode', 'arch=compute_32,code=sm_32', |
|
# '-gencode', 'arch=compute_35,code=sm_35', |
|
# '-gencode', 'arch=compute_37,code=sm_37', |
|
# '-gencode', 'arch=compute_50,code=sm_50', # nvcc.version() < 7.0 |
|
# '-gencode', 'arch=compute_52,code=sm_52', |
|
# '-gencode', 'arch=compute_53,code=sm_53', # nvcc.version() >= 7.0 |
|
# '-gencode', 'arch=compute_60,code=sm_60', |
|
# '-gencode', 'arch=compute_61,code=sm_61', # nvcc.version() >= 8.0 |
|
# '-gencode', 'arch=compute_70,code=sm_70', |
|
# '-gencode', 'arch=compute_72,code=sm_72', # nvcc.version() >= 9.0 |
|
# '-gencode', 'arch=compute_75,code=sm_75'] # nvcc.version() >= 10.0 |
|
|
|
_Note:_ This function is intended to closely replicate CMake's FindCUDA module |
|
function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])` |
|
|
|
|
|
|
|
### `nvcc_arch_readable()` |
|
_Since: 0.50.0_ |
|
|
|
``` meson |
|
cuda.nvcc_arch_readable(nvcc_or_version, ..., |
|
detected: string_or_array) |
|
``` |
|
|
|
Has precisely the same interface as [`nvcc_arch_flags()`](#nvcc_arch_flags), |
|
but rather than returning a list of flags, it returns a "readable" list of |
|
architectures that will be compiled for. The output of this function is solely |
|
intended for informative message printing. |
|
|
|
archs = '3.0 3.5 5.0+PTX' |
|
readable = cuda.nvcc_arch_readable(nvcc, archs) |
|
message('Building for architectures ' + ' '.join(readable)) |
|
|
|
This will print |
|
|
|
Message: Building for architectures sm30 sm35 sm50 compute50 |
|
|
|
_Note:_ This function is intended to closely replicate CMake's FindCUDA module function |
|
`CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])` |
|
|
|
|
|
|
|
### `min_driver_version()` |
|
_Since: 0.50.0_ |
|
|
|
``` meson |
|
cuda.min_driver_version(nvcc_or_version) |
|
``` |
|
|
|
Returns the minimum NVIDIA proprietary driver version required, on the host |
|
system, by kernels compiled with the given NVCC compiler or its version string. |
|
|
|
The output of this function is generally intended for informative message |
|
printing, but could be used for assertions or to conditionally enable |
|
features known to exist within the minimum NVIDIA driver required. |
|
|
|
|
|
|