---
short-description: CUDA module
authors:
    - name: Olexa Bilaniuk
      years: [2019]
      has-copyright: false
...

# Unstable CUDA Module

_Since: 0.50.0_

This module provides helper functionality related to the CUDA Toolkit and
building code using it.

**Note**: this module is unstable. It is only provided as a technology preview.
Its API may change in arbitrary ways between releases or it might be removed
from Meson altogether.

## Importing the module

The module may be imported as follows:

``` meson
cuda = import('unstable-cuda')
```

It offers several useful functions that are enumerated below.

## Functions

### `nvcc_arch_flags()`
_Since: 0.50.0_

``` meson
cuda.nvcc_arch_flags(nvcc_or_version, ...,
                     detected: string_or_array)
```

Returns a list of `-gencode` flags that should be passed to `cuda_args:` in
order to compile a "fat binary" for the architectures/compute capabilities
enumerated in the positional argument(s). The flags are chosen to be acceptable
to the NVCC compiler given as `nvcc_or_version`, which may be either a compiler
object or a version string.

A set of architectures and/or compute capabilities may be specified by:

- The single positional argument `'All'`, `'Common'` or `'Auto'`
- As (an array of)
  - Architecture names (`'Kepler'`, `'Maxwell+Tegra'`, `'Turing'`) and/or
  - Compute capabilities (`'3.0'`, `'3.5'`, `'5.3'`, `'7.5'`)

A suffix of `+PTX` requests PTX code generation for the given architecture.
A compute capability given as `A.B(X.Y)` requests PTX generation for an older
virtual architecture `X.Y` before binary generation for a newer architecture
`A.B`.

Multiple architectures and compute capabilities may be passed in using

- Multiple positional arguments
- Lists of strings
- Space (` `), comma (`,`) or semicolon (`;`)-separated strings

The single-word architectural sets `'All'`, `'Common'` or `'Auto'` cannot be
mixed with architecture names or compute capabilities. Their interpretation is:

| Name              | Interpretation     |
|-------------------|--------------------|
| `'All'`           | All CCs supported by given NVCC compiler. |
| `'Common'`        | Relatively common CCs supported by given NVCC compiler. Generally excludes Tegra and Tesla devices. |
| `'Auto'`          | The CCs provided by the `detected:` keyword, filtered for support by given NVCC compiler. |

The supported architecture names and their corresponding compute capabilities
are:

| Name              | Compute Capability |
|-------------------|--------------------|
| `'Fermi'`         | 2.0, 2.1(2.0)      |
| `'Kepler'`        | 3.0, 3.5           |
| `'Kepler+Tegra'`  | 3.2                |
| `'Kepler+Tesla'`  | 3.7                |
| `'Maxwell'`       | 5.0, 5.2           |
| `'Maxwell+Tegra'` | 5.3                |
| `'Pascal'`        | 6.0, 6.1           |
| `'Pascal+Tegra'`  | 6.2                |
| `'Volta'`         | 7.0                |
| `'Volta+Tegra'`   | 7.2                |
| `'Turing'`        | 7.5                |

Examples:

    cuda.nvcc_arch_flags('10.0', '3.0', '3.5', '5.0+PTX')
    cuda.nvcc_arch_flags('10.0', ['3.0', '3.5', '5.0+PTX'])
    cuda.nvcc_arch_flags('10.0', [['3.0', '3.5'], '5.0+PTX'])
    cuda.nvcc_arch_flags('10.0', '3.0 3.5 5.0+PTX')
    cuda.nvcc_arch_flags('10.0', '3.0,3.5,5.0+PTX')
    cuda.nvcc_arch_flags('10.0', '3.0;3.5;5.0+PTX')
    cuda.nvcc_arch_flags('10.0', 'Kepler 5.0+PTX')
    # Returns ['-gencode', 'arch=compute_30,code=sm_30',
    #          '-gencode', 'arch=compute_35,code=sm_35',
    #          '-gencode', 'arch=compute_50,code=sm_50',
    #          '-gencode', 'arch=compute_50,code=compute_50']

    cuda.nvcc_arch_flags('10.0', '3.5(3.0)')
    # Returns ['-gencode', 'arch=compute_30,code=sm_35']

    cuda.nvcc_arch_flags('8.0', 'Common')
    # Returns ['-gencode', 'arch=compute_30,code=sm_30',
    #          '-gencode', 'arch=compute_35,code=sm_35',
    #          '-gencode', 'arch=compute_50,code=sm_50',
    #          '-gencode', 'arch=compute_52,code=sm_52',
    #          '-gencode', 'arch=compute_60,code=sm_60',
    #          '-gencode', 'arch=compute_61,code=sm_61',
    #          '-gencode', 'arch=compute_61,code=compute_61']

    cuda.nvcc_arch_flags('9.2', 'Auto', detected: '6.0 6.0 6.0 6.0')
    cuda.nvcc_arch_flags('9.2', 'Auto', detected: ['6.0', '6.0', '6.0', '6.0'])
    # Returns ['-gencode', 'arch=compute_60,code=sm_60']

    cuda.nvcc_arch_flags(nvcc, 'All')
    # Returns ['-gencode', 'arch=compute_20,code=sm_20',
    #          '-gencode', 'arch=compute_20,code=sm_21',
    #          '-gencode', 'arch=compute_30,code=sm_30',
    #          '-gencode', 'arch=compute_32,code=sm_32',
    #          '-gencode', 'arch=compute_35,code=sm_35',
    #          '-gencode', 'arch=compute_37,code=sm_37',
    #          '-gencode', 'arch=compute_50,code=sm_50', # nvcc.version() <  7.0
    #          '-gencode', 'arch=compute_52,code=sm_52',
    #          '-gencode', 'arch=compute_53,code=sm_53', # nvcc.version() >= 7.0
    #          '-gencode', 'arch=compute_60,code=sm_60',
    #          '-gencode', 'arch=compute_61,code=sm_61', # nvcc.version() >= 8.0
    #          '-gencode', 'arch=compute_70,code=sm_70',
    #          '-gencode', 'arch=compute_72,code=sm_72', # nvcc.version() >= 9.0
    #          '-gencode', 'arch=compute_75,code=sm_75'] # nvcc.version() >= 10.0

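For `'Auto'`, the caller has to supply the `detected:` list. The following is a
minimal sketch that assumes the list comes from a user-defined build option. The
option name `cuda_detected_ccs`, its default value and the project name are
hypothetical and not part of Meson or of this module.

``` meson
# meson_options.txt would need a matching (hypothetical) entry, for example:
#   option('cuda_detected_ccs', type: 'string', value: '6.0',
#          description: 'Compute capabilities of the GPUs to target')

project('auto-arch-demo', 'cuda')

cuda = import('unstable-cuda')
nvcc = meson.get_compiler('cuda')

# The option holds a string of compute capabilities such as '6.0 6.1'.
detected_ccs = get_option('cuda_detected_ccs')

# 'Auto' keeps only the detected CCs that this NVCC supports.
arch_flags = cuda.nvcc_arch_flags(nvcc, 'Auto', detected: detected_ccs)
message('NVCC architecture flags: ' + ' '.join(arch_flags))
```
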
_Note:_ This function is intended to closely replicate CMake's FindCUDA module
function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])`.

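The returned list is meant to be handed to `cuda_args:` on a build target. A
minimal sketch of that wiring, assuming a project that uses the CUDA language;
the project name, target name and the source file `kernels.cu` are placeholders:

``` meson
project('fatbin-demo', 'cuda')

cuda = import('unstable-cuda')
nvcc = meson.get_compiler('cuda')

# 'Common' is one of the single-word architecture sets described above.
arch_flags = cuda.nvcc_arch_flags(nvcc, 'Common')

# Pass the -gencode flags to the CUDA compiler for this target only.
executable('demo', 'kernels.cu', cuda_args: arch_flags)
```
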
### `nvcc_arch_readable()`
_Since: 0.50.0_

``` meson
cuda.nvcc_arch_readable(nvcc_or_version, ...,
                        detected: string_or_array)
```

Has precisely the same interface as [`nvcc_arch_flags()`](#nvcc_arch_flags),
but rather than returning a list of flags, it returns a "readable" list of
architectures that will be compiled for. The output of this function is solely
intended for informative message printing.

    archs    = '3.0 3.5 5.0+PTX'
    readable = cuda.nvcc_arch_readable(nvcc, archs)
    message('Building for architectures ' + ' '.join(readable))

This will print

    Message: Building for architectures sm30 sm35 sm50 compute50

_Note:_ This function is intended to closely replicate CMake's FindCUDA module function
`CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])`.

### `min_driver_version()`
_Since: 0.50.0_

``` meson
cuda.min_driver_version(nvcc_or_version)
```

Returns the minimum version of the NVIDIA proprietary driver that the host
system needs in order to run kernels compiled with the given NVCC compiler,
which may be passed as a compiler object or as a version string.

The output of this function is generally intended for informative message
printing, but could be used for assertions or to conditionally enable
features known to exist within the minimum NVIDIA driver required.

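Both uses can be sketched as follows. This is an illustration only: it assumes
the returned value is a version-like string that `version_compare()` can handle,
and the threshold `'410.48'` and the project name are arbitrary placeholders.

``` meson
project('driver-demo', 'cuda')

cuda = import('unstable-cuda')
nvcc = meson.get_compiler('cuda')

min_driver = cuda.min_driver_version(nvcc)

# Informative message for whoever configures the build.
message('NVCC @0@ requires NVIDIA driver @1@ or newer'.format(nvcc.version(), min_driver))

# Conditional logic based on the minimum driver (placeholder threshold).
if min_driver.version_compare('>=410.48')
  message('Features that depend on a 410.48+ driver could be enabled here')
endif
```
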