Merge pull request #4835 from obilaniu/cudaimprovements

CUDA support improvements
Jussi Pakkanen, committed via GitHub
commit e26b5a119e
Files changed:

  1. docs/markdown/Cuda-module.md (183 lines)
  2. docs/sitemap.txt (1 line)
  3. mesonbuild/build.py (7 lines)
  4. mesonbuild/environment.py (17 lines)
  5. mesonbuild/modules/unstable_cuda.py (259 lines)
  6. test cases/cuda/3 cudamodule/meson.build (16 lines)
  7. test cases/cuda/3 cudamodule/prog.cu (30 lines)

@ -0,0 +1,183 @@
---
short-description: CUDA module
authors:
    - name: Olexa Bilaniuk
      years: [2019]
      has-copyright: false
...

# Unstable CUDA Module (`unstable-cuda`)

_Since: 0.50.0_

This module provides helper functionality related to the CUDA Toolkit and
building code using it.

**Note**: this module is unstable. It is only provided as a technology preview.
Its API may change in arbitrary ways between releases or it might be removed
from Meson altogether.

## Importing the module

The module may be imported as follows:

``` meson
cuda = import('unstable-cuda')
```

It offers several useful functions that are enumerated below.

## Functions

### `nvcc_arch_flags()`

_Since: 0.50.0_

``` meson
cuda.nvcc_arch_flags(nvcc_or_version, ...,
                     detected: string_or_array)
```

Returns a list of `-gencode` flags that should be passed to `cuda_args:` in
order to compile a "fat binary" for the architectures/compute capabilities
enumerated in the positional argument(s). The flags shall be acceptable to
the NVCC compiler given as `nvcc_or_version`, which may be either a compiler
object or its version string.

A set of architectures and/or compute capabilities may be specified by:

- The single positional argument `'All'`, `'Common'` or `'Auto'`
- As (an array of)
  - Architecture names (`'Kepler'`, `'Maxwell+Tegra'`, `'Turing'`) and/or
  - Compute capabilities (`'3.0'`, `'3.5'`, `'5.3'`, `'7.5'`)

A suffix of `+PTX` requests PTX code generation for the given architecture.
A compute capability given as `A.B(X.Y)` requests PTX generation for an older
virtual architecture `X.Y` before binary generation for a newer architecture
`A.B`.

Multiple architectures and compute capabilities may be passed in using:

- Multiple positional arguments
- Lists of strings
- Space (` `), comma (`,`) or semicolon (`;`)-separated strings

The single-word architectural sets `'All'`, `'Common'` or `'Auto'` cannot be
mixed with architecture names or compute capabilities. Their interpretation is:

| Name       | Interpretation                                                                                          |
|------------|----------------------------------------------------------------------------------------------------------|
| `'All'`    | All CCs supported by the given NVCC compiler.                                                              |
| `'Common'` | Relatively common CCs supported by the given NVCC compiler. Generally excludes Tegra and Tesla devices.    |
| `'Auto'`   | The CCs provided by the `detected:` keyword, filtered for support by the given NVCC compiler.              |

The supported architecture names and their corresponding compute capabilities
are:

| Name              | Compute Capability |
|-------------------|--------------------|
| `'Fermi'`         | 2.0, 2.1(2.0)      |
| `'Kepler'`        | 3.0, 3.5           |
| `'Kepler+Tegra'`  | 3.2                |
| `'Kepler+Tesla'`  | 3.7                |
| `'Maxwell'`       | 5.0, 5.2           |
| `'Maxwell+Tegra'` | 5.3                |
| `'Pascal'`        | 6.0, 6.1           |
| `'Pascal+Tegra'`  | 6.2                |
| `'Volta'`         | 7.0                |
| `'Volta+Tegra'`   | 7.2                |
| `'Turing'`        | 7.5                |

Examples:

``` meson
cuda.nvcc_arch_flags('10.0', '3.0', '3.5', '5.0+PTX')
cuda.nvcc_arch_flags('10.0', ['3.0', '3.5', '5.0+PTX'])
cuda.nvcc_arch_flags('10.0', [['3.0', '3.5'], '5.0+PTX'])
cuda.nvcc_arch_flags('10.0', '3.0 3.5 5.0+PTX')
cuda.nvcc_arch_flags('10.0', '3.0,3.5,5.0+PTX')
cuda.nvcc_arch_flags('10.0', '3.0;3.5;5.0+PTX')
cuda.nvcc_arch_flags('10.0', 'Kepler 5.0+PTX')
# Returns ['-gencode', 'arch=compute_30,code=sm_30',
#          '-gencode', 'arch=compute_35,code=sm_35',
#          '-gencode', 'arch=compute_50,code=sm_50',
#          '-gencode', 'arch=compute_50,code=compute_50']

cuda.nvcc_arch_flags('10.0', '3.5(3.0)')
# Returns ['-gencode', 'arch=compute_30,code=sm_35']

cuda.nvcc_arch_flags('8.0', 'Common')
# Returns ['-gencode', 'arch=compute_30,code=sm_30',
#          '-gencode', 'arch=compute_35,code=sm_35',
#          '-gencode', 'arch=compute_50,code=sm_50',
#          '-gencode', 'arch=compute_52,code=sm_52',
#          '-gencode', 'arch=compute_60,code=sm_60',
#          '-gencode', 'arch=compute_61,code=sm_61',
#          '-gencode', 'arch=compute_61,code=compute_61']

cuda.nvcc_arch_flags('9.2', 'Auto', detected: '6.0 6.0 6.0 6.0')
cuda.nvcc_arch_flags('9.2', 'Auto', detected: ['6.0', '6.0', '6.0', '6.0'])
# Returns ['-gencode', 'arch=compute_60,code=sm_60']

cuda.nvcc_arch_flags(nvcc, 'All')
# Returns ['-gencode', 'arch=compute_20,code=sm_20',
#          '-gencode', 'arch=compute_20,code=sm_21',
#          '-gencode', 'arch=compute_30,code=sm_30',
#          '-gencode', 'arch=compute_32,code=sm_32',
#          '-gencode', 'arch=compute_35,code=sm_35',
#          '-gencode', 'arch=compute_37,code=sm_37',
#          '-gencode', 'arch=compute_50,code=sm_50',    # nvcc.version() <  7.0
#          '-gencode', 'arch=compute_52,code=sm_52',
#          '-gencode', 'arch=compute_53,code=sm_53',    # nvcc.version() >= 7.0
#          '-gencode', 'arch=compute_60,code=sm_60',
#          '-gencode', 'arch=compute_61,code=sm_61',    # nvcc.version() >= 8.0
#          '-gencode', 'arch=compute_70,code=sm_70',
#          '-gencode', 'arch=compute_72,code=sm_72',    # nvcc.version() >= 9.0
#          '-gencode', 'arch=compute_75,code=sm_75']    # nvcc.version() >= 10.0
```

_Note:_ This function is intended to closely replicate CMake's FindCUDA module
function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])`.
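
In a build definition, the returned flags are typically forwarded verbatim to
a target's `cuda_args:`, as the test case added in this PR does. A minimal
sketch (the target/source names and the detected compute capability are
illustrative):

``` meson
nvcc       = meson.get_compiler('cuda')
cuda       = import('unstable-cuda')

# Build only for the detected device, falling back to 'Common' if nothing
# was detected.
arch_flags = cuda.nvcc_arch_flags(nvcc, 'Auto', detected: ['6.0'])

exe = executable('prog', 'prog.cu', cuda_args: arch_flags)
```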

### `nvcc_arch_readable()`

_Since: 0.50.0_

``` meson
cuda.nvcc_arch_readable(nvcc_or_version, ...,
                        detected: string_or_array)
```

Has precisely the same interface as [`nvcc_arch_flags()`](#nvcc_arch_flags),
but rather than returning a list of flags, it returns a "readable" list of
the architectures that will be compiled for. The output of this function is
solely intended for informative message printing.

``` meson
archs    = '3.0 3.5 5.0+PTX'
readable = cuda.nvcc_arch_readable(nvcc, archs)
message('Building for architectures ' + ' '.join(readable))
```

This will print

    Message: Building for architectures sm_30 sm_35 sm_50 compute_50

_Note:_ This function is intended to closely replicate CMake's FindCUDA module
function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])`.

### `min_driver_version()`

_Since: 0.50.0_

``` meson
cuda.min_driver_version(nvcc_or_version)
```

Returns the minimum NVIDIA proprietary driver version required, on the host
system, by kernels compiled with the given NVCC compiler or its version
string.

The output of this function is generally intended for informative message
printing, but could also be used in assertions or to conditionally enable
features known to be present in that minimum driver version.
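
As an illustration, the returned version can be surfaced at configure time.
A minimal sketch, assuming a CUDA compiler has already been enabled for the
project:

``` meson
nvcc           = meson.get_compiler('cuda')
cuda           = import('unstable-cuda')
driver_version = cuda.min_driver_version(nvcc)

# Purely informative; the build itself does not check the installed driver.
message('Minimum NVIDIA driver required: >=' + driver_version)
```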

@ -44,6 +44,7 @@ index.md
 RPM-module.md
 Simd-module.md
 Windows-module.md
+Cuda-module.md
 Java.md
 Vala.md
 D.md

@ -36,6 +36,7 @@ pch_kwargs = set(['c_pch', 'cpp_pch'])
 lang_arg_kwargs = set([
     'c_args',
     'cpp_args',
+    'cuda_args',
     'd_args',
     'd_import_dirs',
     'd_unittest',
@ -797,13 +798,13 @@ just like those detected with the dependency() function.''')
         for linktarget in lwhole:
             self.link_whole(linktarget)
-        c_pchlist, cpp_pchlist, clist, cpplist, cslist, valalist, objclist, objcpplist, fortranlist, rustlist \
-            = extract_as_list(kwargs, 'c_pch', 'cpp_pch', 'c_args', 'cpp_args', 'cs_args', 'vala_args', 'objc_args',
+        c_pchlist, cpp_pchlist, clist, cpplist, cudalist, cslist, valalist, objclist, objcpplist, fortranlist, rustlist \
+            = extract_as_list(kwargs, 'c_pch', 'cpp_pch', 'c_args', 'cpp_args', 'cuda_args', 'cs_args', 'vala_args', 'objc_args',
                               'objcpp_args', 'fortran_args', 'rust_args')
         self.add_pch('c', c_pchlist)
         self.add_pch('cpp', cpp_pchlist)
-        compiler_args = {'c': clist, 'cpp': cpplist, 'cs': cslist, 'vala': valalist, 'objc': objclist, 'objcpp': objcpplist,
+        compiler_args = {'c': clist, 'cpp': cpplist, 'cuda': cudalist, 'cs': cslist, 'vala': valalist, 'objc': objclist, 'objcpp': objcpplist,
                          'fortran': fortranlist, 'rust': rustlist
                          }
         for key, value in compiler_args.items():

@ -766,7 +766,22 @@ class Environment:
             except OSError as e:
                 popen_exceptions[' '.join(compiler + [arg])] = e
                 continue
-            version = search_version(out)
+            # Example nvcc printout:
+            #
+            # nvcc: NVIDIA (R) Cuda compiler driver
+            # Copyright (c) 2005-2018 NVIDIA Corporation
+            # Built on Sat_Aug_25_21:08:01_CDT_2018
+            # Cuda compilation tools, release 10.0, V10.0.130
+            #
+            # search_version() first finds the "10.0" after "release",
+            # rather than the more precise "10.0.130" after "V".
+            # The patch version number is occasionally important; For
+            # instance, on Linux,
+            #  - CUDA Toolkit 8.0.44 requires NVIDIA Driver 367.48
+            #  - CUDA Toolkit 8.0.61 requires NVIDIA Driver 375.26
+            # Luckily, the "V" also makes it very simple to extract
+            # the full version:
+            version = out.strip().split('V')[-1]
             cls = CudaCompiler
             return cls(ccache + compiler, version, is_cross, exe_wrap)
         raise EnvironmentException('Could not find suitable CUDA compiler: "' + ' '.join(compilers) + '"')

@ -0,0 +1,259 @@
# Copyright 2017 The Meson development team

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import re

from ..mesonlib import version_compare
from ..interpreter import CompilerHolder
from ..compilers import CudaCompiler

from . import ExtensionModule, ModuleReturnValue

from ..interpreterbase import (
    flatten, permittedKwargs, noKwargs,
    InvalidArguments, FeatureNew
)


class CudaModule(ExtensionModule):

    @FeatureNew('CUDA module', '0.50.0')
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @noKwargs
    def min_driver_version(self, state, args, kwargs):
        argerror = InvalidArguments('min_driver_version must have exactly one positional argument: ' +
                                    'an NVCC compiler object, or its version string.')
        if len(args) != 1:
            raise argerror
        else:
            cuda_version = self._version_from_compiler(args[0])
            if cuda_version == 'unknown':
                raise argerror

        driver_version_table = [
            {'cuda_version': '>=10.0.130', 'windows': '411.31', 'linux': '410.48'},
            {'cuda_version': '>=9.2.148',  'windows': '398.26', 'linux': '396.37'},
            {'cuda_version': '>=9.2.88',   'windows': '397.44', 'linux': '396.26'},
            {'cuda_version': '>=9.1.85',   'windows': '391.29', 'linux': '390.46'},
            {'cuda_version': '>=9.0.76',   'windows': '385.54', 'linux': '384.81'},
            {'cuda_version': '>=8.0.61',   'windows': '376.51', 'linux': '375.26'},
            {'cuda_version': '>=8.0.44',   'windows': '369.30', 'linux': '367.48'},
            {'cuda_version': '>=7.5.16',   'windows': '353.66', 'linux': '352.31'},
            {'cuda_version': '>=7.0.28',   'windows': '347.62', 'linux': '346.46'},
        ]

        driver_version = 'unknown'
        for d in driver_version_table:
            if version_compare(cuda_version, d['cuda_version']):
                driver_version = d.get(state.host_machine.system, d['linux'])
                break

        return ModuleReturnValue(driver_version, [driver_version])

    @permittedKwargs(['detected'])
    def nvcc_arch_flags(self, state, args, kwargs):
        nvcc_arch_args = self._validate_nvcc_arch_args(state, args, kwargs)
        ret = self._nvcc_arch_flags(*nvcc_arch_args)[0]
        return ModuleReturnValue(ret, [ret])

    @permittedKwargs(['detected'])
    def nvcc_arch_readable(self, state, args, kwargs):
        nvcc_arch_args = self._validate_nvcc_arch_args(state, args, kwargs)
        ret = self._nvcc_arch_flags(*nvcc_arch_args)[1]
        return ModuleReturnValue(ret, [ret])

    @staticmethod
    def _break_arch_string(s):
        s = re.sub('[ \t,;]+', ';', s)
        s = s.strip(';').split(';')
        return s

    @staticmethod
    def _version_from_compiler(c):
        if isinstance(c, CompilerHolder):
            c = c.compiler
        if isinstance(c, CudaCompiler):
            return c.version
        if isinstance(c, str):
            return c
        return 'unknown'

    def _validate_nvcc_arch_args(self, state, args, kwargs):
        argerror = InvalidArguments('The first argument must be an NVCC compiler object, or its version string!')
        if len(args) < 1:
            raise argerror
        else:
            cuda_version = self._version_from_compiler(args[0])
            if cuda_version == 'unknown':
                raise argerror

        arch_list = [] if len(args) <= 1 else flatten(args[1:])
        arch_list = [self._break_arch_string(a) for a in arch_list]
        arch_list = flatten(arch_list)
        if len(arch_list) > 1 and not set(arch_list).isdisjoint({'All', 'Common', 'Auto'}):
            raise InvalidArguments('''The special architectures 'All', 'Common' and 'Auto' must appear alone, as a positional argument!''')
        arch_list = arch_list[0] if len(arch_list) == 1 else arch_list

        detected = flatten([kwargs.get('detected', [])])
        detected = [self._break_arch_string(a) for a in detected]
        detected = flatten(detected)
        if not set(detected).isdisjoint({'All', 'Common', 'Auto'}):
            raise InvalidArguments('''The special architectures 'All', 'Common' and 'Auto' must appear alone, as a positional argument!''')

        return cuda_version, arch_list, detected

    def _nvcc_arch_flags(self, cuda_version, cuda_arch_list='Auto', detected=''):
        """
        Using the CUDA Toolkit version (the NVCC version) and the target
        architectures, compute the NVCC architecture flags.
        """
        cuda_known_gpu_architectures  = ['Fermi', 'Kepler', 'Maxwell']  # noqa: E221
        cuda_common_gpu_architectures = ['3.0', '3.5', '5.0']           # noqa: E221
        cuda_limit_gpu_architecture   = None                            # noqa: E221
        cuda_all_gpu_architectures    = ['3.0', '3.2', '3.5', '5.0']    # noqa: E221

        if version_compare(cuda_version, '<7.0'):
            cuda_limit_gpu_architecture = '5.2'

        if version_compare(cuda_version, '>=7.0'):
            cuda_known_gpu_architectures  += ['Kepler+Tegra', 'Kepler+Tesla', 'Maxwell+Tegra']  # noqa: E221
            cuda_common_gpu_architectures += ['5.2']                                            # noqa: E221

            if version_compare(cuda_version, '<8.0'):
                cuda_common_gpu_architectures += ['5.2+PTX']  # noqa: E221
                cuda_limit_gpu_architecture    = '6.0'        # noqa: E221

        if version_compare(cuda_version, '>=8.0'):
            cuda_known_gpu_architectures  += ['Pascal', 'Pascal+Tegra']  # noqa: E221
            cuda_common_gpu_architectures += ['6.0', '6.1']              # noqa: E221
            cuda_all_gpu_architectures    += ['6.0', '6.1', '6.2']       # noqa: E221

            if version_compare(cuda_version, '<9.0'):
                cuda_common_gpu_architectures += ['6.1+PTX']  # noqa: E221
                cuda_limit_gpu_architecture    = '7.0'        # noqa: E221

        if version_compare(cuda_version, '>=9.0'):
            cuda_known_gpu_architectures  += ['Volta', 'Volta+Tegra']              # noqa: E221
            cuda_common_gpu_architectures += ['7.0', '7.0+PTX']                    # noqa: E221
            cuda_all_gpu_architectures    += ['7.0', '7.0+PTX', '7.2', '7.2+PTX']  # noqa: E221

            if version_compare(cuda_version, '<10.0'):
                cuda_limit_gpu_architecture = '7.5'

        if version_compare(cuda_version, '>=10.0'):
            cuda_known_gpu_architectures  += ['Turing']          # noqa: E221
            cuda_common_gpu_architectures += ['7.5', '7.5+PTX']  # noqa: E221
            cuda_all_gpu_architectures    += ['7.5', '7.5+PTX']  # noqa: E221

            if version_compare(cuda_version, '<11.0'):
                cuda_limit_gpu_architecture = '8.0'

        if not cuda_arch_list:
            cuda_arch_list = 'Auto'

        if   cuda_arch_list == 'All':     # noqa: E271
            cuda_arch_list = cuda_known_gpu_architectures
        elif cuda_arch_list == 'Common':  # noqa: E271
            cuda_arch_list = cuda_common_gpu_architectures
        elif cuda_arch_list == 'Auto':    # noqa: E271
            if detected:
                if isinstance(detected, list):
                    cuda_arch_list = detected
                else:
                    cuda_arch_list = self._break_arch_string(detected)

                if cuda_limit_gpu_architecture:
                    filtered_cuda_arch_list = []
                    for arch in cuda_arch_list:
                        if arch:
                            if version_compare(arch, '>=' + cuda_limit_gpu_architecture):
                                arch = cuda_common_gpu_architectures[-1]
                            if arch not in filtered_cuda_arch_list:
                                filtered_cuda_arch_list.append(arch)
                    cuda_arch_list = filtered_cuda_arch_list
            else:
                cuda_arch_list = cuda_common_gpu_architectures
        elif isinstance(cuda_arch_list, str):
            cuda_arch_list = self._break_arch_string(cuda_arch_list)

        cuda_arch_list = sorted([x for x in set(cuda_arch_list) if x])

        cuda_arch_bin = []
        cuda_arch_ptx = []
        for arch_name in cuda_arch_list:
            arch_bin = []
            arch_ptx = []
            add_ptx = arch_name.endswith('+PTX')
            if add_ptx:
                arch_name = arch_name[:-len('+PTX')]

            if re.fullmatch('[0-9]+\\.[0-9](\\([0-9]+\\.[0-9]\\))?', arch_name):
                arch_bin, arch_ptx = [arch_name], [arch_name]
            else:
                arch_bin, arch_ptx = {
                    'Fermi':         (['2.0', '2.1(2.0)'], []),
                    'Kepler+Tegra':  (['3.2'],             []),
                    'Kepler+Tesla':  (['3.7'],             []),
                    'Kepler':        (['3.0', '3.5'],      ['3.5']),
                    'Maxwell+Tegra': (['5.3'],             []),
                    'Maxwell':       (['5.0', '5.2'],      ['5.2']),
                    'Pascal':        (['6.0', '6.1'],      ['6.1']),
                    'Pascal+Tegra':  (['6.2'],             []),
                    'Volta':         (['7.0'],             ['7.0']),
                    'Volta+Tegra':   (['7.2'],             []),
                    'Turing':        (['7.5'],             ['7.5']),
                }.get(arch_name, (None, None))

            if arch_bin is None:
                raise InvalidArguments('Unknown CUDA Architecture Name {}!'
                                       .format(arch_name))

            cuda_arch_bin += arch_bin
            if add_ptx:
                if not arch_ptx:
                    arch_ptx = arch_bin
                cuda_arch_ptx += arch_ptx

        cuda_arch_bin = re.sub('\\.', '', ' '.join(cuda_arch_bin))
        cuda_arch_ptx = re.sub('\\.', '', ' '.join(cuda_arch_ptx))
        cuda_arch_bin = re.findall('[0-9()]+', cuda_arch_bin)
        cuda_arch_ptx = re.findall('[0-9]+',   cuda_arch_ptx)
        cuda_arch_bin = sorted(list(set(cuda_arch_bin)))
        cuda_arch_ptx = sorted(list(set(cuda_arch_ptx)))

        nvcc_flags = []
        nvcc_archs_readable = []

        for arch in cuda_arch_bin:
            m = re.match('([0-9]+)\\(([0-9]+)\\)', arch)
            if m:
                nvcc_flags += ['-gencode', 'arch=compute_' + m[2] + ',code=sm_' + m[1]]
                nvcc_archs_readable += ['sm_' + m[1]]
            else:
                nvcc_flags += ['-gencode', 'arch=compute_' + arch + ',code=sm_' + arch]
                nvcc_archs_readable += ['sm_' + arch]

        for arch in cuda_arch_ptx:
            nvcc_flags += ['-gencode', 'arch=compute_' + arch + ',code=compute_' + arch]
            nvcc_archs_readable += ['compute_' + arch]

        return nvcc_flags, nvcc_archs_readable


def initialize(*args, **kwargs):
    return CudaModule(*args, **kwargs)

@ -0,0 +1,16 @@
project('cudamodule', 'cuda', version : '1.0.0')

nvcc = meson.get_compiler('cuda')
cuda = import('unstable-cuda')

arch_flags = cuda.nvcc_arch_flags(nvcc, 'Auto', detected: ['3.0'])
arch_readable = cuda.nvcc_arch_readable(nvcc, 'Auto', detected: ['3.0'])
driver_version = cuda.min_driver_version(nvcc)

message('NVCC version: ' + nvcc.version())
message('NVCC flags: ' + ' '.join(arch_flags))
message('NVCC readable: ' + ' '.join(arch_readable))
message('Driver version: >=' + driver_version)

exe = executable('prog', 'prog.cu', cuda_args: arch_flags)
test('cudatest', exe)

@ -0,0 +1,30 @@
#include <iostream>

int main(int argc, char **argv) {
    int cuda_devices = 0;
    std::cout << "CUDA version: " << CUDART_VERSION << "\n";
    cudaGetDeviceCount(&cuda_devices);
    if(cuda_devices == 0) {
        std::cout << "No Cuda hardware found. Exiting.\n";
        return 0;
    }

    std::cout << "This computer has " << cuda_devices << " Cuda device(s).\n";

    cudaDeviceProp props;
    cudaGetDeviceProperties(&props, 0);
    std::cout << "Properties of device 0.\n\n";

    std::cout << " Name: " << props.name << "\n";
    std::cout << " Global memory: " << props.totalGlobalMem << "\n";
    std::cout << " Shared memory: " << props.sharedMemPerBlock << "\n";
    std::cout << " Constant memory: " << props.totalConstMem << "\n";
    std::cout << " Block registers: " << props.regsPerBlock << "\n";
    std::cout << " Warp size: " << props.warpSize << "\n";
    std::cout << " Threads per block: " << props.maxThreadsPerBlock << "\n";
    std::cout << " Max block dimensions: [ " << props.maxThreadsDim[0] << ", " << props.maxThreadsDim[1] << ", " << props.maxThreadsDim[2] << " ]" << "\n";
    std::cout << " Max grid dimensions: [ " << props.maxGridSize[0] << ", " << props.maxGridSize[1] << ", " << props.maxGridSize[2] << " ]" << "\n";
    std::cout << "\n";

    return 0;
}