nvptx64-nvidia-cuda
Tier: 2
This is the target meant for deploying code for Nvidia® accelerators based on their CUDA platform.
Target maintainers
Requirements
This target is no_std and will typically be built with crate-type cdylib and -C linker-flavor=llbc, which generates PTX.
The necessary components for this workflow are:
rustup toolchain add nightly
rustup component add llvm-tools --toolchain nightly
rustup component add llvm-bitcode-linker --toolchain nightly
There are two options for using the core library:
rustup component add rust-src --toolchain nightly and build using -Z build-std=core.
rustup target add nvptx64-nvidia-cuda --toolchain nightly
Target and features
It is generally necessary to specify the target processor, such as -C target-cpu=sm_89, because the default is very old. This implies two target features: sm_89 and ptx78 (and all preceding features within sm_* and ptx*). Rust will default to using the oldest PTX version that supports the target processor (see this table), which maximizes driver compatibility.
One can use -C target-feature=+ptx80 to choose a later PTX version without changing the target (the default in this case, ptx78, requires CUDA driver version 11.8, while ptx80 would require driver version 12.0).
Later PTX versions may allow more efficient code generation.
Although Rust follows LLVM in representing ptx* and sm_* as target features, they should be thought of as having crate granularity, set via -Ctarget-cpu and, optionally, -Ctarget-feature.
While the compiler accepts #[target_feature(enable = "ptx80", enable = "sm_89")], it is not supported, may not behave as intended, and may become erroneous in the future.
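Because these features are fixed for the whole crate, code that depends on them can test them with cfg instead of #[target_feature]. The following is a minimal sketch under that assumption; the function name ptx_path is hypothetical and only reports which branch was compiled in.

```rust
/// Reports which code path was selected at compile time.
/// `ptx_path` is an illustrative name, not part of any library.
pub fn ptx_path() -> &'static str {
    // `cfg!` is resolved when the crate is compiled, so the choice follows
    // the -Ctarget-cpu / -Ctarget-feature flags passed for the whole crate.
    if cfg!(target_feature = "ptx80") {
        "ptx80"
    } else {
        "baseline"
    }
}
```

Built for a target where ptx80 is not enabled (including an ordinary host build), the function returns "baseline"; adding -C target-feature=+ptx80 to the crate's flags is what would switch the branch.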
Building Rust kernels
A no_std crate containing one or more functions with extern "ptx-kernel" can be compiled to PTX using a command like the following.
$ RUSTFLAGS='-Ctarget-cpu=sm_89' cargo +nightly rustc --target=nvptx64-nvidia-cuda -Zbuild-std=core --crate-type=cdylib -- -Clinker-flavor=llbc -Zunstable-options
Intrinsics in core::arch::nvptx may use #[cfg(target_feature = "...")], thus it's necessary to use -Zbuild-std=core with appropriate RUSTFLAGS. The following components are needed for this workflow:
$ rustup component add rust-src --toolchain nightly
$ rustup component add llvm-tools --toolchain nightly
$ rustup component add llvm-bitcode-linker --toolchain nightly
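As an illustration of the kind of crate this section describes, here is a minimal sketch of a crate root with one kernel. Several things in it are assumptions rather than statements from this page: the names axpy and saxpy are hypothetical, the nightly feature gates abi_ptx and stdarch_nvptx are what the toolchain is assumed to require, and the thread-index intrinsics are assumed to be available under the names shown in core::arch::nvptx. The device-only items are gated on target_arch so that the arithmetic helper can also be checked on the host.

```rust
// Sketch of a kernel crate root (for example src/lib.rs).
// The nvptx64-only attributes are gated so the helper below also builds on the host.
#![cfg_attr(target_arch = "nvptx64", no_std)]
// Assumption: these nightly feature gates are the ones currently required.
#![cfg_attr(target_arch = "nvptx64", feature(abi_ptx, stdarch_nvptx))]

/// Plain Rust helper shared by the kernel and by host-side tests.
pub fn axpy(a: f32, x: f32, y: f32) -> f32 {
    a * x + y
}

/// Kernel entry point: y[i] = a * x[i] + y[i] for each i < n.
/// Edition 2024 spells the attribute `#[unsafe(no_mangle)]`.
#[cfg(target_arch = "nvptx64")]
#[no_mangle]
pub unsafe extern "ptx-kernel" fn saxpy(n: usize, a: f32, x: *const f32, y: *mut f32) {
    use core::arch::nvptx::{_block_dim_x, _block_idx_x, _thread_idx_x};
    unsafe {
        // Global thread index along the x dimension.
        let i = (_block_idx_x() * _block_dim_x() + _thread_idx_x()) as usize;
        // Threads past the end of the buffers do nothing.
        if i < n {
            *y.add(i) = axpy(a, *x.add(i), *y.add(i));
        }
    }
}

/// A no_std cdylib needs a panic handler; this one simply spins.
#[cfg(target_arch = "nvptx64")]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```

Keeping the arithmetic in an ordinary function means it can be unit-tested without a GPU, while the extern "ptx-kernel" wrapper only handles indexing and pointer access.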