core/intrinsics/gpu.rs
//! Intrinsics for GPU targets.
//!
//! Intrinsics in this module are intended for use on GPU targets.
//! They can be target specific but in general GPU targets are similar.

#![unstable(feature = "gpu_intrinsics", issue = "none")]

/// Returns the pointer to workgroup memory allocated at launch-time on GPUs.
///
/// Workgroup memory is a memory region that is shared between all threads in
/// the same workgroup. It is faster to access than other memory, but pointers to it
/// are not valid outside the workgroup where they were obtained.
/// Workgroup memory can be allocated statically or after compilation, when
/// launching a `gpu-kernel`. `gpu_launch_sized_workgroup_mem` returns the pointer to
/// the memory that is allocated at launch-time.
/// The size of this memory can differ between launches of a `gpu-kernel`, depending on
/// what is specified at launch-time.
/// However, the alignment is fixed by the kernel itself, at compile-time.
///
/// The returned pointer is the start of the workgroup memory region that is
/// allocated at launch-time.
/// All calls to `gpu_launch_sized_workgroup_mem` in a workgroup, independent of the
/// generic type, return the same address, so they alias the same memory.
/// The returned pointer is aligned to at least the alignment of `T`.
///
/// If `gpu_launch_sized_workgroup_mem` is invoked multiple times with different
/// types that have different alignments, then you may only rely on the resulting
/// pointer having the alignment of `T` after a call to `gpu_launch_sized_workgroup_mem::<T>`
/// has occurred in the current program execution.
///
/// # Safety
///
/// The pointer is safe to dereference from the start (the returned pointer) up to the
/// size of workgroup memory that was specified when launching the current `gpu-kernel`.
/// This allocated size is not related in any way to `T`.
///
/// The user must take care of synchronizing access to workgroup memory between
/// threads in a workgroup. The usual data race requirements apply.
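///
/// # Examples
///
/// Because every call returns the same address, a kernel that wants several arrays in
/// launch-sized workgroup memory has to partition the region itself. The following is a
/// minimal host-side sketch of that offset arithmetic, not an API of this module:
/// `align_up` and `layout_two` are hypothetical helpers introduced only for illustration.
/// On a GPU, the computed byte offset would be applied to the pointer returned by this
/// intrinsic.
///
/// ```rust
/// use core::mem::{align_of, size_of};
///
/// // Hypothetical helper: rounds `offset` up to the next multiple of `align`.
/// // `align` must be a power of two, which holds for every Rust type alignment.
/// fn align_up(offset: usize, align: usize) -> usize {
///     (offset + align - 1) & !(align - 1)
/// }
///
/// // Hypothetical helper: lays out `a_len` values of `A` followed by `b_len` values
/// // of `B` in one region. Returns the byte offset of the `B` array and the total
/// // number of bytes the region needs.
/// fn layout_two<A, B>(a_len: usize, b_len: usize) -> (usize, usize) {
///     let b_offset = align_up(a_len * size_of::<A>(), align_of::<B>());
///     let total = b_offset + b_len * size_of::<B>();
///     (b_offset, total)
/// }
/// ```
///
/// For three `u8` values followed by four `f32` values, `layout_two::<u8, f32>(3, 4)`
/// gives `(4, 20)` on targets where `f32` has a size and alignment of 4 bytes: the `f32`
/// array starts at byte 4, and at least 20 bytes must be requested at launch.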
///
/// # Other APIs
///
/// CUDA and HIP call this dynamic shared memory, shared between threads in a block.
/// OpenCL and SYCL call this local memory, shared between threads in a work-group.
/// GLSL calls this shared memory, shared between invocations in a work group.
/// DirectX calls this groupshared memory, shared between threads in a thread-group.
#[must_use = "returns a pointer that does nothing unless used"]
#[rustc_intrinsic]
#[rustc_nounwind]
#[unstable(feature = "gpu_launch_sized_workgroup_mem", issue = "135513")]
#[cfg(any(target_arch = "amdgpu", target_arch = "nvptx64"))]
pub fn gpu_launch_sized_workgroup_mem<T>() -> *mut T;

/// Returns a pointer to the HSA kernel dispatch packet.
///
/// A `gpu-kernel` on amdgpu is always launched through a kernel dispatch packet.
/// The dispatch packet contains the workgroup size, launch size and other data.
/// The content is defined by the [HSA Platform System Architecture Specification],
/// which is implemented e.g. in AMD's [hsa.h].
/// The intrinsic returns a unit pointer so that rustc does not need to know the packet struct.
/// The pointer is valid for the whole lifetime of the program.
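///
/// # Examples
///
/// Since the intrinsic returns a unit pointer, the caller has to supply the packet layout.
/// The following is a minimal sketch of how that might look, not an API of this module.
/// `DispatchPacket` and `workgroup_size` are hypothetical names; the field order is an
/// assumption modeled on `hsa_kernel_dispatch_packet_t` in [hsa.h] for 64-bit targets and
/// should be checked against that header. On amdgpu, the argument would be the result of
/// `amdgpu_dispatch_ptr()`.
///
/// ```rust
/// // Assumed layout of the kernel dispatch packet (64 bytes on 64-bit targets).
/// // `kernarg_address` and `completion_signal` are modeled as plain `u64` here.
/// #[repr(C)]
/// pub struct DispatchPacket {
///     pub header: u16,
///     pub setup: u16,
///     pub workgroup_size_x: u16,
///     pub workgroup_size_y: u16,
///     pub workgroup_size_z: u16,
///     pub reserved0: u16,
///     pub grid_size_x: u32,
///     pub grid_size_y: u32,
///     pub grid_size_z: u32,
///     pub private_segment_size: u32,
///     pub group_segment_size: u32,
///     pub kernel_object: u64,
///     pub kernarg_address: u64,
///     pub reserved2: u64,
///     pub completion_signal: u64,
/// }
///
/// // Hypothetical helper: reads the workgroup size through the unit pointer.
/// // The caller must pass a pointer to a valid, properly aligned dispatch packet.
/// pub unsafe fn workgroup_size(packet: *const ()) -> [u16; 3] {
///     // SAFETY: the caller guarantees that `packet` points to a valid packet.
///     let p = unsafe { &*packet.cast::<DispatchPacket>() };
///     [p.workgroup_size_x, p.workgroup_size_y, p.workgroup_size_z]
/// }
/// ```
///
/// Keeping the struct definition in user code means that a change to the packet layout
/// does not require a change to the intrinsic.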
///
/// [HSA Platform System Architecture Specification]: https://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf
/// [hsa.h]: https://github.com/ROCm/rocm-systems/blob/rocm-7.1.0/projects/rocr-runtime/runtime/hsa-runtime/inc/hsa.h#L2959
#[rustc_nounwind]
#[rustc_intrinsic]
#[cfg(target_arch = "amdgpu")]
#[must_use = "returns a pointer that does nothing unless used"]
pub fn amdgpu_dispatch_ptr() -> *const ();