subgroupSize
A read-only compute shader node that provides the number of invocations (threads) in a hardware subgroup. Portable and efficient subgroup-level algorithms depend on this value, because it differs between GPUs.
Core Advantages
Its core advantage is enabling hardware-agnostic compute shader programming. Because the node supplies the GPU's actual subgroup size, a single algorithm can adapt to different hardware (e.g., 32-thread warps on NVIDIA GPUs, or 32- or 64-thread wavefronts on AMD GPUs) without hardcoding platform-specific values.
Common Uses
Performing parallel reductions (e.g., sum, max) within a subgroup.
Calculating a thread's subgroup index.
Allocating dedicated space for each subgroup in workgroup shared memory.
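The index arithmetic behind the last two uses can be sketched on the CPU in plain JavaScript (this is not TSL and does not run on the GPU). The function name `subgroupLayout` and its fields are hypothetical, and the sketch assumes invocations are assigned to subgroups in linear order of their local index, which graphics APIs do not guarantee on every device.

```javascript
// Plain-JavaScript CPU sketch (not TSL) of the arithmetic a shader would perform.
// Assumption: invocations map to subgroups in linear order of local index.
function subgroupLayout(workgroupSize, subgroupSize) {
  // One shared-memory slot per subgroup; round up to cover a partial last subgroup.
  const subgroupCount = Math.ceil(workgroupSize / subgroupSize);

  const threads = [];
  for (let localIndex = 0; localIndex < workgroupSize; localIndex++) {
    threads.push({
      localIndex,
      // Which subgroup this invocation belongs to.
      subgroupIndex: Math.floor(localIndex / subgroupSize),
      // Position of the invocation inside its subgroup.
      laneIndex: localIndex % subgroupSize,
    });
  }
  return { subgroupCount, threads };
}

// The same 256-invocation workgroup yields different layouts on different hardware.
const narrow = subgroupLayout(256, 32);
const wide = subgroupLayout(256, 64);
console.log(narrow.subgroupCount, wide.subgroupCount);
```

Because both the slot count and the per-invocation indices are derived from the subgroup size, nothing in the layout has to be rewritten when the hardware changes.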
How to adjust
subgroupSize is a read-only value fixed by the hardware and cannot be changed. Therefore, the key is to design code that 'reacts' to its value, rather than 'adjusting' it. For example, a parallel reduction algorithm's loop bounds should be defined in terms of subgroupSize (e.g., `offset = subgroupSize.div(2)`) so that it works correctly on any GPU. Additionally, you can use TSL conditional logic (e.g., `If( subgroupSize.equal( 64 ), () => { ... } )`) to create optimized code paths for specific hardware while keeping a general fallback for others. A plain JavaScript `if` is not suitable here, because it would test the node object itself rather than the value the GPU sees.
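To illustrate why the loop bounds should follow the subgroup size, here is a minimal CPU simulation in plain JavaScript (not TSL) of a shuffle-down style sum. The function `simulateSubgroupSum` is hypothetical, each array element stands in for one lane's value, and the subgroup size is assumed to be a power of two.

```javascript
// Plain-JavaScript CPU simulation (not TSL) of a subgroup sum reduction.
// Assumption: laneValues.length (the simulated subgroup size) is a power of two.
function simulateSubgroupSum(laneValues) {
  const subgroupSize = laneValues.length;
  const values = laneValues.slice();

  // The starting offset is derived from the subgroup size, never hardcoded.
  for (let offset = subgroupSize >> 1; offset > 0; offset >>= 1) {
    // Snapshot the previous step so every lane reads pre-step values,
    // mimicking lanes that execute the step simultaneously.
    const previous = values.slice();
    for (let lane = 0; lane < subgroupSize; lane++) {
      const source = lane + offset;
      if (source < subgroupSize) {
        values[lane] = previous[lane] + previous[source];
      }
    }
  }
  // After the final step, lane 0 holds the sum of all lanes.
  return values[0];
}

// The same function handles a 32-lane and a 64-lane subgroup unchanged.
const lanes32 = Array.from({ length: 32 }, (_, i) => i + 1);
const lanes64 = Array.from({ length: 64 }, (_, i) => i + 1);
console.log(simulateSubgroupSum(lanes32), simulateSubgroupSum(lanes64));
```

Had the starting offset been fixed at 16, the 64-lane case would have summed only part of the subgroup, which is the failure mode that deriving the bound from the subgroup size avoids.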
Code Examples
// Assumes the TSL namespace is imported from the three.js TSL entry point.
import * as TSL from 'three/tsl';

// Get the subgroup size for the current hardware. This is a uint value.
const sSize = TSL.subgroupSize;

// For visualization, convert the uint subgroupSize to a float and normalize it.
// Dividing by 128.0 maps common sizes such as 32 and 64 to 0.25 and 0.5.
const normalizedSize = sSize.toFloat().div( 128.0 );