workgroupArray
(Compute Shader Only) Declares a high-speed, shared memory array within a workgroup. It enables efficient data exchange among all threads in the same workgroup, forming the basis for high-performance parallel algorithms (e.g., reductions, convolutions).
Core Advantages
Provides access to fast, on-chip shared memory local to the workgroup. This inter-thread communication capability significantly reduces accesses to slower global memory and is key to implementing advanced GPGPU algorithms and performance optimizations.
Common Uses
Implementing parallel reduction algorithms (e.g., sum, max)
Serving as a pixel neighborhood cache in image processing (e.g., large kernel convolutions)
Caching data tiles in tiled matrix multiplication
How to adjust
Adjust by changing the `type` and `count` arguments at creation. Increasing `count` lets a single workgroup process more data, but is constrained by the hardware's shared-memory limit. Changing the `type` (e.g., from 'f32' to 'vec4') can improve efficiency through vectorized computation, but consumes proportionally more memory.
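As a minimal sketch of the two knobs described above (the element count 64 and the variable names are illustrative, not part of the API):

// A scalar tile: 64 'f32' values of shared memory per workgroup
const scalarTile = workgroupArray( 'f32', 64 );

// A vectorized variant: same element count, 4x the data per element,
// but also 4x the shared-memory footprint
const vectorTile = workgroupArray( 'vec4', 64 );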
Code Examples
// Declare the shared array (the size of 64 is illustrative)
const sharedData = workgroupArray( 'f32', 64 );

// Load data from global memory into the shared array
sharedData.element( localIndex ).assign( globalInput.element( globalIndex ) );

// Wait for all threads in the workgroup to finish loading
workgroupBarrier();

// Threads collaboratively process data in shared memory (e.g., perform one reduction step)
const myData = sharedData.element( localIndex );
const neighborData = sharedData.element( localIndex.add( stride ) );
myData.assign( myData.add( neighborData ) );

// Synchronize again before the next step reads the updated shared data
workgroupBarrier();
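To make the stride-halving logic of the reduction step concrete, here is a plain-JavaScript CPU analogue (a sketch, not the shader API): each "thread" i below stands in for one invocation, and the inner loop body corresponds to the single reduction step shown above, with a workgroupBarrier() implied between strides.

```javascript
// CPU sketch of a tree reduction over a power-of-two-sized array.
// At each step, thread i (for i < stride) adds the element `stride`
// positions away; the stride then halves until one value remains.
function reduceSum( data ) {
	const a = data.slice(); // stand-in for the workgroup's shared array

	for ( let stride = a.length / 2; stride >= 1; stride /= 2 ) {

		// In the shader, this inner loop runs in parallel (one iteration
		// per thread), separated from the next stride by workgroupBarrier().
		for ( let i = 0; i < stride; i ++ ) {

			a[ i ] += a[ i + stride ];

		}

	}

	return a[ 0 ]; // after the last step, "thread 0" holds the total

}
```

The same pattern generalizes to other reductions (max, min) by swapping the `+=` for the corresponding combine operation.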