GFOR
From Jacket Wiki
Run many loop iterations simultaneously on the GPU. See GFOR Usage for more detailed information.
Supported Syntax
gfor n = 1:10
  % loop body
gend
Description
The GFOR/GEND loop construct may be used to simultaneously launch all of the iterations of a FOR-loop on the GPU, as long as the iterations are independent. While the standard FOR-loop performs each iteration sequentially, Jacket's GFOR-loop performs each iteration at the same time. Jacket does this by tiling out the values of all loop iterations and then performing computation on those tiles in one pass.
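To make the tiling idea concrete, here is a plain MATLAB (CPU-only) sketch. It is not Jacket's actual implementation. The sequential loop evaluates its body once per value of k, while the "tiled" version lays out all values of k side by side and evaluates the body once over the whole tile. The loop body and the size n are arbitrary illustrative choices.

```matlab
% Plain MATLAB (CPU) sketch of the tiling idea; not Jacket's implementation.
n = 10;

% Sequential FOR-loop: one iteration value of k at a time.
yLoop = zeros(1, n);
for k = 1:n
  yLoop(k) = k^2 + 3*k;
end

% "Tiled" version: lay out every iteration value of k side by side,
% then evaluate the loop body once on the whole tile.
k = 1:n;
yTiled = k.^2 + 3*k;

assert(isequal(yLoop, yTiled));
```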
Note that GFOR has many restrictions, described on the GFOR Usage page. Use a GFOR-loop only when the loop satisfies all of them. Because the restrictions are extensive, many loops are better served by a regular FOR-loop that operates on GPU variables inside its body, which still speeds up each individual iteration. Finally, in both MATLAB and Jacket code, it is best to vectorize as much as possible so that neither FOR-loops nor GFOR-loops are needed.
You can think of GFOR-loops as vectorizing regular FOR-loops: all iterations of the GFOR-loop are done in parallel. For example, you could write a FOR-loop to increment every element of a vector, or you could "vectorize" and simply do it all in one operation:
for i = 1:n
  A(i) = A(i) + 1;
end

A = A + 1;   % vectorized version
In a similar fashion, you could run an FFT on every 2D slice of a volume in a FOR-loop, or you could "vectorize" and simply do it all in one GFOR-loop operation:
for i = 1:n
  A(:,:,i) = fft2(A(:,:,i));   % runs each FFT in sequence
end

gfor i = 1:n
  A(:,:,i) = fft2(A(:,:,i));   % vectorized version: runs 'n' FFTs in parallel
gend
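Outside of GFOR, this particular loop can also be removed in plain MATLAB, because the 2D FFT is separable: applying fft along dimension 1 and then along dimension 2 transforms every slice of the volume at once. A minimal CPU sketch, with arbitrary example sizes (the names Bloop and Bvec are illustrative only):

```matlab
% Plain MATLAB (CPU) sketch; sizes are arbitrary example values.
n = 4;
A = rand(8, 8, n);

% Sequential version: one 2D FFT per slice.
Bloop = zeros(size(A));
for i = 1:n
  Bloop(:,:,i) = fft2(A(:,:,i));
end

% Loop-free version: FFT along dimension 1, then along dimension 2,
% applied to every slice of the volume at once.
Bvec = fft(fft(A, [], 1), [], 2);

assert(max(abs(Bloop(:) - Bvec(:))) < 1e-9);
```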
Examples
A = gones(n);
B = gones(1,n);
gfor k = 1:n
  B(k) = A(k,:) * A(:,k);   % vector-vector multiply
gend
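For checking results, a plain MATLAB (CPU) reference for this example can compute every iteration in one pass: element (k,j) of A .* A.' equals A(k,j)*A(j,k), so summing along dimension 2 gives each B(k). A minimal sketch, assuming ordinary rand data and an arbitrary size n:

```matlab
% Plain MATLAB (CPU) sketch; n is an arbitrary example size.
n = 5;
A = rand(n);

% Sequential reference: iteration k multiplies row k by column k.
Bloop = zeros(1, n);
for k = 1:n
  Bloop(k) = A(k,:) * A(:,k);
end

% All iterations in one pass: (A .* A.')(k,j) = A(k,j)*A(j,k),
% so summing along dimension 2 yields every B(k) at once.
Bvec = sum(A .* A.', 2).';

assert(max(abs(Bloop - Bvec)) < 1e-10);
```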
A = gones(n,n,m);
B = gones(n);
gfor k = 1:m
  A(:,:,k) = A(:,:,k) * B;   % matrix-matrix multiply
gend
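The same one-pass idea can be expressed in plain MATLAB by stacking the m slices of A into one tall matrix, multiplying by B once, and unstacking the result. This CPU sketch writes into separate arrays (Cloop, Cvec, illustrative names) rather than overwriting A, and the sizes are arbitrary example values:

```matlab
% Plain MATLAB (CPU) sketch; n and m are arbitrary example sizes.
n = 3;
m = 4;
A = rand(n, n, m);
B = rand(n);

% Sequential reference: multiply each slice by B.
Cloop = zeros(n, n, m);
for k = 1:m
  Cloop(:,:,k) = A(:,:,k) * B;
end

% One pass: stack the m slices vertically into an (n*m)-by-n matrix,
% do a single matrix multiply, then unstack back to n-by-n-by-m.
V = reshape(permute(A, [1 3 2]), n*m, n);   % V((k-1)*n+i, j) = A(i,j,k)
W = V * B;
Cvec = permute(reshape(W, n, m, n), [1 3 2]);

assert(max(abs(Cloop(:) - Cvec(:))) < 1e-10);
```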
A = grand(n,m);
B = gzeros(n,m);
gfor k = 1:m
  B(:,k) = fft(A(:,k));   % FFT
gend
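In plain MATLAB, fft applied to a matrix already transforms each column independently, so this loop has a direct loop-free CPU equivalent that can serve as a reference. A minimal sketch with arbitrary sizes:

```matlab
% Plain MATLAB (CPU) sketch; n and m are arbitrary example sizes.
n = 8;
m = 5;
A = rand(n, m);

% Sequential reference: one column FFT per iteration.
Bloop = zeros(n, m);
for k = 1:m
  Bloop(:,k) = fft(A(:,k));
end

% fft of a matrix transforms every column independently in one call.
Bvec = fft(A);

assert(max(abs(Bloop(:) - Bvec(:))) < 1e-10);
```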
Iterators may be combined with arithmetic in subscripts.
A = gones(n,m);
B = gones(size(A));
gfor k = 2:m
  B(:,k) = A(:,k-1);
gend
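The shifted copy in this example corresponds to a single range assignment in plain MATLAB. In this CPU sketch (sizes arbitrary, names Bloop and Bvec illustrative), column 1 of B keeps its initial value because no iteration writes to it:

```matlab
% Plain MATLAB (CPU) sketch; n and m are arbitrary example sizes.
n = 4;
m = 6;
A = rand(n, m);

% Sequential reference: iteration k copies column k-1 of A into column k of B.
Bloop = ones(size(A));
for k = 2:m
  Bloop(:,k) = A(:,k-1);
end

% Loop-free equivalent: one shifted range assignment.
Bvec = ones(size(A));
Bvec(:, 2:m) = A(:, 1:m-1);   % column 1 keeps its initial value

assert(isequal(Bloop, Bvec));
```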
Use LOCAL to create scratch variables unique to each iteration (tile).
A = gones(n,m);
B = gones(size(A));
gfor k = 1:m
  a = A(:,k);
  b = local(k, gzeros(n,1));   % each GFOR tile gets its own unique copy of 'b'
  b([1 3]) = a([1 3]);
  B(:,k) = b;
gend
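The result of this example can be reproduced in plain MATLAB as a reference. In a sequential FOR-loop, b is recreated as zeros on every pass, so each iteration naturally gets its own scratch copy, which mirrors the per-tile behavior LOCAL is described as providing above. In both versions, every column of B ends up zero except rows 1 and 3. This is a CPU sketch with arbitrary sizes, not Jacket code:

```matlab
% Plain MATLAB (CPU) sketch; n (>= 3) and m are arbitrary example sizes.
n = 5;
m = 4;
A = rand(n, m);

% Sequential reference: a fresh zero vector b is created on every pass.
Bloop = ones(size(A));
for k = 1:m
  a = A(:,k);
  b = zeros(n, 1);
  b([1 3]) = a([1 3]);
  Bloop(:,k) = b;
end

% Loop-free equivalent: zero everywhere except rows 1 and 3 copied from A.
Bvec = zeros(n, m);
Bvec([1 3], :) = A([1 3], :);

assert(isequal(Bloop, Bvec));
```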