MATLAB® is a registered trademark of MathWorks Inc.
GFOR Usage
Introduction
The GFOR/GEND loop construct may be used to simultaneously launch all of the iterations of a FOR-loop on the GPU, as long as the iterations are independent. While the standard FOR-loop performs each iteration sequentially, Jacket's GFOR-loop performs each iteration at the same time. Jacket does this by tiling out the values of all loop iterations and then performing computation on those tiles in one pass.
You can think of GFOR as performing auto-vectorization of your code, e.g. you write a loop that operates on matrices but behind the scenes Jacket rewrites it to operate on volumes (on all matrices in parallel). For both MATLAB and Jacket code, it is better to vectorize computation as much as possible to avoid the overhead in both FOR-loops and GFOR-loops.
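As a rough CPU-only illustration of the "matrices become volumes" idea, the sketch below uses plain MATLAB with no Jacket calls. The sizes and the names A_loop, A_vol, and K are illustrative assumptions, and this is not a description of how Jacket implements tiling internally.

```matlab
% Plain MATLAB sketch (no Jacket calls); sizes are illustrative.
n = 3; m = 4;
B = ones(n);

% Loop form: one n-by-n matrix per iteration.
A_loop = zeros(n, n, m);
for k = 1:m
    A_loop(:,:,k) = k * B;
end

% Volume form: all m matrices computed in one pass.
K = reshape(1:m, 1, 1, m);                      % iterator values along dim 3
A_vol = bsxfun(@times, repmat(B, [1 1 m]), K);  % scale each page by its k

same = isequal(A_loop, A_vol);                  % both forms agree
```

The loop body only ever touches page k, which is what makes the one-pass form possible.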
Many features and functions of Jacket are supported within GFOR-loops, for example,
- element-wise arithmetic (addition, subtraction, multiplication, division, POWER, EXP)
- FFT, FFT2, FFTN, and their inverses IFFT, IFFT2, IFFTN
- transpose, ctranspose, and diag
- matrix-matrix, matrix-vector, vector-vector multiply (mtimes)
- subscripted assignment/referencing
- reductions (SUM, MIN, MAX, ANY, ALL)
For a full list, see the GFOR Supported Functions page. You can also call GHELP within MATLAB, to inquire about any specific function's GFOR support. We appreciate your feedback on the Jacket Forums as we continue to expand GFOR functionality. GFOR is not supported with GCOMPILE and ARRAYFUN.
You can check to see if a variable involves the iterator using ISGFOR.
Note that GFOR has several restrictions, described in the Restrictions section below.
Usage
User Functions called within GFOR
Any function you define and call within a GFOR loop must itself meet all the conditions described on this page in order to work as expected.
Consider the (trivial) example below. The function Project1 has to satisfy all requirements for GFOR Usage, so you cannot use if-else conditions inside it.
A = grand(10000,20);
B = grand(10000,20);
n = 20;
ep = grand;
H = gzeros(10000,20);
gfor ii = 1:n
    H(:,ii) = Project1(A(:,ii),B(:,ii),ep);
gend

function H = Project1(A,B,ep)
    if ep > 0   % BAD
        H = (A.*B)./ep;
    else
        H = A.*0;
    end
end
Multiplications
Jacket supports bulk multiplications of vector-vector, matrix-vector, and matrix-matrix types using GFOR. This is especially useful with many small matrices.
A = gones(n);
B = gones(1,n);
gfor k = 1:n
    B(k) = A(k,:) * A(:,k);   % vector-vector multiply
gend

A = gones(n,n,m);
B = gones(n,1);               % column vector, so a * B is n-by-1
C = gones(n,m);               % one result column per iteration
gfor k = 1:m
    a = A(:,:,k);
    C(:,k) = a * B;           % matrix-vector multiply
gend

A = gones(n,n,m);
B = gones(n);
gfor k = 1:m
    A(:,:,k) = A(:,:,k) * B;  % matrix-matrix multiply
gend
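For reference, the vector-vector case can be checked on the CPU. The sketch below is plain MATLAB, not Jacket code; it uses magic(n) as arbitrary test data (an assumption made for illustration) and compares the per-iteration products against a single vectorized expression.

```matlab
% Plain MATLAB sketch (no Jacket calls); magic(n) is arbitrary test data.
n = 4;
A = magic(n);

% Loop form: row k times column k, one scalar per iteration.
B = zeros(1, n);
for k = 1:n
    B(k) = A(k,:) * A(:,k);
end

% Vectorized form: sum over j of A(k,j)*A(j,k) for every k at once.
B_all = sum(A .* A.', 2).';

same = isequal(B, B_all);   % integer data, so the comparison is exact
```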
The Iterator
The iterator can be involved in expressions.
A = gones(n,n,m);
B = gones(n);
gfor k = 1:2:m
    A(:,:,k) = k*B + sin(k+1);   % expressions
gend
Iterator definitions can include arithmetic in expressions.
A = gones(n,n,m);
B = gones(n);
gfor k = m/4:m-m/4
    A(:,:,k) = k*B + sin(k+1);   % expressions
gend
Subscripting
More complicated subscripting is supported.
A = gones(n,n,m);
B = gones(n,10);
gfor k = 1:2:m
    A(:,1:10,k) = k*B;   % subscripting
gend
Iterators can be combined with arithmetic in subscripts.
A = gones(n,m);
B = gones(size(A));
gfor k = 2:m
    B(:,k) = A(:,k-1);
gend

A = gones(n,2*m);
B = gones(n,m);
gfor k = 2:m
    B(:,k) = A(:,2*k-1);
gend

A = gones(n,2*m);
B = gones(n,m);
gfor k = 2:m
    B(:,k) = A(:,floor(k+.2));
gend
In-Loop Reuse
Within the loop, you can use a result you just computed.
[A B C] = deal(gones(n));
gfor k = 1:n
    A(:,k) = 4 * B(:,k);
    C(:,k) = 4 * A(:,k);   % use it again
gend
Although it is more efficient to store the value in a temporary variable:
[A B C] = deal(gones(n));
gfor k = 1:n
    a = 4 * B(:,k);
    A(:,k) = a;
    C(:,k) = 4 * a;
gend
Note, if the variable a above had not involved a GFOR expression, you may need to use LOCAL to designate it as a temporary variable specific to each iteration.
In-Place Computation
In some cases, GFOR behaves differently than the typical sequential FOR-loop. For example, you can read and modify a result in place as long as the accesses are independent.
A = gones(n);
gfor k = 1:n
    A(:,k) = sin(k) + A(:,k);
gend
The same subscripting and assignment behaviors used with GPU data also work with GFOR.
A = gones(n,n,m,k);
m = m * k;   % precompute since cannot have expressions in iterator
gfor k = 1:m
    A(:,:,k) = 4 * A(:,:,k);   % collapse last dimension
gend
Random Data Generation
Random data should always be generated outside the GFOR loop. This is due to the fact that GFOR only passes over the body of the loop once. Therefore, any calls to GRAND inside the body of the loop will result in the same random matrix being assigned to every iteration of the loop.
For example, in the following trivial code, all columns of B are identical because A is only evaluated once:
gfor ii = 1:n
    A = grand(3,1);
    B(:,ii) = A;
gend
B

B =
    0.1209    0.1209    0.1209
    0.6432    0.6432    0.6432
    0.8746    0.8746    0.8746
This can be rectified by bringing the random number generation outside the loop, as follows:
A = grand(3,n);
gfor ii = 1:n
    B(:,ii) = A(:,ii);
gend
B

B =
    0.0892    0.1655    0.7807
    0.5626    0.5173    0.2932
    0.5664    0.5898    0.1391
This is a trivial example, but it demonstrates the principle: in most cases, random numbers should be generated outside the loop and indexed inside it.
Local variables
New! Use LOCAL to create local copies of data that can be modified independently in each GFOR tile. In general, in a subscript assignment involving the iterator among the subscripts, Jacket assumes you are writing a final result out from the GFOR-loop. However, in many cases you may want to subscript assign results to a variable where each iteration (tile) has its own uniquely-modified copy of that variable.
In this first example, Jacket thinks you are subscripting into B shared by all iterations.
n = 5;
A = gsingle(1:n);
gfor ii = 1:n
    B = A;            % fake copy
    B(ii) = 0;        % write zeros in positions '1:n' of original matrix
    C(ii) = sum(B);
gend
C   % all zeros (==gzeros(1,n))
Compare that to indicating that B is local to each iteration:
A = gsingle(1:n);
gfor ii = 1:n
    B = local(ii, A);   % create local copy
    B(ii) = 0;          % each tile has its own B with a different index zeroed out
    D(ii) = sum(B);     % sum is different for each tile
gend
D   % all unique summations
Produces:
C =
0
0
0
0
0
D =
14
13
12
11
10
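The values of D match what an ordinary sequential loop produces when every iteration starts from a fresh copy of A. The sketch below is a plain MATLAB reference (no Jacket calls, so no LOCAL), included only to show where the numbers 14 through 10 come from.

```matlab
% Plain MATLAB reference (no Jacket calls).
n = 5;
A = single(1:n);            % [1 2 3 4 5], sum is 15
D = zeros(1, n, 'single');
for ii = 1:n
    B = A;                  % fresh copy for this iteration
    B(ii) = 0;              % zero out a different element each time
    D(ii) = sum(B);         % 15 - ii
end
% D is [14 13 12 11 10]
```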
COLON expansion
New in v2.1! GFOR can be used in COLON expressions to generate sequences; however, support is limited to generating uniformly sized sequences, e.g. k-1:k+1 or k:k+3 where k is the GFOR iterator. The GFOR iterator must appear on both sides of the colon.
gfor k = first:last   % first, last: placeholder loop bounds
    A(k-1:k+2)        % valid
    A(k:k+3)          % valid
    A(k-4:k)          % valid
    A(k+1:k+3)        % valid
    A(k:end)          % FAIL .. non-uniform tile sizes
    A(k:5)            % FAIL .. non-uniform tile sizes
    A(1-k:k)          % FAIL .. negative iterator not supported
    A((k+4)/4:k)      % FAIL .. too complicated, must be of form "k+<scalar>" on both sides
gend
For workarounds to various limitations, see suggestions below.
Restrictions
This preliminary implementation of GFOR has the following restrictions.
Iteration independence
The most important property of the loop body is that each iteration must be independent of the other iterations. Note that accessing the result of a separate iteration produces undefined behavior.
B = 0;
gfor k = 1:n
    B = B + k;   % bad
gend
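One common way to remove this kind of dependence is to have each iteration write only to its own slot and to perform the reduction once, outside the loop. The sketch below shows the pattern in plain MATLAB with an ordinary FOR-loop; the name t is an illustrative assumption, and whether this exact form carries over unchanged to GFOR is also an assumption rather than something stated on this page.

```matlab
% Plain MATLAB sketch (no Jacket calls).
n = 10;
t = zeros(1, n);
for k = 1:n
    t(k) = k;     % each iteration writes only its own element
end
B = sum(t);       % reduction performed once, after the loop
% B is 55 for n = 10
```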
No conditional statements
Conditional statements (branching) are not allowed in the body of the loop. However, you can often restructure the code to work around this restriction. Consider the following two examples:
Example 1: Problem
A = gones(n,m);
gfor k = 1:m           % A has m columns
    if k > 10          % bad
        A(:,k) = k + 1;
    end
gend
You can express the conditional statement as a multiplication by logical values. For instance, the block of code above can be rewritten to run without error:
Example 1: Solution
A = gones(n,m);
gfor k = 1:m                  % A has m columns
    condition = k > 10;       % good
    A(:,k) = ~condition*A(:,k) + condition*(k+1);
gend
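To see that the masked form computes the same result as the branching form, the sketch below runs both as ordinary FOR-loops in plain MATLAB (no Jacket calls). The sizes and the names A_if and A_mask are illustrative assumptions.

```matlab
% Plain MATLAB sketch (no Jacket calls); sizes are illustrative.
n = 4; m = 15;
A0 = ones(n, m);

% Branching form (not allowed inside GFOR).
A_if = A0;
for k = 1:m
    if k > 10
        A_if(:,k) = k + 1;
    end
end

% Masked form: condition is 0 or 1, so exactly one term survives.
A_mask = A0;
for k = 1:m
    condition = k > 10;
    A_mask(:,k) = ~condition * A_mask(:,k) + condition * (k + 1);
end

same = isequal(A_if, A_mask);
```

One caveat with masking: both terms are always evaluated, so a NaN or Inf in the branch that is masked out still propagates, because 0 times NaN (or Inf) is NaN.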
Another example of overcoming the conditional statement restriction in GFOR is as follows:
Example 2: Problem
A = gones(n,n,m);
B = grand(n);
gfor k = 1:4
    if mod(k,2) ~= 0
        A(:,:,k) = B + k;
    else
        A(:,:,k) = B * k;
    end
gend
Instead, you can make two passes over the same data, each pass performing one branch.
Example 2: Solution
A = gones(n,n,m);
B = grand(n);
gfor k = 1:2:4
    A(:,:,k) = B + k;
gend
gfor k = 2:2:4
    A(:,:,k) = B * k;
gend
Cell arrays
No cell array assignment.
gfor k = 1:n
    A{k} = k;   % bad
gend
Nested loop restrictions
Nesting GFOR-loops within GFOR-loops is unsupported. You may interleave FOR-loops as long as they are completely independent of the GFOR-loop iterator.
gfor k = 1:n
    gfor j = 1:m   % bad
        % ...
    gend
gend
This will produce a warning:
Warning: Detected possible nested GFOR
To detect nesting, Jacket uses a simple internal switch: GFOR turns it on, GEND turns it off.
gfor ii = 1:8
gend
gfor ii = 1:8
gfor ii = 1:8   % redefine 'ii' and produce false alarm warning
Warning: Detected possible nested GFOR
gend
gfor ii = 1:8   % no problem
gend
This warning is common during development. For example, if a GFOR loop aborts because of a coding error and you then re-run it, you will see a false warning, since GEND never ran to turn the switch off. At any time, you can run GFOR by itself (no inputs) to reset the switch. From a clean MATLAB prompt, before using the GPU, you should not see this warning. If you do, start searching for possible nesting, e.g. one GFOR loop calling a function which itself contains a GFOR loop.
Nesting FOR-loops within GFOR-loops is supported, as long as the GFOR iterator is not used in the FOR loop iterator, as follows:
gfor k = 1:n
    for j = 1:m+k   % bad
        % ...
    end
gend

gfor k = 1:n
    for j = 1:m     % good
        % ...
    end
gend
Nesting GFOR-loops inside of FOR-loops is fully supported.
for k = 1:n
    gfor j = 1:m   % good
        % ...
    gend
end
To investigate the situation, you can always jump into the MATLAB debugger upon a warning:
dbstop if warning
GFOR command line parsing
GFOR must be on a line by itself. Trailing comments are allowed.
gfor k = 1:n; A(:,k) = k; gend   % bad

gfor k = 1:n   % this comment is okay
    A(:,k) = k;
gend

gfor k = 1:ceil(2*n)/2   % expressions
    A(:,k) = k;
gend
Iterator not usable outside body
Do not use the iterator after GEND. Its value will not be that of the final iteration.
gfor k = 1:n
    % ...
gend
A = A / k;   % bad
Memory considerations
Since each computation is done in parallel for all iterator values, you need to have enough card memory available to do all iterations simultaneously. If the problem exceeds memory, it will trigger "out of memory" errors.
You can estimate the memory a loop needs from the size of the data each iteration produces. For example:
Y = grand(1,200,50);
gfor id = 1:200
    B(:,:,id) = repmat(Y(1,id,:), 100, 100);   % calculate memory for this line
    ...
    ...
gend
Here, each iteration of the line will take up 3.81 MB on the GPU:
>> 100*100*50*8/2^20
ans =
3.8147
Thus, 200 iterations of that operation will take about 763 MB.
>> 3.8147*200
ans =
762.9395
You can work around the memory limitations of your GPU by breaking the GFOR loop up into segments; however, you might want to consider using a larger memory GPU.
% BEFORE
gfor k = 1:400
    B = A(:,k);
    C(:,:,k) = B * B';   % outer product expansion runs out of memory
gend

% AFTER
for kk = 1:100:400
    gfor k = kk:kk+100-1   % four batches of 100
        B = A(:,k);
        C(:,:,k) = B * B';   % now several smaller problems fit in card memory
    gend
end
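The same arithmetic can be used to pick a batch size. The sketch below is plain MATLAB and reuses the numbers from the repmat example in this section; budget_mb is a hypothetical memory budget chosen only for illustration, and the 8 bytes per element follows the calculation shown above. It ignores any temporaries Jacket may allocate, so treat the result as a lower bound on memory use.

```matlab
% Plain MATLAB sketch; budget_mb is a hypothetical figure, not a measured one.
bytes_per_elem = 8;                 % as in the calculation above
elems_per_iter = 100 * 100 * 50;    % size of one iteration's result
n_iter         = 200;

mb_per_iter = elems_per_iter * bytes_per_elem / 2^20;   % about 3.8147 MB
total_mb    = mb_per_iter * n_iter;                     % about 762.94 MB

budget_mb = 256;                                 % assumed free card memory
batch     = floor(budget_mb / mb_per_iter);      % iterations per GFOR batch
n_batches = ceil(n_iter / batch);                % passes of the outer FOR-loop
% With these numbers: batch is 67 and n_batches is 3.
```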
Unacceptable iterator names
Jacket might throw a warning if you try to use i as an iterator name for GFOR. Within MATLAB functions, the GFOR iterator must not use the variable names i or j, since MATLAB reserves these for the imaginary unit (this is a MATLAB bug). Use k or some other name instead.
For example, running from the main prompt you might get this warning with i:
gfor i = 1:4
Warning: GFOR variable name conflicts with builtin command
If using gfor within a function then preallocate:
   i = [];
   gfor i = 1:n
   ...
   gend
However, if you already have a variable in your workspace with that same name, you will not get the warning because the variable is simply overwritten. Again from the main prompt:
i = 0;         % variable now in namespace
gfor i = 1:4   % Success
Iterator not allowed in colon expressions
New in v2.1! Limited COLON support recently added, see COLON expansion. Read on for workarounds for various limitations.
At present, subscript indexing of matrices and vectors inside GFOR may only be done with the loop iterator and simple addition/subtraction with a scalar, e.g. k+4. Other variables or expressions involving the iterator may error out. To work around this, try pre-calculating your set of offsets and applying them using vector arithmetic:
A = gones(100,100);
idx = -4:4;
gfor k = 5:95
    A(k+idx,k+idx) = ...   % good
gend
It is even faster to convert the offset vector to a GPU array beforehand, especially if it is large:
A = gones(100,100);
idx = gsingle(-4:4);
gfor k = 5:95
    A(k+idx,k+idx) = ...   % good
gend
There is currently no workaround for cases where the size of the subscript changes with each GFOR iteration:
gfor k = 5:100
    A(k:100) = ...   % bad -- changes size with each iteration: 5:100, 6:100, 7:100,...
gend
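The difference between the two cases is whether every iteration selects the same number of elements. The sketch below checks this in plain MATLAB (no Jacket calls); the names sizes_ok and sizes_bad are illustrative.

```matlab
% Plain MATLAB sketch (no Jacket calls).
idx = -4:4;

% A precomputed offset vector gives the same indices as k-4:k+4 ...
k = 10;
same = isequal(k + idx, k-4:k+4);

% ... and every iteration selects the same number of elements.
sizes_ok = arrayfun(@(k) numel(k + idx), 5:95);
uniform  = all(sizes_ok == numel(idx));

% By contrast, k:100 shrinks as k grows, so tile sizes differ.
sizes_bad = arrayfun(@(k) numel(k:100), 5:100);
varying   = numel(unique(sizes_bad)) > 1;
```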
Iterator must be uniformly spaced
The iterator expression must be a row vector of uniformly spaced real values.
gfor i = 1:n         % good
gfor i = m:n         % good
gfor i = 5:2:100     % good
gfor i = 1:2:n       % good
gfor i = [1 4 2 3]   % bad
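If an iterator vector is built programmatically, it can be checked before it is handed to GFOR. The helper below is a hypothetical plain MATLAB function handle, not part of Jacket, and the tolerance is an arbitrary assumption; it tests that the vector is a real row vector whose second differences are all zero.

```matlab
% Hypothetical helper (not a Jacket function): true for a real row vector
% with constant spacing between consecutive elements.
tol = 1e-12;   % assumed tolerance for floating-point steps
is_uniform = @(v) isrow(v) && isreal(v) && all(abs(diff(v, 2)) < tol);

ok_range  = is_uniform(5:2:100);     % constant step of 2
ok_perm   = is_uniform([1 4 2 3]);   % steps 3, -2, 1 are not constant
ok_column = is_uniform((1:4)');      % column vector, not a row
```

Here ok_range is true, while ok_perm and ok_column are false.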
Subscripted data cannot be directly pulled back to MATLAB
It is not straightforward to convert GFOR computations to regular CPU variables because the GFOR data simultaneously contains multiple copies (tiles), one for each iteration being computed in parallel. Converting the GFOR variable to the CPU would need to account for this extra dimension. However, changing the dimension of the data implicitly upon conversion would have other effects; for example, the following would fail to hold: size(A)==size(single(A)).
The way we currently deal with this is just to warn the user and only put the first iteration in a MATLAB variable. Here's a contrived example to show this:
>> A = grand(2)
A =
    0.1209    0.8746
    0.6432    0.3369
>> gfor k = 1:2
>> a = A(:,k);
>> single(a)
Warning: Only storing result from first GFOR iteration into MATLAB variable
ans =
    0.1209
    0.6432
Another place this often shows up is in IF statements which implicitly pull the data back to the CPU producing this warning.
To investigate the situation, you can always jump into the MATLAB debugger upon such a warning:
dbstop if warning
Starting with v1.8, we show the result of each iteration along the last dimension.
>> A = grand(2,3)
A =
    0.6206    0.5730    0.4752
    0.5977    0.8232    0.0074
>> gfor k = 1:3
>> a = A(:,k)
Warning: Last dimension indicates independent iterations
a =
(:,:,1) =    % first iteration (k==1)
    0.6206
    0.5977
(:,:,2) =    % second iteration (k==2)
    0.5730
    0.8232
(:,:,3) =    % third iteration (k==3)
    0.4752
    0.0074
Subscripting into CPU variables
Because the direct conversion of GFOR data into a CPU variable is undefined, it is also undefined to use GFOR data to subscript into a CPU variable. We catch this and error out:
A = rand(4);
gfor ii = 1:4
A(ii);
??? Error using ==> garray.subsindex at 3
Undefined behavior: CPU variables subscripted with GFOR expressions
This error occurs when MATLAB attempts to index into a regular CPU variable using a GPU variable that was involved in a GFOR computation. In this case, MATLAB requires CPU indices, yet these indices are tiled data, so it is unclear which tile to return. See above for more.
To get around this, cast your variable (A above) to the GPU:
A = grand(4);
gfor ii = 1:4
    A(ii);
gend
In some cases, you may want to use LOCAL to create independent copies of CPU (or GPU) variables for each iteration.
Unsupported functions
Some functions are supported by Jacket but not yet within GFOR. To check a specific function, call GHELP within MATLAB or see the GFOR Supported Functions page.
No logical indexing
Logical indexing like the following is not supported:
gfor i = 1:n
    B = A(:,i);
    tmp = B(B > .5);   % "Error: GFOR not supported with logical indexing"
    D(i) = sum(tmp);
gend
The problem is that each GFOR tile may select a different number of elements, something which GFOR cannot yet handle.
Similar to the workaround for conditional statements, it might work to use masked arithmetic:
gfor i = 1:n
    B = A(:,i);
    mask = B > .5;
    D(i) = sum(mask .* B);
gend
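For a sum, the masked form gives the same answer as logical indexing because the excluded elements contribute zero. The sketch below checks this in plain MATLAB with ordinary FOR-loops (no Jacket calls); the data in A and the names D_index and D_mask are illustrative assumptions.

```matlab
% Plain MATLAB sketch (no Jacket calls); A is arbitrary sample data.
A = [0.2 0.9;
     0.7 0.6;
     0.4 0.1];
n = size(A, 2);

D_index = zeros(1, n);
D_mask  = zeros(1, n);
for k = 1:n
    B = A(:,k);
    D_index(k) = sum(B(B > .5));   % logical indexing: variable-length result
    mask = B > .5;
    D_mask(k)  = sum(mask .* B);   % masked arithmetic: fixed-length result
end

max_diff = max(abs(D_index - D_mask));   % expected to be zero (or rounding)
```

This equivalence relies on the reduction being a sum. It does not hold for reductions such as MIN or MEAN, where the zeroed elements change the result, and a NaN in B propagates through the mask because 0 times NaN is NaN.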
New! SUBSASGN with scalars and logical masks is supported:
gfor i = 1:n
    a = A(:,i);
    a(isnan(a)) = 0;
    A(:,i) = a;
gend