GFOR Usage

From Jacket Wiki

Introduction

The GFOR/GEND loop construct may be used to simultaneously launch all of the iterations of a FOR-loop on the GPU, as long as the iterations are independent. While the standard FOR-loop performs each iteration sequentially, Jacket's GFOR-loop performs each iteration at the same time. Jacket does this by tiling out the values of all loop iterations and then performing computation on those tiles in one pass.

You can think of GFOR as performing auto-vectorization of your code, e.g. you write a loop that operates on matrices but behind the scenes Jacket rewrites it to operate on volumes (on all matrices in parallel). For both MATLAB and Jacket code, it is better to vectorize computation as much as possible to avoid the overhead in both FOR-loops and GFOR-loops.
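
As a rough illustration of what vectorizing means here, the following sketch uses plain MATLAB on the CPU (no Jacket functions) with an arbitrary small test matrix. It computes the same row-times-column products once with a loop and once with a single vectorized expression. The variable names are illustrative only.

```matlab
n = 4;
A = magic(n);                 % arbitrary integer test matrix

% Loop form: one row-times-column product per iteration.
B_loop = zeros(1, n);
for k = 1:n
  B_loop(k) = A(k,:) * A(:,k);
end

% Vectorized form: all n products in a single expression.
% Element (k,j) of A .* A.' is A(k,j)*A(j,k), so summing along rows
% reproduces each product computed by the loop.
B_vec = sum(A .* A.', 2).';
```

When a computation can be written in the second form, no loop (FOR or GFOR) is needed at all.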


Many features and functions of Jacket are supported within GFOR-loops, for example,

  • element-wise arithmetic (addition, subtraction, multiplication, division, POWER, EXP)
  • FFT, FFT2, FFTN, and their inverses IFFT, IFFT2, IFFTN
  • transpose, ctranspose, and diag
  • matrix-matrix, matrix-vector, vector-vector multiply (mtimes)
  • subscripted assignment/referencing
  • reductions (SUM, MIN, MAX, ANY, ALL)

For a full list, see the GFOR Supported Functions page. You can also call GHELP within MATLAB to check whether a specific function supports GFOR. We appreciate your feedback on the Jacket Forums as we continue to expand GFOR functionality.

You can check to see if a variable involves the iterator using ISGFOR.

Note that GFOR has several restrictions, which are described in the Restrictions section below.


Usage

User Functions called within GFOR

If you have defined a function that you want to call within a GFOR loop, that function must meet all of the conditions described on this page in order to work as expected.

Consider the (trivial) example below. The function Project1 has to satisfy all requirements for GFOR Usage, so you cannot use if-else conditions inside it.

A = grand(10000,20);
B = grand(10000,20);
n = 20;
ep = grand;
H = gzeros(10000,20);
gfor ii = 1:n
H(:,ii) = Project1(A(:,ii),B(:,ii),ep);
gend
 
function H = Project1(A,B,ep)
  if ep > 0       % BAD
   H = (A.*B)./ep;
  else
   H = A.*0;
  end
end
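
A branch-free variant of such a function could look like the sketch below. It is written in plain MATLAB so it can be checked on the CPU; the name Project1Masked and the helper variables are hypothetical, and whether every operation here is accepted inside GFOR in your Jacket version is an assumption to verify. The idea is to replace the IF with multiplication by a logical value, and to guard the division so that a non-positive ep never produces 0/0.

```matlab
% Usage on small CPU inputs (values chosen only for illustration).
A = [1 2];
B = [3 4];
H_pos = Project1Masked(A, B, 2);    % takes the "ep > 0" path: [1.5 4]
H_neg = Project1Masked(A, B, -1);   % takes the "else" path:   [0 0]

function H = Project1Masked(A, B, ep)
  % Hypothetical branch-free variant of Project1 (name is illustrative).
  pos = ep > 0;                     % 1 where the original IF branch would run
  safe_ep = pos .* ep + ~pos;       % equals ep when pos is true, 1 otherwise
  H = pos .* (A .* B) ./ safe_ep;   % (A.*B)./ep when ep > 0, zeros otherwise
end
```

Note that the masked form only matches the original when A and B are finite, because 0 times NaN or Inf is NaN.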

Multiplications

Jacket supports bulk multiplications of vector-vector, matrix-vector, and matrix-matrix types using GFOR. This is especially useful with many small matrices.

A = gones(n);
B = gones(1,n);
gfor k = 1:n
  B(k) = A(k,:) * A(:,k);  % vector-vector multiply
gend
 
A = gones(n,n,m);
B = gones(n,1);
C = gones(n,m);
gfor k = 1:m
  a = A(:,:,k);
  C(:,k) = a * B;          % matrix-vector multiply
gend
 
A = gones(n,n,m);
B = gones(n);
gfor k = 1:m
  A(:,:,k) = A(:,:,k) * B; % matrix-matrix multiply
gend

The Iterator

The iterator can be involved in expressions.

A = gones(n,n,m);
B = gones(n);
gfor k = 1:2:m
  A(:,:,k) = k*B + sin(k+1);  % expressions
gend

Iterator definitions can include arithmetic in expressions.

A = gones(n,n,m);
B = gones(n);
gfor k = m/4:m-m/4
 A(:,:,k) = k*B + sin(k+1);  % expressions
gend

Subscripting

More complicated subscripting is supported.

A = gones(n,n,m);
B = gones(n,10);
gfor k = 1:2:m
  A(:,1:10,k) = k*B;  % subscripting
gend

Iterators can be combined with arithmetic in subscripts.

A = gones(n,m);
B = gones(size(A));
gfor k = 2:m
  B(:,k) = A(:,k-1);
gend
 
A = gones(n,2*m);
B = gones(n,m);
gfor k = 2:m
  B(:,k) = A(:,2*k-1);
gend
 
A = gones(n,2*m);
B = gones(n,m);
gfor k = 2:m
  B(:,k) = A(:,floor(k+.2));
gend


In-Loop Reuse

Within the loop, you can use a result you just computed.

[A B C] = deal(gones(n));
gfor k = 1:n
  A(:,k) = 4 * B(:,k);
  C(:,k) = 4 * A(:,k); % use it again
gend

However, it is more efficient to store the value in a temporary variable:

[A B C] = deal(gones(n));
gfor k = 1:n
  a = 4 * B(:,k);
  A(:,k) = a;
  C(:,k) = 4 * a;
gend

Note that if the variable a above did not involve a GFOR expression, you might need to use LOCAL to designate it as a temporary variable specific to each iteration.

In-Place Computation

In some cases, GFOR behaves differently than the typical sequential FOR-loop. For example, you can read and modify a result in place as long as the accesses are independent.

A = gones(n);
gfor k = 1:n
  A(:,k) = sin(k) + A(:,k);
gend

The same subscripting and assignment behaviors used with GPU data also work with GFOR.

A = gones(n,n,m,k);
m = m * k;  % total number of n-by-n slices in the last two dimensions
gfor k = 1:m
  A(:,:,k) = 4 * A(:,:,k); % collapse last dimension
gend

Random Data Generation

Random data should always be generated outside the GFOR loop. This is due to the fact that GFOR only passes over the body of the loop once. Therefore, any calls to GRAND inside the body of the loop will result in the same random matrix being assigned to every iteration of the loop.

For example, in the following trivial code, all columns of B are identical because A is only evaluated once:

gfor ii = 1:n
  A = grand(3,1);
  B(:,ii) = A;
gend
B
 
B =
 
    0.1209    0.1209    0.1209
    0.6432    0.6432    0.6432
    0.8746    0.8746    0.8746

This can be rectified by bringing the random number generation outside the loop, as follows:

A = grand(3,n);
gfor ii = 1:n
  B(:,ii) = A(:,ii);
gend
B
 
B =
 
    0.0892    0.1655    0.7807
    0.5626    0.5173    0.2932
    0.5664    0.5898    0.1391

This is a trivial example, but it demonstrates the principle that random numbers should be generated outside the loop.


Local variables

New! Use LOCAL to create local copies of data that can be modified independently in each GFOR tile. In general, in a subscript assignment involving the iterator among the subscripts, Jacket assumes you are writing a final result out from the GFOR-loop. However, in many cases you may want to subscript assign results to a variable where each iteration (tile) has its own uniquely-modified copy of that variable.

In this first example, Jacket thinks you are subscripting into B shared by all iterations.

n = 5;
A = gsingle(1:n);
gfor ii = 1:n
  B = A; % fake copy
  B(ii) = 0; % write zeros in positions '1:n' of original matrix
  C(ii) = sum(B);
gend
C  % all zeros (==gzeros(1,n))

Compare that to indicating that B is local to each iteration:

A = gsingle(1:n);
gfor ii = 1:n
  B = local(ii, A); % create local copy
  B(ii) = 0;  % each tile has its own B with a different index zeroed out
  D(ii) = sum(B); % sum is different for each tile
gend
D  % all unique summations

Produces:

C =
     0
     0
     0
     0
     0
D =
    14
    13
    12
    11
    10


Restrictions

This preliminary implementation of GFOR has the following restrictions.

Iteration independence

The most important property of the loop body is that each iteration must be independent of the other iterations. Note that accessing the result of a separate iteration produces undefined behavior.

B = 0;
gfor k = 1:n
  B = B + k; % bad
gend
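
One way to restructure such an accumulation is to have each iteration write only to its own slot and to perform the reduction after the loop. The sketch below is plain MATLAB so that it runs without a GPU; the names T and n are illustrative. With Jacket, the assumption is that the FOR/END pair would become GFOR/GEND and T would be created with gzeros.

```matlab
n = 5;
T = zeros(1, n);     % one slot per iteration (assumed gzeros(1,n) under Jacket)
for k = 1:n          % no iteration reads another iteration's result
  T(k) = k;          % each iteration writes only its own element
end
B = sum(T);          % the reduction happens once, outside the loop
```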

No conditional statements

Conditional statements (i.e. branching) are not allowed in the body of the loop. However, you can often find ways to overcome this restriction. Consider the following two examples:

Example 1: Problem

A = gones(n,m);
gfor k = 1:m
  if k > 10  % bad
    A(:,k) = k + 1;
  end
gend

You can often work around this restriction by expressing the conditional statement as a multiplication by logical values. For instance, the block of code above can be rewritten as follows to run without error:

Example 1: Solution

A = gones(n,m);
gfor k = 1:m
  condition = k > 10; % good
  A(:,k) = ~condition*A(:,k) + condition*(k+1);
gend

Another example of overcoming the conditional statement restriction in GFOR is as follows:

Example 2: Problem

A = gones(n,n,m);
B = grand(n);
gfor k = 1:4
  if mod(k,2) ~= 0
    A(:,:,k) = B + k;
  else
    A(:,:,k) = B * k;
  end
gend

Instead, you can make two passes over the same data, each pass performing one branch.

Example 2: Solution

A = gones(n,n,m);
B = grand(n);
gfor k = 1:2:4
  A(:,:,k) = B + k;
gend
gfor k = 2:2:4
  A(:,:,k) = B * k;
gend

Cell arrays

Cell array assignment is not supported.

gfor k = 1:n
  A{k} = k; % bad
gend

Nested loop restrictions

Nesting GFOR-loops within GFOR-loops is unsupported. You may interleave FOR-loops as long as they are completely independent of the GFOR-loop iterator.

gfor k = 1:n
  gfor j = 1:m  % bad
  % ...
  gend
gend

This will produce a warning:

Warning: Detected possible nested GFOR

To detect nesting, Jacket uses a simple internal switch: GFOR turns it on, GEND turns it off.

gfor ii = 1:8
gend
gfor ii = 1:8
gfor ii = 1:8    % redefine 'ii' and produce false alarm warning
Warning: Detected possible nested GFOR
gend
gfor ii = 1:8    % no problem
gend

It's common to see this warning during development. For example, if a coding error aborts a GFOR loop before GEND is reached and you then re-run it, you will see the false warning. At any time, you can run GFOR by itself (no inputs) to reset the switch. From a clean MATLAB prompt, before using the GPU, you should not see this warning. If you do see the warning in that case, start searching for possible nesting, e.g. one GFOR loop calls a function which itself has a GFOR loop.

Nesting FOR-loops within GFOR-loops is supported, as long as the GFOR iterator is not used in the FOR loop iterator, as follows:

gfor k = 1:n
  for j = 1:m+k % bad
  % ...
  end
gend
 
gfor k = 1:n
  for j = 1:m   % good
  % ...
  end
gend

Nesting GFOR-loops inside of FOR-loops is fully supported.

for k = 1:n
  gfor j = 1:m  % good
  % ...
  gend
end

To investigate the situation, you can always jump into the MATLAB debugger upon a warning:

  dbstop if warning

GFOR command line parsing

GFOR must be on a line by itself. Trailing comments are allowed.

gfor k = 1:n; A(:,k) = k;
gend % bad
 
gfor k = 1:n  % this comment is okay
  A(:,k) = k;
gend
 
gfor k = 1:ceil(2*n)/2  % expressions
  A(:,k) = k;
gend

Iterator not usable outside body

Do not use the iterator after GEND. Its value will not be that of the final iteration.

gfor k = 1:n
  % ...
gend
A = A / k; % bad

Memory considerations

Since each computation is done in parallel for all iterator values, you need to have enough card memory available to do all iterations simultaneously. If the problem exceeds memory, it will trigger "out of memory" errors.

You can work around the memory limitations of your GPU by breaking the GFOR loop up into segments; however, you might want to consider using a larger memory GPU.

% BEFORE
gfor k = 1:400
  B = A(:,k);
  C(:,:,k) = B * B';  % outer product expansion runs out of memory
gend
 
% AFTER
for kk = 1:100:400
  gfor k = kk:kk+100-1  % four batches of 100
    B = A(:,k);
    C(:,:,k) = B * B';  % now several smaller problems fit in card memory
  gend
end
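
When the iteration count is not a multiple of the batch size, the final batch needs to be clamped. The sketch below shows only that batching arithmetic in plain MATLAB, with small illustrative sizes and the hypothetical names batch and last. With Jacket, the assumption is that the inner loop would be the GFOR loop and the arrays would live on the GPU.

```matlab
n = 10;                            % total iterations (not a multiple of batch)
batch = 4;                         % illustrative batch size
A = rand(3, n);
C = zeros(3, 3, n);
for kk = 1:batch:n
  last = min(kk + batch - 1, n);   % clamp the final, shorter batch
  for k = kk:last                  % under Jacket, this would be the gfor loop
    b = A(:,k);
    C(:,:,k) = b * b';             % outer product for this iteration only
  end
end
```

Here the batches cover 1:4, 5:8, and 9:10, so every slice of C is filled exactly once.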


Unacceptable iterator names

Jacket might issue a warning if you try to use i as an iterator name for GFOR. Within MATLAB functions, the GFOR iterator must not use the variable names i or j, since these are reserved for complex variables (this is a MATLAB bug). Use k or some other variable name instead.

For example, at the main prompt you might get this warning with i:

gfor i = 1:4
Warning: GFOR variable name conflicts with builtin command
If using gfor within a function then preallocate:
  i = [];
  gfor i = 1:n
    ...
  gend

However, if you already have a variable in your workspace with that same name, you would not get the warning because the variable would simply be overwritten. Again from the main prompt:

i = 0;   % variable now in namespace
gfor i = 1:4   % Success

Iterator not allowed in colon expressions

At present, the loop iterator cannot appear inside a colon expression in a subscript within GFOR. Such subscripts will error out.

A = gones(100,100);
gfor k = 5:95
  A(k-4:k+4,k-4:k+4) = ... %  bad
gend

To work around this, try pre-calculating your set of offsets and apply them using vector arithmetic:

A = gones(100,100);
idx = -4:4;
gfor k = 5:95
  A(k+idx,k+idx) = ... %  good
gend

It's even faster to place that offset vector on the GPU ahead of time, especially if it is large:

A = gones(100,100);
idx = gsingle(-4:4);
gfor k = 5:95
  A(k+idx,k+idx) = ... %  good
gend

There is currently no workaround for cases where the size of the subscript changes with each GFOR iteration:

gfor k = 5:100
  A(k:100) = ... % bad -- changes size with each iteration: 5:100, 6:100, 7:100,...
gend

Iterator must be uniformly spaced

The iterator expression must be a row vector of uniformly spaced real values.

gfor i = 1:n       % good
gfor i = m:n       % good
gfor i = 5:2:100   % good
gfor i = 1:2:n     % good
gfor i = [1 4 2 3] % bad

Subscripted data cannot be directly pulled back to MATLAB

It's not straightforward to convert GFOR computations to regular CPU variables because the GFOR data simultaneously contains multiple copies (tiles), one for each iteration being computed in parallel. Converting the GFOR variable to the CPU would need to account for this extra dimension. However, changing the dimension of the data implicitly upon conversion would have other effects, for example, the following would fail to hold: size(A)==size(single(A)).

The way we currently deal with this is just to warn the user and only put the first iteration in a MATLAB variable. Here's a contrived example to show this:

>> A = grand(2)
A =
    0.1209    0.8746
    0.6432    0.3369
 
>> gfor k = 1:2
>>   a = A(:,k);
>>   single(a)
Warning: Only storing result from first GFOR iteration into MATLAB variable 
ans =
    0.1209
    0.6432

Another place this often shows up is in IF statements, which implicitly pull the data back to the CPU and produce this warning.

To investigate the situation, you can always jump into the MATLAB debugger upon such a warning:

  dbstop if warning

Starting with v1.8, we show the result of each iteration along the last dimension.

>> A = grand(2,3)
A =
    0.6206    0.5730    0.4752
    0.5977    0.8232    0.0074
 
>> gfor k = 1:3
>>  a = A(:,k)
Warning: Last dimension indicates independent iterations
a =
(:,:,1) =    % first iteration (k==1)
    0.6206
    0.5977
(:,:,2) =    % second iteration (k==2)
    0.5730
    0.8232
(:,:,3) =    % third iteration (k==3)
    0.4752
    0.0074

Subscripting into CPU variables

Because the direct conversion of GFOR data into a CPU variable is undefined, it is also undefined to use GFOR data to subscript into a CPU variable. We catch this and error out:

A = rand(4);
gfor ii = 1:4 
  A(ii);
??? Error using ==> garray.subsindex at 3
Undefined behavior: CPU variables subscripted with GFOR expressions

This error occurs when MATLAB attempts to index into a regular CPU variable using a GPU variable that was involved in a GFOR computation. In this case, MATLAB requires CPU indices, yet these indices are tiled data so it is unclear which tile to return. See above for more.

To get around this, cast your variable (A above) to the GPU:

A = grand(4);
gfor ii = 1:4
  A(ii);
gend

In some cases, you may want to use LOCAL to create independent copies of CPU (or GPU) variables for each iteration.

Unsupported functions

The following functions are Jacket-supported, but not yet GFOR-supported:
(Note: for the list of GFOR-supported functions, see the GFOR Supported Functions page.)



No logical indexing

Logical indexing like the following is not supported:

gfor i = 1:n
  B = A(:,i);
  tmp = B(B > .5);  % "Error: GFOR not supported with logical indexing"
  D(i) = sum(tmp);
gend

The problem is that each GFOR tile may end up with a different number of elements, something which GFOR cannot yet handle.

Similar to the workaround for conditional statements, it might work to use masked arithmetic:

gfor i = 1:n
  B = A(:,i);
  mask = B > .5;
  D(i) = sum(mask .* B);
gend
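
To see why the masked form gives the same sums, the following plain MATLAB sketch (CPU only, with made-up data and the illustrative names D_index and D_mask) computes both versions side by side in an ordinary FOR-loop.

```matlab
A = [0.2 0.9; 0.7 0.6; 0.8 0.1];   % made-up test data
n = size(A, 2);
D_index = zeros(1, n);
D_mask  = zeros(1, n);
for i = 1:n
  B = A(:,i);
  D_index(i) = sum(B(B > .5));     % logical indexing (not allowed in GFOR)
  mask = B > .5;
  D_mask(i) = sum(mask .* B);      % masked arithmetic: excluded entries add 0
end
```

The two results agree as long as the excluded entries are finite. If B can contain NaN or Inf in positions where the mask is 0, the product is NaN and the masked sum is no longer equivalent.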

New! SUBSASGN with scalars and logical masks is supported:

gfor i = 1:n
  a = A(:,i);
  a(isnan(a)) = 0;
  A(:,i) = a;
gend