Array Indexing In Jacket

From Jacket Wiki

Jump to: navigation, search

Contents

Indexing of arrays is very important when considering performance. It is very easy to loose the speedup that we may obtain from fast computations on the GPU. This page takes a look at array indexing and demonstrates with simple examples how indexing could be done to obtain high performance. As some of the examples use the new class concept of Jacket, the following assumes that the reader has studied this blog post, the post on Torben's Corner on MATLAB/Jacket coding and are familiar with CLASS and ISA.


Code

In this case the input to the function is the two vectors a and b. We would like the function to work immediately with MATLAB as well as Jacket. There are five different versions of the code (an overview is provided after the code listings):

FOO1.m

function [ y ] = foo1( a, b )
  % foo1 Computes the output r for both MATLAB and Jacket variables.
 
  % N is the length of the input vectors a and b; no sanity check for ease
  N = length(a);
 
  % Determine the common class
  cls = superiorfloat(a,b);
 
  % Create d vector; will type wise follow the a and b vectors
  d = randn(N,1,cls);
 
  % Compute the output
  r = b(1:N-1) .* exp(sin( d(1:N-1) ./ d(2:N) .* a(1:N-1) ./ a(2:N) )) + b(2:N);
  y = sum(r(:));
end

This version works with MATLAB as well as Jacket input variables without any problems at all. Unfortunately, performance is not the best so we will try another version to see if we can improve things a bit. The idea here is to allocate the index variables as garrays. This is implemented in the following foo2.m function:


FOO2.m

function [ y ] = foo2( a, b )
  % foo2 Computes the output r for both MATLAB and Jacket variables.
 
  % N is the length of the input vectors a and b; no sanity check for ease
  N = length(a);
 
  % Determine the common class
  cls = superiorfloat(a,b);
 
  % Define scalar '0' and '1' used in colon expansion
  one = ones(cls);
 
  % Create d vector; will type wise follow the a and b vectors
  d = randn(N,1,cls);
 
  % Indexing pointers following the class of the input vectors
  p1 = one:(N-1)*one;
  p2 = 2*one:N*one;
 
  % Compute the output;
  r = b(p1) .* exp(sin( d(p1) ./ d(p2) .* a(p1) ./ a(p2) )) + b(p2);
  y = sum(r(:));
end


FOO3.m

function [ y ] = foo3( a, b )
  % foo1 Computes the output r for both MATLAB and Jacket variables.
  %
  % max(N) < length(a)-1
 
  % Vector length
  N = length(a);
 
  % Determine the common class
  cls = superiorfloat(a,b);
 
  % Create d vector; will type wise follow the a and b vectors
  d = randn(N,1,cls);
 
  % Compute the output
  r = b(1:end-1) .* exp(sin( d(1:end-1) ./ d(2:end) .* a(1:end-1) ./ a(2:end) )) + b(2:end);
  y = sum(r(:));
end


FOO4.m

function [ y ] = foo4( a, b, N )
  % foo1 Computes the output r for both MATLAB and Jacket variables.
  %
  % max(N) < length(a)-1
 
 
  % Determine the common class
  cls = superiorfloat(a,b);
 
  % Define scalar '0' and '1' used in colon expansion
  one = ones(cls);
 
  % Create d vector; will type wise follow the a and b vectors
  d = randn(length(a),1,cls);
 
  % Compute the output
  p1 = one : N-1;
  p2 = 2*one : N;
 
  r = b(p1) .* exp(sin( d(p1) ./ d(p2) .* a(p1) ./ a(p2) )) + b(p2);
  y = sum(r(:));
end


FOO5.m

function [ y ] = foo5( a, b, N )
  % foo1 Computes the output r for both MATLAB and Jacket variables.
  %
  % max(N) < length(a)-1
 
 
  % Determine the common class
  cls = superiorfloat(a,b);
 
  % Create d vector; will type wise follow the a and b vectors
  d = randn(length(a),1,cls);
 
  % Compute the output
  X = length(a) - N + 1;
 
  r = b(1:end-X-1) .* exp(sin( d(1:end-X-1) ./ d(2:end-X) .* a(1:end-X-1) ./ a(2:end-X) )) + b(2:end-X);
  y = sum(r(:));
end


FOO Overview

All functions generate the random matrix based on the class of the input vector. The differences are:

  • FOO1.m: Indexing done like a(1:length(a)-1).
  • FOO2.m: Indexing done like a(p1) where the class of p1 is single for MATLAB input variables and gsingle for Jacket input variables.
  • FOO3.m: Indexing done like a(1:end-1) - no class derived indexing.
  • FOO4.m: Indexing done similar to FOO2.m but here the highest index n used in p1:1:n can be any value, which don't exceed the vector length.
  • FOO5.m: As in FOO4.m this function allows indexing with n used in X=length(a)-n; p1:1:end-X;, which can be any value not exceeding the vector length.

So the former three functions utilize the full vector length according to the equation appearing in all functions; the latter two are more flexible in the sense that they can be used for only parts of the input vectors.


MASTER_FOO.m Script

To control the test of the two functions foo1.m, foo2.m, foo3.m, foo4.m, and foo5.m we will make the following master_foo.m script:

% Script file to test the foo.m function. The objective is to show the
% difference in performance depending on how indexing of arrays is done.
% A computational function is used to illustrate the concepts.
 
% Vector length
SZ = 1E7;
 
% Number of repetitions for MATLAB and Jacket, respectively
RPTM = 3;
RPTJ = 100;
 
 
%% FOO1 - SINGLE - FOO1
a=randn(SZ,1,'single'); b=randn(SZ,1,'single');
t1=tic;
for ii=1:RPTM
  y1_single_matlab=foo1(a,b);
end;
t1_single_matlab=toc(t1)/RPTM;
 
a=grandn(SZ,1,'single'); b=grandn(SZ,1,'single'); geval(a,b);
gsync;
t1=tic;
for ii=1:RPTJ
  y1_single_jacket=foo1(a,b);
  geval(y1_single_jacket);
end;
gsync;
t1_single_jacket=toc(t1)/RPTJ;
Speedup1_single = t1_single_matlab / t1_single_jacket;
 
 
%% FOO2 - SINGLE - FOO2
a=randn(SZ,1,'single'); b=randn(SZ,1,'single');
t1=tic;
for ii=1:RPTM
  y2_single_matlab=foo2(a,b);
end;
t2_single_matlab=toc(t1)/RPTM;
 
a=grandn(SZ,1,'single'); b=grandn(SZ,1,'single'); geval(a,b);
gsync;
t1=tic;
for ii=1:RPTJ
  y2_single_jacket=foo2(a,b);
  geval(y2_single_jacket);
end;
gsync;
t2_single_jacket=toc(t1)/RPTJ;
Speedup2_single = t2_single_matlab / t2_single_jacket;
 
 
%% FOO3 - SINGLE - FOO3
a=randn(SZ,1,'single'); b=randn(SZ,1,'single');
t1=tic;
for ii=1:RPTM
  y3_single_matlab=foo3(a,b);
end;
t3_single_matlab=toc(t1)/RPTM;
 
a=grandn(SZ,1,'single'); b=grandn(SZ,1,'single'); geval(a,b);
gsync;
t1=tic;
for ii=1:5*RPTJ
  y3_single_jacket=foo3(a,b);
  geval(y3_single_jacket);
end;
gsync;
t3_single_jacket=toc(t1)/(5*RPTJ);
Speedup3_single = t3_single_matlab / t3_single_jacket;
 
 
%% FOO4 - SINGLE - FOO4
a=randn(SZ,1,'single'); b=randn(SZ,1,'single');
t1=tic;
for ii=1:RPTM
  y4_single_matlab=foo4(a,b,SZ-1);
end;
t4_single_matlab=toc(t1)/RPTM;
 
a=grandn(SZ,1,'single'); b=grandn(SZ,1,'single'); geval(a,b);
gsync;
t1=tic;
for ii=1:RPTJ
  y4_single_jacket=foo4(a,b,SZ-1);
  geval(y4_single_jacket);
end;
gsync;
t4_single_jacket=toc(t1)/RPTJ;
Speedup4_single = t4_single_matlab / t4_single_jacket;
 
 
%% FOO5 - SINGLE - FOO5
a=randn(SZ,1,'single'); b=randn(SZ,1,'single');
t1=tic;
for ii=1:RPTM
  y5_single_matlab=foo5(a,b,SZ-1);
end;
t5_single_matlab=toc(t1)/RPTM;
 
a=grandn(SZ,1,'single'); b=grandn(SZ,1,'single'); geval(a,b);
gsync;
t1=tic;
for ii=1:RPTJ
  y5_single_jacket=foo5(a,b,SZ-1);
  geval(y5_single_jacket);
end;
gsync;
t5_single_jacket=toc(t1)/RPTJ;
Speedup5_single = t5_single_matlab / t5_single_jacket;
 
 
%% RESULTS
fprintf('===================================\n');
fprintf('SINGLE                       SINGLE\n');
fprintf('FOO1 - MATLAB [s]:       %10.6f\n', t1_single_matlab);
fprintf('     - JACKET [s]:       %10.6f\n', t1_single_jacket);
fprintf('     - Speedup [-]:      %10.6f\n', Speedup1_single);
fprintf('-----------------------------------\n');
fprintf('FOO2 - MATLAB [s]:       %10.6f\n', t2_single_matlab);
fprintf('     - JACKET [s]:       %10.6f\n', t2_single_jacket);
fprintf('     - Speedup [-]:      %10.6f\n', Speedup2_single);
fprintf('-----------------------------------\n');
fprintf('FOO3 - MATLAB [s]:       %10.6f\n', t3_single_matlab);
fprintf('     - JACKET [s]:       %10.6f\n', t3_single_jacket);
fprintf('     - Speedup [-]:      %10.6f\n', Speedup3_single);
fprintf('-----------------------------------\n');
fprintf('FOO4 - MATLAB [s]:       %10.6f\n', t4_single_matlab);
fprintf('     - JACKET [s]:       %10.6f\n', t4_single_jacket);
fprintf('     - Speedup [-]:      %10.6f\n', Speedup4_single);
fprintf('-----------------------------------\n');
fprintf('FOO5 - MATLAB [s]:       %10.6f\n', t5_single_matlab);
fprintf('     - JACKET [s]:       %10.6f\n', t5_single_jacket);
fprintf('     - Speedup [-]:      %10.6f\n', Speedup5_single);
fprintf('===================================\n');

The master_foo.m script test only for single and gsingle input vectors - the code can easily be expanded to double precision should that be interesting. The script file includes repetition loops to ensure that we measure for sufficiently long to obtain reproducible results.


Results

The platform used is an Intel Xeon X5570 CPU with 48 GB memory and an NVIDIA Tesla C2070 GPU. The specific execution times are not all that critical for this example and therefore other details are omitted - Jacket 1.7 is used though. The results obtained by using the master_foo.m script above (with 10E6 vector elements) is:

>> master_foo
===================================
SINGLE                       SINGLE
FOO1 - MATLAB [s]:         0.686132
     - JACKET [s]:         0.854018
     - Speedup [-]:        0.803416
-----------------------------------
FOO2 - MATLAB [s]:         0.892546
     - JACKET [s]:         0.024704
     - Speedup [-]:       36.129526
-----------------------------------
FOO3 - MATLAB [s]:         0.488589
     - JACKET [s]:         0.007817
     - Speedup [-]:       62.507196
-----------------------------------
FOO4 - MATLAB [s]:         0.884938
     - JACKET [s]:         0.024538
     - Speedup [-]:       36.063305
-----------------------------------
FOO5 - MATLAB [s]:         0.685769
     - JACKET [s]:         0.007836
     - Speedup [-]:       87.519500
===================================
>>

We can't directly compare the output results from foo1.m and foo2.m as the random function is not identical in the MATLAB and Jacket cases. To test for validity, a deterministic d vector has been used though to ensure that identical results are obtained from the functions foo1.m and foo2.m.


Conclusion

As shown above the indexing is very important and there is one lesson to learn. Whenever possible relate the indexing to end - and if you need to only access part of a vector, say the first n elements then do "X=length(a)-n; ... a(1:end-X)". This procedure maintains almost full speed. The same method should be applied if the place to start in the vector varies; as in "X=length(a)/2; ... a(X+1:end)=a(1:end-X)" - I have performed the tests but not included the files (I believe there are plenty already). The conclusion for MATLAB is basically the same - although here the indexing when only using parts of a vector is not as fast as when using the entire vector.



Go Home: Torben's Corner


Views
Personal tools