Maintaining CPU and GPU code in 1 file
From Jacket Wiki
In most cases we want our code to be usable with both MATLAB and Jacket. Keeping track of one set of functions and scripts for Jacket and another for MATLAB is a nightmare and a sure path to disaster at some point. Fortunately, Jacket 1.7.x has improved significantly on this issue, although not everything has been solved (gfor and grandn/grand still need to be handled in a special way). Typically, MATLAB has single/double where Jacket has gsingle/gdouble. I will give a few examples of how this can be handled efficiently and (more or less) elegantly. The main objective is computationally efficient code that runs with both MATLAB and Jacket variables. The basic concepts using CLASS and ISA are described in a blog post; on this page, I provide more examples.
Problem
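Suppose we need a function that computes a response vector r whose n-th element is given by:
$$ r_n = a_n \exp\big(\sin(k\, b_n d_n)\big) + c, \qquad n = 1, \dots, N, $$
with $d_n = n - 1$, and where the $b_n$ are drawn from the standard normal distribution.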
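As a sanity check of the element-wise definition $r_n = a_n \exp(\sin(k\, b_n d_n)) + c$ with $d_n = n - 1$, the short script below evaluates it once with an explicit loop and once with the vectorized expression used in foo.m. It is a plain MATLAB sketch that needs no Jacket; N, k and c are arbitrary illustration values, not taken from this page.

```matlab
% Compare an explicit loop with the vectorized expression from foo.m.
% N, k and c are arbitrary illustration values (assumptions).
N = 5;
k = 0.1;
c = 2;
a = randn(N, 1);
b = randn(N, 1);

r_loop = zeros(N, 1);
for n = 1:N
    d_n = n - 1;                                   % d_n = n - 1
    r_loop(n) = a(n) * exp(sin(k * b(n) * d_n)) + c;
end

d = (0:N-1).';                                     % d = [0; 1; ...; N-1]
r_vec = a .* exp(sin(k * b .* d)) + c;             % same form as in foo.m

fprintf('max |r_loop - r_vec| = %g\n', max(abs(r_loop - r_vec)));
```

The two results agree to rounding error, which is why foo.m can avoid the loop entirely.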
Code
In this case the inputs to the function are the vector a and the scalars k and c. We would like the function to work out of the box with MATLAB as well as Jacket variables. For this we take advantage of the class inheritance property in Jacket 1.7.
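function [ r ] = foo( a, k, c )
% foo Computes the output r for both MATLAB and Jacket variables.

% Determine the class of the input (single, double, gsingle or gdouble)
cls = class(a);
if ~ismember(cls, {'single', 'double', 'gsingle', 'gdouble'})
    error('foo:type', 'Input a must be single, double, gsingle or gdouble.');
end

% Define scalar '0' of class cls, used in the colon expansion below
zero = zeros(cls);

% Create random vector b; use grandn if a is a garray type, and randn otherwise
if isa(a, 'garray')
    b = grandn(length(a), 1, cls(2:end));  % 'gsingle' -> 'single', 'gdouble' -> 'double'
else
    b = randn(length(a), 1, cls);
end

% Create d vector; its class follows a through zero
d = (zero:length(a)-1).';

% Compute the output; no need to multiply c with ones(N,1) as a scalar here is automatically handled
r = a .* exp(sin( k * b .* d )) + c;
end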
There are a few things to notice here:
- Normally, the type of arrays such as d is not very easy to handle. Of course, we could just keep d as a MATLAB variable and let Jacket do as it prefers. In my experience, however, performance can sometimes be improved significantly by creating the vector on the GPU for Jacket computations, and with Jacket 1.7.x we can now handle this. Observe that zero is a gsingle/gdouble when a is a Jacket variable. Because zero is a scalar whose type follows cls (the class of the input vector), the colon expansion zero:length(a)-1 produces d with the same type as a.
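- We use class to transfer the class of the input vector a to cls, from which zero, and hence d, take their type (superiorfloat can be used instead when the class should be determined jointly from several inputs). We also issue an error in case the input is not a single / gsingle / double / gdouble.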
- The random vector b is generated by grandn if the input vector a is a garray type, and by randn otherwise. The class is passed on to grandn.
- Scalars are, in my experience, best handled by letting Jacket do whatever it wants; see http://wiki.accelereyes.com/wiki/index.php/Handling_Scalars_In_Jacket for more information. You can of course try different types of input and check the performance.
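The effect of the class-typed zero can be checked on the CPU alone. The sketch below (plain MATLAB, no Jacket required) builds d with and without the zero trick for single and double inputs and prints the resulting classes. According to the bullet on zero, the same mechanism is what makes d a gsingle/gdouble when a is a Jacket variable; that part cannot be shown without Jacket.

```matlab
% Compare colon expansion with and without the class-typed zero.
classes = {'single', 'double'};
for i = 1:numel(classes)
    cls     = classes{i};
    a       = randn(4, 1, cls);        % CPU input of class cls
    zero    = zeros(cls);              % scalar 0 of class cls
    d       = (zero:numel(a)-1).';     % class follows cls
    d_plain = (0:numel(a)-1).';        % plain colon: always double
    fprintf('%-6s  d: %-6s  d_plain: %s\n', cls, class(d), class(d_plain));
end
```

Without the typed zero, d would always start life as a double on the CPU, regardless of the class of a.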
A master script, master_foo.m, runs and times the code:
% Script file to test the foo.m function. The objective is to show how one
% function file by use of class inheritance can be used for both MATLAB and
% Jacket. A computational function is used to illustrate the concepts.

%% SINGLE
a=randn(1E7,1,'single'); k=single(0.1); c=single(2);
t1=tic;
for ii=1:5
    x_single_matlab=foo(a,k,c);
end
t_single_matlab=toc(t1)/5

a=grandn(1E7,1,'single'); k=gsingle(0.1); c=gsingle(2);
geval(a,k,c); gsync;
t1=tic;
for ii=1:20
    x_single_jacket=foo(a,k,c);
    geval(x_single_jacket);
end
gsync;
t_single_jacket=toc(t1)/20
Speedup_single = t_single_matlab / t_single_jacket

%% DOUBLE
gver = gpu_entry(13);
if gver.compute > 1.2
    a=randn(1E7,1,'double'); k=double(0.1); c=double(2);
    t1=tic;
    for ii=1:5
        x_double_matlab=foo(a,k,c);
    end
    t_double_matlab=toc(t1)/5

    a=grandn(1E7,1,'double'); k=gdouble(0.1); c=gdouble(2);
    geval(a,k,c); gsync;
    t1=tic;
    for ii=1:20
        x_double_jacket=foo(a,k,c);
        geval(x_double_jacket);
    end
    gsync;
    t_double_jacket=toc(t1)/20
    Speedup_double = t_double_matlab / t_double_jacket
end

%% TYPES
if gver.compute > 1.2
    whos x_single_matlab x_single_jacket x_double_matlab x_double_jacket
else
    whos x_single_matlab x_single_jacket
end
master_foo.m reports the execution times for single/gsingle, and for double/gdouble if the GPU supports gdouble. It also shows the speedups and the types of the output variables of foo.m.
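The timing in master_foo.m follows a common pattern: repeat the call several times and divide the elapsed time by the number of repetitions. The page handles warm-up by running the script twice; in the minimal CPU-only sketch below, a single untimed call plays that role. The workload is a stand-in expression (an assumption, not foo.m itself). With Jacket variables the loop would additionally need geval and gsync, as used in master_foo.m, so that the GPU work is finished when toc is read.

```matlab
% Timing sketch: one untimed warm-up call, then the average over nrep calls.
% The workload is a stand-in (assumption), not foo.m itself.
x    = randn(1e6, 1, 'single');
work = @(v) v .* exp(sin(0.1 * v)) + 2;

work(x);                       % warm-up call, excluded from the measurement
nrep = 5;
t0 = tic;
for ii = 1:nrep
    y = work(x);
end
t_avg = toc(t0) / nrep;        % average seconds per call
fprintf('average time per call: %.4f s\n', t_avg);
```

Dividing by nrep gives a per-call figure comparable between runs with different repetition counts, which is how master_foo.m can use 5 CPU and 20 GPU repetitions and still compare the results directly.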
Results
The platform used is an Intel Xeon X5570 CPU with 48 GB memory and an NVIDIA Tesla C2070 GPU. The specific execution times are not critical for this example, so other details are omitted; Jacket 1.7 is used, though. master_foo.m first performs the computations with MATLAB and then with Jacket. To ensure that each measurement covers sufficient time, every computation is repeated in a loop and the average time per call is reported:
>> master_foo
t_single_matlab =
0.2471
t_single_jacket =
0.0107
Speedup_single =
23.1687
t_double_matlab =
0.2732
t_double_jacket =
0.0096
Speedup_double =
28.5032
Name Size Bytes Class Attributes
x_double_jacket 10000000x1 824 gsingle
x_double_matlab 10000000x1 80000000 double
x_single_jacket 10000000x1 824 gsingle
x_single_matlab 10000000x1 40000000 single
>>
The master_foo.m script was run twice before the results above were recorded, to ensure that warm-up had completed. I have also tried various combinations of types for k and c; the only thing to avoid is mixing types, i.e., having one of them gsingle/gdouble and the other single/double. It is always a good idea to test different combinations to avoid unpleasant surprises.
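The advice to avoid mixing types can be enforced at the top of a function like foo.m. The sketch below is a hypothetical guard, not part of the original foo.m: it assumes, as stated on this page, that Jacket types are garray objects, and only compares whether k and c are GPU types. On a machine without Jacket, isa(x, 'garray') returns false for every input, so the check passes for plain single/double values.

```matlab
% Hypothetical guard (not part of foo.m): k and c must both be GPU
% (garray) types or both be CPU types, never one of each.
k = single(0.1);
c = single(2);
kOnGpu = isa(k, 'garray');     % false for plain MATLAB single/double
cOnGpu = isa(c, 'garray');
if kOnGpu ~= cOnGpu
    error('foo:mixedTypes', ...
        'k (%s) and c (%s) mix GPU and CPU types.', class(k), class(c));
end
disp('k and c use consistent types');
```

Failing early with a clear message is usually cheaper than tracking down an unexpected slowdown or output type later.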
The speedup for this CPU/GPU combination is around 23 in single precision and 28 in double precision. As the whos output shows, the types of the outputs are not quite as expected: x_double_jacket is reported as gsingle rather than gdouble. This behavior has been recognized by the Jacket team as a bug and is under investigation.
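The speedups are simply ratios of the averaged times. Recomputing them from the values displayed above, which are rounded to four decimals, gives:
$$ \text{Speedup}_{\text{single}} = \frac{t_{\text{single,matlab}}}{t_{\text{single,jacket}}} \approx \frac{0.2471}{0.0107} \approx 23.09, \qquad \text{Speedup}_{\text{double}} \approx \frac{0.2732}{0.0096} \approx 28.46 $$
The small differences from the printed 23.1687 and 28.5032 come from that rounding, since MATLAB computes the ratios from the unrounded times.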
Go Home: Torben's Corner