Code Vectorization Examples

From Jacket Wiki

Jump to: navigation, search
Back to Documentation

Contents

Example 1 (multiple circshifts all in one go)

This vectorization example isn't quite tested, but should work to do multiple circshifts all at once incredibly fast.

Original code is located at http://forums.accelereyes.com/forums/viewtopic.php?f=11&t=1244#p3408

The new, vectorized code is located at http://forums.accelereyes.com/forums/viewtopic.php?f=11&t=1244#p3410

Example 2 (distance matrix all in one go)

The original and vectorized code is located at http://blog.accelereyes.com/blog/2010/04/05/converting-matlab-loops-to-gpu-code/

Example 3 (row rearrangement all in one go)

Nothing very special here, really. Original code at the top, new at the bottom. Should work as a standalone script. Much much faster.

R = grand(1000); parent = randperm(1000)'; child = randperm(1000)';
geval(R); gsync();
tic
C_old = gzeros(size(R));
for s = 1:size(R, 1)
  C_old(s, :) = R(parent(s), :) ~= R(child(s), :);
end
geval(C_old); gsync();
toc

New code is,

R = grand(1000); parent = randperm(1000)'; child = randperm(1000)';
geval(R); gsync();
tic
C_new = gzeros(size(R));
C_new = R(parent(1:size(R,1)),:) ~= R(child(1:size(R,1)),:);
geval(C_new); gsync();
toc

Example 4 (wavelet generation all in one go)

This can't really be run as a script as there is alot surrounding this. Might go back and make this a standalone script that you can copy/paste into MATLAB, later.

Original code,

tic
% this is the baseline
for theta = 0:pi/4:3*pi/4
  for i = 1:size(k, 1)
    if isa(W, 'gsingle')
      pos = gsingle(k(i,:)); sigma = gsingle(sigma); theta = gsingle(theta);
      pitch_factor = gsingle(pitch_factor);
    end
 
    % parameters of kernel
    gamma = 1;
    lambda = 0.8 / 2;
    psi = 0;
    sigma_ = sqrt(sigma) / 10;
    rng = -W:W;
 
    % coordinate system
    [X, Y] = meshgrid(rng / W * 2.5, rng / W * 2.5);
    X = X / pitch_factor - pos(2); Y = Y / pitch_factor - pos(1);
    X_ =  X * cos(theta) + Y * sin(theta);
    Y_ = -X * sin(theta) + Y * cos(theta);
 
    envelope = exp( -(X_.^2 + gamma^2 * Y_.^2) / (2 * sigma_^2) );
    wave = cos(2 * pi * X_ / lambda + psi);
    tmp = envelope .* wave;
 
    % remains constant
    K = cat(3, K, tmp);
  end
end
geval(K_gold); gsync();
toc

New code (much much faster)

tic
if isa(W, 'gsingle')
  pos = gsingle(k); sigma = gsingle(sigma);
  pitch_factor = gsingle(pitch_factor);
end
 
% parameters of kernel
gamma = 1;
lambda = 0.8 / 2;
psi = 0;
sigma_ = sqrt(sigma) / 10;
rng = gsingle(-W:W);
thetas = 0:pi/4:3*pi/4;
thetas_n = numel(thetas);
pos_n = size(k, 1);
 
% base coordinate system
[X, Y] = meshgrid((rng / W * 2.5) / pitch_factor, (rng / W * 2.5) / pitch_factor);
 
% shift X coordinate system for each wavelet position 
x_pos_3d = reshape(pos(:,2), 1, 1, pos_n);
POS = repmat(x_pos_3d, [1, 1, thetas_n]);
X = bsxfun(@minus, X, POS);
 
% shift Y coordinate system for each wavelet position
y_pos_3d = reshape(pos(:,1), 1, 1, pos_n);
POS = repmat(y_pos_3d, [1, 1, thetas_n]);
Y = bsxfun(@minus, Y, POS);
 
% rotate X and Y coordinate systems
thetas_3d = bsxfun(@plus, thetas, zeros(pos_n, 1));
thetas_3d = reshape(thetas_3d, 1, 1, thetas_n * pos_n);
cos_theta = cos(thetas_3d);
sin_theta = sin(thetas_3d);
X_ = bsxfun(@times,  X,  cos_theta) + ...
     bsxfun(@times,  Y,  sin_theta);
Y_ = bsxfun(@times,  X, -sin_theta) + ...
     bsxfun(@times,  Y,  cos_theta);
 
geval(X_); geval(Y_);
envelope = exp( -(X_.^2 + gamma^2 * Y_.^2) / (2 * sigma_^2) );
wave = cos(2 * pi * X_ / lambda + psi);
tmp = envelope .* wave;
 
geval(tmp); gsync();
toc

permute > reshape in some instances

PERMUTE is the same as RESHAPE and is shorter to write than reshape in several cases. Many of these cases involve bsxfun because often we wish to extend computation utilizing an extra dimension which often turns out to be the 3rd dimension in the case of 2D matrices.

A = 1:20;
C = permute(A, [1 3 2]);
D = reshape(A, [1 1 numel(A)]);
assert(all(C(:) == D(:)));

Batched corr2 over sliding windows (im2col used)

http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1326#p3804

Avoid colon+subsref using meshgrid and logicals

Sometimes a subsref is slow depending on the use case. It turns out that you can do almost the same thing as a subsref using meshgrid, logicals, and masking.

http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1331#p3827

http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1334&start=10#p3860

vectorizing gfor code using bsxfun

http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1352&p=3944#p3944

batched corr2 lots of vectors

http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1355&p=3964#p3964

using find to vectorize changing size variable in for loop

Here, we simulate an accumarray operation (not yet supported by Jacket) via the population and summation of a sparsely populated zeros matrix. The following code was embedded in a much larger script and is thus not runnable, but hopefully the idea is conveyed in the following example. The loop version is as follows,

% loop version
for p = 1:N
  Pair = double(Mask_index(p, :));
  Mask_positions = [ 1; Compare_seq(:, Pair(1)) | Compare_seq(:, Pair(2))];
  Chosen = find(Mask_positions);
 
  Properties_random = Properties_factor_alignment1_r(Chosen, Pair);
  Temp = Properties_random;
  Cov_matrix = Temp'*Temp;   % final result, 2x2 matrix
end

Notice that the difficulty, here, is the use of find. It's result, "Chosen", can vary in size per iteration of the loop.

% vectorized version
Pair = double(Mask_index(1:N, :));
Mask_positions = [ gones(1, N); (Compare_seq(:, Pair(:,1)) | Compare_seq(:, Pair(:,2)))] ;
[Chosen loop_num] = find(Mask_positions);
 
max_len = max(Chosen);
storage_pair1 = gzeros(double(max_len), N);
storage_pair2 = gzeros(double(max_len), N);
ii = Chosen + (loop_num - 1) * max_len;
ii_ = Chosen + (Pair(loop_num,1) - 1) * size(Properties_factor_alignment1_r, 1);
storage_pair1(ii) = Properties_factor_alignment1_r(ii_);
ii_ = Chosen + (Pair(loop_num,2) - 1) * size(Properties_factor_alignment1_r, 1);
storage_pair2(ii) = Properties_factor_alignment1_r(ii_);
 
Cov_matrix1 = sum(storage_pair1 .* storage_pair1);
Cov_matrix2 = sum(storage_pair1 .* storage_pair2);
Cov_matrix3 = sum(storage_pair2 .* storage_pair1);
Cov_matrix4 = sum(storage_pair2 .* storage_pair2);

masking

Initial code to save the non-zeros from the input:

 function out = savenonzeros(in)
   out = zeros(size(in), class(in));
   out(in ~= 0) = in(in ~= 0);

Avoid repeating the in~= computation, and do one FIND:

 function out = savenonzeros(in)
   out = zeros(size(in), class(in));
   nz = find(in);
   out(nz) = in(nz);

Use masking to express it entirely as arithmetic:

 function out = savenonzeros(in)
   out = (in ~= 0) .* in;
Personal tools