Code Vectorization Examples
From Jacket Wiki
Back to Documentation
Example 1 (multiple circshifts all in one go)
This vectorization example isn't quite tested, but should work to do multiple circshifts all at once incredibly fast.
Original code is located at http://forums.accelereyes.com/forums/viewtopic.php?f=11&t=1244#p3408
The new, vectorized code is located at http://forums.accelereyes.com/forums/viewtopic.php?f=11&t=1244#p3410
Example 2 (distance matrix all in one go)
The original and vectorized code is located at http://blog.accelereyes.com/blog/2010/04/05/converting-matlab-loops-to-gpu-code/
Example 3 (row rearrangement all in one go)
Nothing very special here, really. Original code at the top, new at the bottom. Should work as a standalone script. Much much faster.
R = grand(1000); parent = randperm(1000)'; child = randperm(1000)'; geval(R); gsync(); tic C_old = gzeros(size(R)); for s = 1:size(R, 1) C_old(s, :) = R(parent(s), :) ~= R(child(s), :); end geval(C_old); gsync(); toc
New code is,
R = grand(1000); parent = randperm(1000)'; child = randperm(1000)'; geval(R); gsync(); tic C_new = gzeros(size(R)); C_new = R(parent(1:size(R,1)),:) ~= R(child(1:size(R,1)),:); geval(C_new); gsync(); toc
Example 4 (wavelet generation all in one go)
This can't really be run as a script as there is alot surrounding this. Might go back and make this a standalone script that you can copy/paste into MATLAB, later.
Original code,
tic % this is the baseline for theta = 0:pi/4:3*pi/4 for i = 1:size(k, 1) if isa(W, 'gsingle') pos = gsingle(k(i,:)); sigma = gsingle(sigma); theta = gsingle(theta); pitch_factor = gsingle(pitch_factor); end % parameters of kernel gamma = 1; lambda = 0.8 / 2; psi = 0; sigma_ = sqrt(sigma) / 10; rng = -W:W; % coordinate system [X, Y] = meshgrid(rng / W * 2.5, rng / W * 2.5); X = X / pitch_factor - pos(2); Y = Y / pitch_factor - pos(1); X_ = X * cos(theta) + Y * sin(theta); Y_ = -X * sin(theta) + Y * cos(theta); envelope = exp( -(X_.^2 + gamma^2 * Y_.^2) / (2 * sigma_^2) ); wave = cos(2 * pi * X_ / lambda + psi); tmp = envelope .* wave; % remains constant K = cat(3, K, tmp); end end geval(K_gold); gsync(); toc
New code (much much faster)
tic if isa(W, 'gsingle') pos = gsingle(k); sigma = gsingle(sigma); pitch_factor = gsingle(pitch_factor); end % parameters of kernel gamma = 1; lambda = 0.8 / 2; psi = 0; sigma_ = sqrt(sigma) / 10; rng = gsingle(-W:W); thetas = 0:pi/4:3*pi/4; thetas_n = numel(thetas); pos_n = size(k, 1); % base coordinate system [X, Y] = meshgrid((rng / W * 2.5) / pitch_factor, (rng / W * 2.5) / pitch_factor); % shift X coordinate system for each wavelet position x_pos_3d = reshape(pos(:,2), 1, 1, pos_n); POS = repmat(x_pos_3d, [1, 1, thetas_n]); X = bsxfun(@minus, X, POS); % shift Y coordinate system for each wavelet position y_pos_3d = reshape(pos(:,1), 1, 1, pos_n); POS = repmat(y_pos_3d, [1, 1, thetas_n]); Y = bsxfun(@minus, Y, POS); % rotate X and Y coordinate systems thetas_3d = bsxfun(@plus, thetas, zeros(pos_n, 1)); thetas_3d = reshape(thetas_3d, 1, 1, thetas_n * pos_n); cos_theta = cos(thetas_3d); sin_theta = sin(thetas_3d); X_ = bsxfun(@times, X, cos_theta) + ... bsxfun(@times, Y, sin_theta); Y_ = bsxfun(@times, X, -sin_theta) + ... bsxfun(@times, Y, cos_theta); geval(X_); geval(Y_); envelope = exp( -(X_.^2 + gamma^2 * Y_.^2) / (2 * sigma_^2) ); wave = cos(2 * pi * X_ / lambda + psi); tmp = envelope .* wave; geval(tmp); gsync(); toc
permute > reshape in some instances
PERMUTE is the same as RESHAPE and is shorter to write than reshape in several cases. Many of these cases involve bsxfun because often we wish to extend computation utilizing an extra dimension which often turns out to be the 3rd dimension in the case of 2D matrices.
A = 1:20; C = permute(A, [1 3 2]); D = reshape(A, [1 1 numel(A)]); assert(all(C(:) == D(:)));
Batched corr2 over sliding windows (im2col used)
http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1326#p3804
Avoid colon+subsref using meshgrid and logicals
Sometimes a subsref is slow depending on the use case. It turns out that you can do almost the same thing as a subsref using meshgrid, logicals, and masking.
http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1331#p3827
http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1334&start=10#p3860
vectorizing gfor code using bsxfun
http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1352&p=3944#p3944
batched corr2 lots of vectors
http://forums.accelereyes.com/forums/viewtopic.php?f=7&t=1355&p=3964#p3964
using find to vectorize changing size variable in for loop
Here, we simulate an accumarray operation (not yet supported by Jacket) via the population and summation of a sparsely populated zeros matrix. The following code was embedded in a much larger script and is thus not runnable, but hopefully the idea is conveyed in the following example. The loop version is as follows,
% loop version for p = 1:N Pair = double(Mask_index(p, :)); Mask_positions = [ 1; Compare_seq(:, Pair(1)) | Compare_seq(:, Pair(2))]; Chosen = find(Mask_positions); Properties_random = Properties_factor_alignment1_r(Chosen, Pair); Temp = Properties_random; Cov_matrix = Temp'*Temp; % final result, 2x2 matrix end
Notice that the difficulty, here, is the use of find. It's result, "Chosen", can vary in size per iteration of the loop.
% vectorized version Pair = double(Mask_index(1:N, :)); Mask_positions = [ gones(1, N); (Compare_seq(:, Pair(:,1)) | Compare_seq(:, Pair(:,2)))] ; [Chosen loop_num] = find(Mask_positions); max_len = max(Chosen); storage_pair1 = gzeros(double(max_len), N); storage_pair2 = gzeros(double(max_len), N); ii = Chosen + (loop_num - 1) * max_len; ii_ = Chosen + (Pair(loop_num,1) - 1) * size(Properties_factor_alignment1_r, 1); storage_pair1(ii) = Properties_factor_alignment1_r(ii_); ii_ = Chosen + (Pair(loop_num,2) - 1) * size(Properties_factor_alignment1_r, 1); storage_pair2(ii) = Properties_factor_alignment1_r(ii_); Cov_matrix1 = sum(storage_pair1 .* storage_pair1); Cov_matrix2 = sum(storage_pair1 .* storage_pair2); Cov_matrix3 = sum(storage_pair2 .* storage_pair1); Cov_matrix4 = sum(storage_pair2 .* storage_pair2);
masking
Initial code to save the non-zeros from the input:
function out = savenonzeros(in) out = zeros(size(in), class(in)); out(in ~= 0) = in(in ~= 0);
Avoid repeating the in~= computation, and do one FIND:
function out = savenonzeros(in) out = zeros(size(in), class(in)); nz = find(in); out(nz) = in(nz);
Use masking to express it entirely as arithmetic:
function out = savenonzeros(in) out = (in ~= 0) .* in;