GPROFILE
Profile Jacket and MATLAB code
Supported Syntax
gprofile [on/off]
gprofile report [command name] [file] [line number]
S = gprofile
GFOR Compatibility?
Yes. Try placing gprofile on/off around the inner parts of a GFOR loop; the results should be interesting!
Description
GPROFILE facilitates GPU and CPU profiling from the MATLAB console. It records the time spent in MATLAB functions on the GPU and compares those results against comparable CPU timings. The comparison indicates which sections of code should or should not be run on the GPU, and which could benefit from vectorization or refactoring. GPROFILE also implicitly checks the outputs of all GPU functions against their CPU counterparts, which makes it quick to search for precision errors or outright GPU failures. These comparisons additionally provide a sanity check when a program seems to be going awry.
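The implicit output check described above comes down to comparing two results within a tolerance. The following is a minimal sketch of that idea in plain MATLAB, with no Jacket calls. Single precision on the CPU stands in for the GPU output, and the variable `tol` and its value are assumptions for illustration, not the criterion GPROFILE actually applies.

```matlab
% Stand-in for a CPU/GPU output comparison (no Jacket calls here).
% Assumption: single precision on the CPU plays the role of the GPU output.
A      = rand(200, 300);               % input in double precision
ref    = sum(A, 1);                    % reference result, double precision
test   = double(sum(single(A), 1));    % "GPU-like" result computed in single
tol    = 1e-4;                         % hypothetical relative tolerance
relerr = max(abs(test - ref) ./ abs(ref));
if relerr > tol
    fprintf('Mismatch: max relative error %g exceeds %g\n', relerr, tol);
else
    fprintf('Results agree: max relative error %g\n', relerr);
end
```

A relative error far above what single precision would explain suggests a real failure rather than rounding.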
Syntax Descriptions
- gprofile [on/off]: gprofile on turns on the GPU profiling mechanism and hooks and clears all previous GPROFILE state. gprofile off simply turns off all profiling hooks. Note that the current version of GPROFILE incurs significant overhead, typically on the order of 2x to 3x the nominal program runtime.
- gprofile report: provides a command-by-command summary of GPU runtimes, CPU runtimes, and comparisons. When called from desktop-mode MATLAB, the report is colored: red lines indicate commands that on average are faster on the CPU, whereas green lines indicate commands that are faster on the GPU. For portability, the ASCII characters ^ and V provide similar indications when a color console is not available.
- gprofile report [command name]: similar to gprofile report, but restricts output to the given command name. Timings are given for that command on a per-file and per-line basis for both the GPU and CPU, with comparisons. Each item in the report includes the line of the original source code at which the command was called.
- gprofile report [command name] [file]: similar to gprofile report [command name], with information restricted to the given command inside the given file. Timings are given on a use-case basis for both the GPU and CPU, with comparisons; i.e., each line of the report details calls to the command on a different size and dimensionality of data. This report shows how the GPU is advantageous for certain sizes of data as well as for different access patterns (e.g. sum on columns SUM(A,1) versus sum on rows SUM(A,2)).
- gprofile report [command name] [file] [line number]: similar to gprofile report [command name] [file], with information restricted to the given command at the given line of the given file. Timings are given on a use-case basis for both the GPU and CPU, with comparisons. This is useful when a command is called on many different sizes of data at the same place in a program and an understanding of the common sizes of input data is required (e.g. sum(find(A > thresh))).
- S = gprofile: returns a struct S of all data recorded by GPROFILE thus far. (More documentation coming as this develops).
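The report forms listed above narrow the view step by step: all commands, one command, one command in one file, then one line. A hedged sketch of such a session follows. It uses only the gprofile syntax and the grand function that appear on this page. The file name my_script and the line number 9 are hypothetical, and because the fields of S are not documented here, the sketch only lists them with the standard MATLAB function fieldnames. A guard skips the whole session when Jacket is not installed.

```matlab
% Sketch of a GPROFILE drill-down session. It needs Jacket, so the guard
% below skips everything when gprofile or grand is not on the path.
% Assumption: this listing is saved as my_script.m (hypothetical name).
ran = false;
if exist('gprofile') ~= 0 && exist('grand') ~= 0
    gprofile on                        % start profiling, clearing earlier state
    for n = [256 512 1024]
        A = grand(n);                  % GPU random matrix, as in the examples
        r = sum(A, 1); c = sum(A, 2);  % same command, two access patterns
    end
    gprofile off                       % turn off the profiling hooks
    gprofile report                    % summary over all commands
    gprofile report sum                % sum only, per file and line
    gprofile report sum my_script      % sum in my_script, per use case
    gprofile report sum my_script 9    % 9 = line of the sum calls in this listing
    S = gprofile;                      % all recorded data as a struct
    disp(fieldnames(S));               % field names are not documented here
    ran = true;
end
```

If the sum calls sit on a different line in your own file, replace 9 with that line number.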
Examples
% corr2 between 25^2 kernel and 25^2 by 5e4 signal (sliding window) on GPU
gprofile on
A = grand(25 * 25, 5e4);
C = grand(25);
C_ = mean(C(:));
Cv = C(:)' - C_;
c = dot(Cv, Cv);
A_ = mean(A);
Av = bsxfun(@minus, A, A_);
gpu = (Cv * Av) ./ sqrt(sum(Av .* Av) * c);
gprofile off
gprofile report

Standard result of GPROFILE using the gprofile report syntax. Here, we see that multiplication, random number generation, and several other commands are significantly faster on the GPU than on the CPU, whereas SUM seems to be slower on average. (280M vs i950 8 core)
gprofile on
N = 2e3;
for i=1e3:1e2:N
    A = grand(i);
    C = conv2(ones(3,1), ones(3,1), A, 'same');
    fprintf('%d/%d\n', i, N);
end
gprofile report
gprofile report conv2
gprofile report conv2 gprofile_example
gprofile report conv2 gprofile_example 6

GPROFILE report (gprofile report conv2 gprofile_example) examining CONV2 performance. Each use case that GPROFILE recorded for CONV2 (each different input size) is shown. Notice how performance peaks at particular sizes (1600x1600 and 2000x2000). With this information in hand, one may tune code to hit just the right data sizes to maximize GPU throughput.
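The per-size view can be roughly approximated by hand on the CPU with tic and toc. The sketch below is plain MATLAB with no Jacket calls, and it is not how GPROFILE itself measures time. The sizes are arbitrary and smaller than in the example above, so it runs quickly. Single tic/toc measurements are noisy, so treat the printed numbers as indicative only.

```matlab
% CPU-only size sweep with tic/toc (no Jacket calls).
sizes = 200:200:1000;                  % arbitrary sizes chosen for a quick run
t = zeros(size(sizes));                % elapsed seconds per size
for k = 1:numel(sizes)
    A = rand(sizes(k));                % square CPU matrix
    tic;
    C = conv2(ones(3,1), ones(3,1), A, 'same');   % separable 3x3 box filter
    t(k) = toc;
    fprintf('%4d x %-4d  %8.3f ms  %8.1f Melem/s\n', ...
            sizes(k), sizes(k), 1e3 * t(k), numel(A) / t(k) / 1e6);
end
```

Comparing throughput (elements per second) rather than raw time makes it easier to spot sizes at which performance peaks or dips.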