Profiling Jacket Code
A variety of techniques and tools exist to help with timing your Jacket code. This page will tell you what tool to use for a particular situation.
Tools
Once you have gotten past the basics, you are bound to ask yourself the question: how do I find out how fast my code runs?
The following tools help in different ways to answer that question.
- MATLAB Profiler: MATLAB's profiler is a good tool for early-stage analysis of your code. When run on CPU code, it can be used to identify the slowest chunk in a project, which is what you should target for converting to Jacket.
When run on GPU code, it can give you a rough idea of code speeds. For more accurate speedup figures, use GPROFILE.
- TIMEIT: After you have identified the chunk of MATLAB code to convert to Jacket, you can use timeit to quickly time single lines or single functions. timeit gives you the overall run-time, and runs on both CPU and GPU code.
- GPROFILE: Use GPROFILE during the intermediate stage of writing Jacket code. You can exploit GPROFILE's results to make choices on the type of Jacket function to use.
GPROFILE, the Jacket profiler, differs from MATLAB's profiler in that it has hooks into Jacket's source code, which MATLAB does not have. As a result, GPROFILE can give you much more accurate timings.
Also, GPROFILE is meant to be run on Jacket code. It automatically converts this code to CPU code and runs that, to give you an idea of the speedups or slowdowns achieved.
Example
As an example, we will take the Chan-Vese Active Contours discussed in this blog post.
Run MATLAB Profiler
At the outset, we ran the CPU code on MATLAB's profiler (the resulting profile is in the blog post). We found that the function kappa takes up most of the time, and therefore, that was our target for conversion to Jacket.
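A minimal sketch of this step, using MATLAB's built-in profile command. The workload slow_stage below is a hypothetical stand-in, not the Chan-Vese code from the blog post; substitute your own project's entry point.

```matlab
% Hypothetical stand-in workload; replace with your project's entry point.
slow_stage = @(A) sum(sum(A * A'));
A = rand(200);

profile on                 % start collecting timing data
result = slow_stage(A);    % run the CPU code under the profiler
profile off                % stop collecting

stats = profile('info');   % struct; FunctionTable has one entry per function

% Sort by total time and list the most expensive functions first.
[~, order] = sort([stats.FunctionTable.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    fprintf('%-40s %8.4f s\n', stats.FunctionTable(k).FunctionName, ...
            stats.FunctionTable(k).TotalTime);
end

% profile viewer           % optional: open the interactive report
```

The function at the top of the sorted list is the natural first candidate for conversion to Jacket.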
Use timeit on slowest chunk
We formulated a set of objectives specific to the project, and then began Jacketizing the code. Since our target was a single function and we did not want to be distracted with anything else, we extracted the function from the main project, and ran it with a set of random matrices as inputs.
Following each Jacketization attempt, we ran timeit(@() kappa(CPU1, CPU2)) and timeit(@() kappa(GPU1, GPU2)) to get an idea of speedups. We iterated this step until we were satisfied.
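The timing loop described above might look like the following sketch. kappa_stub is a hypothetical placeholder for the extracted function (it is not the kappa from the blog post), and the commented-out GPU lines assume that Jacket's gdouble casts a CPU matrix to a GPU matrix.

```matlab
% Hypothetical stand-in for the extracted function; NOT the real kappa.
kappa_stub = @(P, Q) (P .* Q) ./ (1 + sqrt(P.^2 + Q.^2));

% Random matrices as inputs, as described above.
CPU1 = rand(512);
CPU2 = rand(512);

% timeit expects a handle that takes no arguments, so wrap the call.
t_cpu = timeit(@() kappa_stub(CPU1, CPU2));
fprintf('CPU time: %.6f s\n', t_cpu);

% With Jacket installed, the same pattern applies to GPU inputs.
% Assumption: gdouble casts a CPU matrix to a Jacket GPU matrix.
% GPU1 = gdouble(CPU1);
% GPU2 = gdouble(CPU2);
% t_gpu = timeit(@() kappa_stub(GPU1, GPU2));
% fprintf('Speedup: %.2fx\n', t_cpu / t_gpu);
```

Keeping the inputs fixed between attempts makes the timings from successive rewrites comparable.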
Run GPROFILE
To further analyze the code behind the scenes, we used GPROFILE. Specifically, we wanted to know the impact of reducing the number of SUBSREF calls on execution times.
GPROFILE can give you the time spent in the individual operations involved in a function (subsref, min, mtimes, etc.) and the number of calls to each operation. We found that reducing the number of calls to subsref reduced the execution time.
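As an illustration of what reducing subsref calls can mean in practice, here is a sketch in plain MATLAB. It is not the change we made to kappa; it only shows the general pattern of replacing per-element indexing inside a loop with a few range-indexing operations. Each indexing expression on a GPU variable counts as one subsref call.

```matlab
A = rand(1, 1000);

% Version 1: two indexing operations per iteration,
% so 2 * 999 = 1998 indexing operations in total.
s1 = 0;
for i = 2:numel(A)
    s1 = s1 + (A(i) - A(i-1))^2;
end

% Version 2: two indexing operations in total.
d  = A(2:end) - A(1:end-1);
s2 = sum(d.^2);

% Both versions compute the same sum of squared differences.
fprintf('difference between versions: %g\n', abs(s1 - s2));
```

Running GPROFILE on both versions with GPU inputs would be one way to check whether the lower call count shows up in the timings for your own code.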