Tips
From Jacket Wiki
To get the best performance from your code, keep the following tips in mind:
- Vectorized Code: Both MATLAB and Jacket perform best with vectorized code because the computations map naturally onto the arithmetic cores of the CPU and GPU. For guidance on writing vectorized MATLAB code, see the following resources: Code Vectorization Examples, MATLAB Technical Note, Good MATLAB Coding Practices.
- Memory Transfers: Avoid excessive memory transfers. Each casting operation to or from the GPU moves data between CPU memory and GPU memory. Jacket minimizes these transfers automatically in normal operation, but excessive casts between CPU and GPU types may reduce performance.
- Serial vs Parallel Operations: Remember, CPUs are serial computing devices and GPUs are parallel computing devices. For small or serial operations, the best performance is likely achieved on the CPU. For large or parallel operations, the best performance is likely achieved on the GPU. A good rule of thumb is to use the CPU if your data has only a few hundred elements. You can control which segments of code run on each device through the casting operations described on the Jacket Basics page.
- Hybrid vs Purebred Computations: Computations involving pure GPU data (e.g. a GDOUBLE times a GDOUBLE) tend to be faster than computations involving a hybrid of GPU and CPU data (e.g. a GSINGLE times a DOUBLE).
- Don't cast everything to GPU: To qualify the point above, you don't need to cast every variable, large or small, to GPU type. Jacket's internal workings automatically pass smaller parameters to the GPU. You only need to worry about larger inputs.
- Loops: Note that GFOR has many restrictions, described on the GFOR page. Use GFOR loops only if all of those restrictions are met. Because the restrictions are numerous, most loops are likely better served by using GPU variables inside the body of a regular FOR loop, which achieves a "per iteration" speedup. Finally, for both MATLAB and Jacket code, it is better to vectorize as much as possible and avoid FOR loops and GFOR loops altogether.
- Lazy Execution: Jacket employs a lazy execution design to provide optimal performance for your application. Lazy execution means that Jacket does not launch GPU kernels until the results are requested, either in a display or subsequent CPU-based computation. There are exceptions to this rule to allow for optimal kernel configurations (e.g. Jacket does not allow kernels to get excessively big or they would not be able to run on the GPU). If you wish to force a GPU computation, the Jacket GEVAL and GSYNC (formerly GFORCE) functions are available (see Jacket by Example for more details).
- Warming Up Computations: As NVIDIA points out in their examples, it is often beneficial to "warm up" a computation to get maximum performance. In other words, the first run of code in a freshly opened MATLAB session will typically be slower than subsequent runs of the exact same code. The first run is a little slower because Jacket spends time caching your program and pushing your data to the GPU so that subsequent runs are faster.
- Regular Access Patterns: When performing subscripting, keep in mind that the GPU memory controllers are not as versatile as those on the CPU. Best performance is achieved when your subscript access patterns are regular and uniform. For example, A([1 4 2 3 5 1 2 ...]) would be slow while A([1 2 3 4 5 6 ...]) would be faster. MATLAB and Jacket are both column-major, so it is faster to access columns (A(:,i)) rather than rows (A(i,:)).
- Maintaining your codebase: It's always beneficial to keep your Jacket and MATLAB codebase as simple as possible to make code maintenance easier. Check out the following features of Jacket designed to make writing and benchmarking your code simpler:
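
To illustrate the Vectorized Code and Regular Access Patterns tips above, here is a minimal sketch in plain MATLAB (no Jacket calls). The variable names and the sample expression are assumptions made for illustration. It contrasts an element-by-element loop with the equivalent vectorized expression, and it reads a matrix column by column using `A(:,j)`.

```matlab
% Plain MATLAB sketch; names and sizes are illustrative only.
n = 1000;
x = linspace(0, 1, n);

% Loop version: one scalar operation per iteration.
yLoop = zeros(1, n);
for k = 1:n
    yLoop(k) = x(k)^2 + 3*x(k);
end

% Vectorized version: one element-wise expression over the whole array.
yVec = x.^2 + 3*x;

% Column-major access: A(:,j) reads one contiguous column at a time.
A = reshape(1:12, 3, 4);            % 3-by-4 matrix, filled column by column
colSums = zeros(1, size(A, 2));
for j = 1:size(A, 2)
    colSums(j) = sum(A(:, j));
end
```

Both versions compute the same values. The vectorized form is the one that the tips above suggest maps best onto CPU and GPU arithmetic cores.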
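
The Memory Transfers, Hybrid vs Purebred, and Warming Up tips above can be sketched as follows. This is a hedged example, not a definitive benchmark. It assumes that `gsingle` casts data to the GPU, that `double` casts it back, and that `gsync` can be called with no arguments to wait for queued GPU work, in line with the descriptions on this page. The `hasJacket` flag and the `toDevice` handle are hypothetical helpers added so the sketch also runs in plain MATLAB without Jacket.

```matlab
% Assumption: gsingle, double and gsync behave as described on this page.
% Without Jacket, the sketch falls back to CPU single precision.
hasJacket = exist('gsingle') > 0;
if hasJacket
    toDevice = @gsingle;    % cast CPU data to a GPU single-precision type
else
    toDevice = @single;     % CPU stand-in so the sketch still runs
end

n = 256;
A = toDevice(rand(n));      % cast once, before the timed section
B = toDevice(rand(n));      % both operands share one type ("purebred")

elapsed = zeros(1, 3);
for run = 1:3
    tic;
    C = A .* B + A;         % operands stay on the device
    C = C * B;              % intermediate result is not pulled back
    if hasJacket
        gsync;              % force queued GPU work to finish before toc
    end
    elapsed(run) = toc;
end

result = double(C);         % cast back once, only for the final result
```

With Jacket present, the first entry of `elapsed` would be expected to include the warm-up cost, so the later entries are the more representative timings.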