FAQ: Programming and Development

From Jacket Wiki

Jacket for MATLAB®

Programming Basics

How much speedup can I expect with Jacket?

Many factors affect the speedups that you can achieve with Jacket. The following is a partial list of these factors, in no particular order:

  • NVIDIA Graphics Card. The more capable the card, the greater the speedup.
  • Data Sizes. In general, GPUs outperform CPUs by a larger margin as data sizes increase, because GPUs are fast only when they can exploit data parallelism. If there is not much data (e.g. only a few hundred elements in a vector), there will not be much gain in performance. However, if there is a lot of data (e.g. 10,000 elements or more, such as a 100x100 matrix or larger), the GPU can process those elements in parallel.
  • Application. Speedup figures vary from application to application. Some operations may be computed faster than others on the GPU. In general, the more parallelizable the code, the better the GPU performance. And the fewer times data crosses the bus between CPU and GPU memory, the better.
  • CPU. In speedup comparisons between the CPU and the GPU, the CPU speed matters. Better CPUs help Jacket go faster, because Jacket does a lot of work (e.g. JIT compilation) on the CPU to keep the GPU fully utilized.
  • MATLAB Code. Better-written MATLAB code generally leads to better Jacket code.

More resources:

The tips in this link will ensure you get as much speedup as possible for your application.


How do I time my Jacket code?

Jacket offers different tools to analyze your code.

See this page to find out how to use different tools to profile your Jacket code in different ways.


What is GPU warm-up?

Warm-up affects many applications at start-up, and GPU computing is no exception. When the GPU first starts computing with Jacket, a number of things need to be done, on both Jacket's and NVIDIA's end. Some of these are:

  • Loading libraries
  • Initializing the GPU state

Also, certain Jacket commands are compiled "on-the-fly" (also called lazy execution), meaning that the device code is generated at the time that command is encountered. On the first run of these commands, Jacket caches the generated device code for later use, thereby saving time for repeated executions of the same command.

The implication of this mechanism is that the first run of your Jacket code (often on a fresh MATLAB session) might be slower than subsequent runs.

To avoid having to consider all this while timing Jacket code, use TIMEIT.
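
As a rough illustration of the warm-up effect, the sketch below discards one untimed warm-up run and then averages several timed runs. It is a manual illustration only, not a replacement for TIMEIT. The to_gpu handle is a hypothetical stand-in (not part of Jacket) so that the script also runs on a machine without a GPU; with Jacket installed you would presumably set it to @gsingle. The matrix size and run count are arbitrary.

```matlab
% Sketch: discard a warm-up run, then average several timed runs.
% to_gpu is a hypothetical stand-in, NOT a Jacket function.
% With Jacket installed, replace it with @gsingle (assumption).
to_gpu = @(x) x;

A = to_gpu(rand(500));
work = @() sum(sum(A * A));    % the operation being timed

warm = work();                 % warm-up run: executed but not timed

n_runs = 10;
times = zeros(1, n_runs);
for k = 1:n_runs
    tic;
    result = work();
    % Assumption: on GPU data, converting to DOUBLE copies the result back
    % to the CPU, so the computation has finished before toc is reached.
    result = double(result);
    times(k) = toc;
end

avg_time = mean(times);
fprintf('average over %d runs: %.6f s\n', n_runs, avg_time);
```

Comparing the average against the time of the very first run in a fresh MATLAB session gives a feel for how much of that first run is warm-up overhead.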


Why is my Jacket code running slower than MATLAB CPU code?

NOTE: This checklist is by no means comprehensive; it is based on the most common reasons for slow Jacket code.


  • Do you have a lot of unvectorized code? Jacket depends heavily on vectorization. Operations on single variables or small arrays are not recommended if you are looking for good speedup figures.
  • Do you have code chunks inside a loop that are liable to be recompiled every time Jacket encounters them? For more information on how to detect such code, see "Lazy Execution" on the AccelerEyes Blog.
  • Not every calculation has to be performed on the GPU. A good practice is to profile your CPU code to identify a computation-intensive chunk, and then convert that chunk to Jacket.
  • Check for excessive CPU-to-GPU memory transfers (GSINGLE to DOUBLE conversions, hybrid operations involving CPU and GPU variables).
  • Jacket functions have a break-even point beyond which they start showing speedup. It is difficult to give an objective figure for each function, as this depends on your machine configuration and the specific application, but larger input sizes generally show more speedup.
  • Did you warm up the GPU? Run your Jacket code more than once, preferably in a CPU for-loop, to get a better idea of the average timing.
  • Your algorithm may depend heavily on operations that are not parallelizable. In that case, it is better to modify parts of the algorithm so that the code can be converted to Jacket.
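
To make the first item in the checklist concrete, here is a small sketch that contrasts an element-by-element loop with its vectorized equivalent. It is written in plain MATLAB so it runs anywhere; the assumption is that with Jacket you would cast x once with GSINGLE, so that the vectorized expression runs as one operation over the whole array, whereas the loop would issue many tiny operations. The expression and array size are arbitrary.

```matlab
n = 1e5;
x = rand(n, 1);            % with Jacket, cast once here, e.g. gsingle(x) (assumption)

% Unvectorized: one scalar operation per loop iteration.
y_loop = zeros(n, 1);
for k = 1:n
    y_loop(k) = 2 * x(k)^2 + 1;
end

% Vectorized: a single expression over the whole array.
y_vec = 2 * x.^2 + 1;

% Both forms compute the same values.
max_diff = max(abs(y_loop - y_vec));
fprintf('largest difference between the two results: %g\n', max_diff);
```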


Why does Jacket not support ...?

MATLAB has thousands of functions. But why does Jacket not support all of them?

The answer is simple: the GPU is a computational engine, highly optimized for data-parallel numerical tasks. The CPU, on the other hand, performs a wide variety of general-purpose tasks and is not specialized for that kind of computation.

Thus, for example, performing string-processing on the GPU is impractical.

We realize that in a perfect world, all (or most) MATLAB functions would be supported by Jacket. We are working toward this vision, and we started off by porting a set of functions which (we thought) would be most-used. We are continually working on adding new functions and extending the functionality of existing ones. If you feel that a particular function or use case for a function needs to be supported, do make a feature request on the AccelerEyes Forums. If you wish to find out if a function is supported, you can check out the Jacket Function List.

Fine-Tuning your Jacket code

I have a smaller card and need to run memory-heavy code

Trying to fit a large problem onto a smaller card is tricky, but with some tweaks you can certainly accomplish it.

Keep in mind, though, that some algorithms just will not fit on cards that cannot handle their memory footprint.

The major issue you need to contend with is Out-of-Memory errors. To avoid these errors, try the following tweaks:

  • If you have large matrices, avoid storing temporary copies of them wherever possible.
  • If you do have to store those large temporary copies, try clearing them out (CLEAR varname) after use.
  • Sometimes it's possible to divide large algorithms into smaller chunks that can fit into the card's memory.
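
The last two tweaks can be combined, roughly as in the sketch below: a large matrix is processed one block of rows at a time, each block is cleared after use, and only a small result is kept. The to_gpu handle is a hypothetical stand-in (not a Jacket function) so the script runs without a GPU; with Jacket you would presumably use @gsingle instead. The block size and the column-wise sum of squares are arbitrary choices for illustration.

```matlab
% to_gpu is a hypothetical stand-in, NOT a Jacket function.
% With Jacket installed, replace it with @gsingle (assumption).
to_gpu = @(x) x;

data = rand(1000, 64);          % large matrix that stays on the CPU
chunk_rows = 250;               % assumed block size; pick one that fits the card
n_rows = size(data, 1);
col_sums = zeros(1, size(data, 2));

for first = 1:chunk_rows:n_rows
    last = min(first + chunk_rows - 1, n_rows);
    block = to_gpu(data(first:last, :));              % move one block to the GPU
    col_sums = col_sums + double(sum(block.^2, 1));   % bring back only the small result
    clear block;                                      % free the block before loading the next
end

fprintf('processed %d rows in blocks of %d\n', n_rows, chunk_rows);
```

The peak footprint is then governed by chunk_rows rather than by the size of the full matrix, at the cost of one transfer per block.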

A useful tool for keeping accurate track of all the data stored in memory is the command GINFO(1,1), which displays more detailed information than GINFO.

If you run into problems, the AccelerEyes Forums and AccelerEyes Support are good places to ask questions.

What profiling tools does Jacket provide?

Profiling and timing GPU code is slightly different from CPU timing, because of the various optimizations Jacket performs while executing your commands. NVIDIA's CUDA™ technology also has several optimizations built in.

As a result, Jacket provides different tools, described in this link.

A Note about the MATLAB Profiler

MATLAB's Profiler, which can give timing figures for CPU code, also works with Jacket. But it does not have access to Jacket's source code, and therefore you may see some discrepancies when profiling Jacket GPU code with the MATLAB Profiler.

For example, you may see certain lines consume a lot of time where they shouldn't.

This behavior is due to Jacket's compilation behavior and built-in optimizations, among other things. Jacket's GPROFILE can give you much better results.

More resources: AccelerEyes Forums thread discussing MATLAB Profiler

For information on the MATLAB profiler, please use MATLAB help.


What is Timeout Detection and Recovery and why should I care?

A commonly observed issue in older Windows versions is that the system freezes while performing graphics-related operations (while playing games, for instance). Because the GPU is busy processing a complex graphical computation at that point, it does not update the display, and the user usually ends up rebooting the system.

To avoid this problem, Windows Vista and later editions incorporate a mechanism for recovering from such unresponsiveness of the graphics driver, called Timeout Detection and Recovery (TDR). Essentially, TDR resets the state of the GPU when it detects a computation on the GPU that is taking longer than a preset period of time (the interval, in seconds, is set using a registry key, 'TdrDelay'). This usually results in a brief screen flicker and a message such as:

Display driver stopped responding and has successfully recovered.

Some Jacket operations are long-running. Reductions (MIN, MAX, SUM, etc.), for example, can take longer than the value specified in TdrDelay. When you perform such operations on GPU variables in Jacket, TDR kicks in and resets the GPU to stop your computation from (what it perceives as) making the driver unresponsive. Typically this causes one or more of the following to happen:

  • The MATLAB script you were running will fail
  • A CUDA failure error message ( what does this look like? ) is displayed
  • Your screen will briefly flicker and you will get the error message above

[Figure: Windows Timeout Message]

To prevent TDR from obstructing complex computations from executing, the Jacket installer automatically sets the Timeout to a higher value on Windows Vista and 7.

This is entirely a user choice. If you do not want Jacket to perform this operation, select "No" in the appropriate window during the Install process.

For more information on this, please read the Microsoft paper on TDR.

To manually set the TDR timeout value:

  1. Open the registry editor ( Start > Run > regedit )
  2. Navigate to: HKLM\System\CurrentControlSet\Control\GraphicsDrivers\
  3. Add the DWORD TdrDelay if it is not present (Right-click > New DWORD > TdrDelay)
  4. Set the value of TdrDelay to the number of seconds you want to allow your GPU to run before timing out. Try 7 seconds.
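
To check the current setting from within MATLAB, before or after editing the registry, something like the following may work. It is only a sketch: it assumes MATLAB's winqueryreg function (available on Windows only) and the key path from step 2, and the variable names are illustrative. It reads the value and does not modify the registry.

```matlab
% Read the current TdrDelay value, if one has been set (Windows only).
subkey = 'System\CurrentControlSet\Control\GraphicsDrivers';
tdr_delay = [];

if ispc
    try
        % winqueryreg raises an error when the value does not exist.
        tdr_delay = winqueryreg('HKEY_LOCAL_MACHINE', subkey, 'TdrDelay');
    catch
        tdr_delay = [];   % TdrDelay has not been added to the registry
    end
end

if isempty(tdr_delay)
    disp('TdrDelay is not set, or this is not a Windows machine.');
else
    fprintf('TdrDelay is currently %d seconds\n', tdr_delay);
end
```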

For more details, check out the MSDN Knowledge Base.


How do I check the availability of GPU memory while running a program?

Often, you might need to perform computations based on the amount of free memory on the GPU. Something along the lines of:

if gpufree > val
    % perform computation
end

GINFO won't fit the bill, as you can't frame if-conditions based on it (actually you can, but that involves the use of EVALC and text-parsing). An easier way is to use an undocumented function, gpu_entry (this is the same entry-point function that gets called when you type GINFO). Simply type gpu_entry(13).

info = gpu_entry(13);      % query the GPU memory information once
Gfree = info.gpu_free      % free GPU memory, in bytes
Gtotal = info.gpu_total    % total GPU memory, in bytes

This stores the free and total memory in bytes, in the two variables. You can then either display the memory, or perform calculations only if the free memory is above a particular value:

if Gfree > 1e9
    % perform calculation here
else
    disp('Not enough memory to perform calculation.')
end
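
Rather than comparing against a fixed threshold, you could estimate how many bytes a computation needs from the array sizes. The sketch below does this for a single-precision matrix product, counting the two inputs and the result at 4 bytes per element. The Gfree value is a hard-coded placeholder here so the script runs without a GPU; in practice it would come from gpu_entry(13) as described above. The estimate ignores any temporary workspace Jacket itself may allocate, so treat it as a lower bound.

```matlab
n = 4096;                          % matrices are n-by-n
bytes_per_single = 4;              % a single-precision element occupies 4 bytes

% Two inputs (A and B) plus one result (A*B), all n-by-n singles.
needed = 3 * n * n * bytes_per_single;

Gfree = 1e9;                       % placeholder; normally read from gpu_entry(13)

fits = needed < Gfree;
if fits
    fprintf('need about %.0f MB, which fits in the free GPU memory\n', needed / 2^20);
else
    disp('Not enough memory to perform calculation.')
end
```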


How do I run Jacket commands without a GPU?

If you are developing Jacket code and need to share it with a colleague who does not have a GPU, we recommend you follow the CPU Stubs Example.
