Jacket MGL
From Jacket Wiki
Introduction
Jacket's Multi-GPU License (MGL) allows you to seamlessly utilize multiple GPUs on the same machine or across a cluster. In many cases, little to no code modification is required to take advantage of additional GPUs.
Here's a quick example showing how to generate random values and perform FFTs in parallel across all available devices, all from the same MATLAB® session:
ngpu = getfield(ginfo, 'gpu_count');
for i = 1:ngpu
    gselect(i)                          % switch device
    out{i} = fft(rand(4096, gsingle));
end
gsync('all')                            % wait for all devices to finish
Spanning computation across multiple GPUs lets you scale GPU and CPU computing resources transparently, which makes Jacket an easy platform for GPU computing.
Jacket also includes support for PARFOR and SPMD from the MATLAB® Parallel Computing Toolbox to distribute computations among workers. If you want to run your computations across a cluster (multi-node), then you need the Jacket High-Performance Computing (HPC) License.
Every trial license contains support for up to four GPUs. You can purchase Jacket MGL online or email us. In your order, you will need to specify the number of GPUs you wish to enable for MGL.
Jacket MGL with MATLAB PCT: Getting Started
Parallel computing in MATLAB is built around the concept of workers. Just as you can start up Jacket in a MATLAB session, you can have each PCT worker spawn its own Jacket instance.
The MATLABPOOL command sets the number of workers. Although you can create any number of workers, performance is best when each worker has a dedicated CPU core. For example, on a quad-core machine, create no more than four workers. A similar rule applies to assigning workers to GPUs: the total number of workers in the compute pool should equal the number of GPUs present in the system.
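The sizing rule above can be sketched as follows. This is a minimal sketch, assuming that ginfo returns a struct with a gpu_count field (as in the introductory example on this page) and that the machine has at least as many CPU cores as GPUs:

```matlab
% Sketch: open one PCT worker per GPU.
% Assumption: ginfo returns a struct with a gpu_count field, as used in
% the introductory example; the machine has at least that many CPU cores.
ngpu = getfield(ginfo, 'gpu_count');   % number of GPUs Jacket reports
matlabpool(ngpu)                       % one worker per GPU
```

If the machine has fewer CPU cores than GPUs, the core count is the tighter limit and the pool should be sized to it instead.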
Testing the setup
>> matlabpool(3) % three workers
>> spmd, ginfo, end
Lab 1:
  Jacket v2.0 (build 06864c2) by AccelerEyes (64-bit Linux)
  CUDA toolkit 4.0, driver 285.05.05
  GPU1 Tesla C2070, 5376 MB, Compute 2.0 (single,double) (in use)
  GPU2 Tesla C2070, 5376 MB, Compute 2.0 (single,double)
  GPU3 Tesla C2070, 5376 MB, Compute 2.0 (single,double)
  Display Device: GPU2 Tesla C2070
  Memory Usage: 4742 MB free (5376 MB total)
Lab 2:
  Jacket v2.0 (build 06864c2) by AccelerEyes (64-bit Linux)
  CUDA toolkit 4.0, driver 285.05.05
  GPU1 Tesla C2070, 5376 MB, Compute 2.0 (single,double)
  GPU2 Tesla C2070, 5376 MB, Compute 2.0 (single,double) (in use)
  GPU3 Tesla C2070, 5376 MB, Compute 2.0 (single,double)
  Display Device: GPU2 Tesla C2070
  Memory Usage: 4734 MB free (5376 MB total)
Lab 3:
  Jacket v2.0 (build 06864c2) by AccelerEyes (64-bit Linux)
  CUDA toolkit 4.0, driver 285.05.05
  GPU1 Tesla C2070, 5376 MB, Compute 2.0 (single,double)
  GPU2 Tesla C2070, 5376 MB, Compute 2.0 (single,double)
  GPU3 Tesla C2070, 5376 MB, Compute 2.0 (single,double) (in use)
  Display Device: GPU2 Tesla C2070
  Memory Usage: 4965 MB free (5376 MB total)
Additional Resources
Two examples included with the Jacket installation demonstrate some of the multi-GPU capabilities:
- mgl_example.m in jacket/examples/mgl_example/ demonstrates various ways of running FFT across multiple workers and devices.
- black_scholes_mgl.m in jacket/examples/black_scholes_example implements the standard Black-Scholes options pricing model using multiple GPUs to achieve higher options-per-second rates.
Helper functions
- GINFO - Get detailed information about each GPU
- GSELECT - Switch between multiple devices
- GSYNC - Synchronize one or more devices for timing experiments
- TIMEIT - An accurate timer for either CPU or GPU code
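As an illustration of why GSYNC matters for timing experiments, the sketch below times a single FFT with MATLAB's tic and toc. It uses only calls that appear in the introductory example (rand with gsingle, fft, gsync('all')). The assumption, not confirmed on this page, is that GPU work may still be in flight when a statement returns, so the clock is only stopped after synchronizing:

```matlab
% Sketch: time one GPU FFT, synchronizing before reading the clock.
% Assumption: GPU operations may return before the device has finished,
% which is why gsync('all') brackets the timed region.
x = rand(4096, gsingle);   % GPU data, as in the introductory example
gsync('all')               % finish any pending setup work before timing
tic
y = fft(x);
gsync('all')               % wait for the FFT to complete
t = toc;
fprintf('FFT took %.4f s\n', t);
```

Without the second gsync('all'), the measured time could reflect only the cost of launching the work rather than completing it.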
Programming Notes
- Starting with v1.8.2, GSELECT can be used at any time to switch between devices.
- Variables created on one GPU cannot be used on a different GPU.
- GFOR does not yet automatically parallelize across multiple GPUs.
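One way to work within the restriction on cross-GPU variables is to stage data through host memory. The following is a hedged sketch, not a documented recipe: it assumes that casting a GPU variable with MATLAB's single() copies it back to the host, and that at least two GPUs are available. The variable names are illustrative only.

```matlab
% Sketch: move data from GPU 1 to GPU 2 by way of host memory.
% Assumption: single() on a Jacket GPU variable returns a host (CPU) copy.
gselect(1)                  % work on the first device
a = gones(1, 10);           % a lives on GPU 1
a_host = single(a);         % copy back to host memory (assumed behavior)

gselect(2)                  % switch to the second device
b = gsingle(a_host) * 2;    % push the host copy to GPU 2 and use it there
```

The extra host round trip costs transfer time, so it is best reserved for small results or infrequent exchanges between devices.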
Using Jacket with PCT
- The Parallel Computing Toolbox (PCT) provides several programming constructs. Jacket fully supports the SPMD and PMODE constructs. PARFOR is also supported, with some minor limitations:
- The loop-iterator variable, loop-start and loop-end variables must be MATLAB basic data types and not GPU data types such as GSINGLE or GDOUBLE.
- Any variable created inside a PARFOR loop is created as a MATLAB type by default and therefore cannot have a GPU value assigned to it. To work around this, pre-allocate the variable before the loop. In some cases, the variable must also be referenced as a value inside the loop before the assignment. Pre-allocation is also standard practice in MATLAB. For example:
parfor i=1:10
    n(i) = gsingle(i);
end
This will not work because n will be created as a CPU data type. Instead, the following must be done:
n = gones(1,10);
parfor i=1:10
    n(i);
    n(i) = gsingle(i);
end
The statement n = gones(1,10) pre-allocates the output variable. The bare reference n(i); tells PARFOR that n is a value from outside the loop that must be brought into it (since it is being used); otherwise PARFOR assumes n is only a temporary variable.