Current Stable Release: v1.4.1-6737. No release candidates are currently available.

Jacket SDK

From JacketWiki


The Jacket Software Developer Kit (SDK) makes it easy to integrate custom CUDA code into the Jacket runtime. A few simple jkt functions, modeled on the standard MEX API, let you call custom CUDA kernels from Jacket. The Jacket SDK ships with several examples that illustrate different usage scenarios; they are located in the <jacket_root>/sdk directory.

Jacket SDK is available by default with every Jacket trial license. For purchased licenses, the Jacket SDK license add-on is required to run SDK-generated MEX files.


Requirements

All files necessary for the Jacket SDK are available with every Jacket installation. The header files jacket.h and err.h are available in the <jacket_root>/engine/include directory. The SDK library files libjacket_sdk.lib and libjacket64_sdk.lib (for Windows) or libjacket_sdk.a and libjacket64_sdk.a (for Linux) are available in the <jacket_root>/engine directory.

You must set up your MATLAB environment to compile CUDA functions, as explained here: http://developer.nvidia.com/object/matlab_cuda.html

You must have the CUDA Toolkit installed on your machine in order to call nvcc. Download the CUDA Toolkit here: http://www.nvidia.com/object/cuda_get.html. Be sure to download the CUDA Toolkit version that matches the one used to build your version of Jacket. Check the Release Notes or run GINFO to find out which CUDA Toolkit corresponds to your version of Jacket.

SDK Overview

The table below summarizes the functions available in the Jacket SDK. The SDK Function Reference section that follows describes each function in detail.

Table 2: Jacket SDK Functions

Basic information (determine array and data type):
  • jkt_matlab: true if the input is a MATLAB CPU array
  • jkt_gsingle: true if the input is a Jacket GPU array
  • jkt_complex: true if the input GPU or CPU array is complex
  • jkt_class: determine the class of the GPU or CPU array (e.g. single, double)
  • jkt_dims: determine the dimensions of the GPU or CPU input
  • jkt_numel: determine the number of elements in the GPU or CPU input
  • jkt_scalar: true if the GPU or CPU array is a scalar (one element)
  • jkt_gfor: determine the number of GFOR tiles in the array (zero indicates non-GFOR input)
Memory operations (allocate memory, transfer data):
  • jkt_wrap: wrap a MATLAB CPU array into a Jacket GPU array (downcast if necessary)
  • jkt_new: create a new Jacket array with the specified number of rows and columns
  • jkt_new_array: create a new Jacket array with the specified dimensions
  • jkt_mem: get a device (GPU) pointer to a Jacket array
  • jkt_mem_host: get a host (CPU) pointer to a Jacket array
SDK operations (Jacket entry function):
  • jktFunction: Jacket entry function, similar to MATLAB's mexFunction


SDK Function Reference

jkt_matlab

  • Syntax: bool jkt_matlab(mxArray *m);
  • Returns: bool
  • Arguments: m - pointer to a CPU MATLAB array
  • Description: This function returns true if the input argument m is a CPU MATLAB array.

jkt_garray

  • Syntax: bool jkt_garray(mxArray *m);
  • Returns: bool
  • Arguments: m - pointer to a GPU MATLAB array
  • Description: This function returns true if the input argument m is a Jacket-specific GPU array.

jkt_wrap

  • Syntax: mxArray *jkt_wrap(mxArray *m);
  • Returns: mxArray *
  • Arguments: m - pointer to a CPU MATLAB array
  • Description: This function wraps the MATLAB array m into a Jacket-specific GPU array. The input is downcast in precision if necessary: for example, if the CPU data is double precision but the GPU supports only single precision, the data is converted to single precision before it is transferred to the GPU. jkt_wrap has no effect on Jacket arrays.

jkt_new

  • Syntax: mxArray *jkt_new(int rows, int cols, mxClassID cls, bool is_complex);
  • Returns: mxArray *
  • Arguments:
    • rows - number of rows in the array
    • cols - number of columns in the array
    • cls - class of the array (e.g. single, double)
    • is_complex - true to create a complex array
  • Description: This function creates a rows-by-cols GPU array of the class defined by cls.

jkt_new_array

  • Syntax: mxArray *jkt_new_array(int ndims, const mwSize *dims, mxClassID cls, bool is_complex);
  • Returns: mxArray *
  • Arguments:
    • ndims - number of dimensions in the array
    • dims - array of ndims mwSize values giving the size of each dimension
    • cls - class of the array (e.g. single, double)
    • is_complex - complex type indicator
  • Description: This function creates a GPU array of ndims dimensions with the size of each dimension specified by dims and of the type defined by cls.

jkt_mem

  • Syntax: err_t jkt_mem(void **, mxArray *m);
  • Returns: err_t
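  • Arguments:
    • void ** - address of a pointer that receives the device-side pointer
    • m - pointer to GPU MATLAB array
  • Description: This function stores, in its first argument, a device-side pointer to the data of m, which can be used to safely read and write the array. The err_t return value indicates whether the call succeeded.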

jkt_mem_host

  • Syntax: err_t jkt_mem_host(void **, mxArray *m);
  • Returns: err_t
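  • Arguments:
    • void ** - address of a pointer that receives the host-side pointer
    • m - pointer to GPU MATLAB array
  • Description: This function stores, in its first argument, a host-side pointer to the data of m, which can be used to safely read and write the array. The err_t return value indicates whether the call succeeded.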

jkt_complex

  • Syntax: bool jkt_complex(mxArray *m);
  • Returns: bool
  • Arguments: m - pointer to GPU MATLAB array
  • Description: This function returns true if the input argument m is complex.

jkt_class

  • Syntax: mxClassID jkt_class(mxArray *m);
  • Returns: mxClassID
  • Arguments: m - pointer to GPU MATLAB array
  • Description: This function returns the class type (e.g. mxSINGLE_CLASS, mxLOGICAL_CLASS) of the input argument m.

jkt_dims

  • Syntax: int jkt_dims(mxArray *m, const mwSize **dims);
  • Returns: int
  • Arguments:
    • m - pointer to GPU MATLAB array
    • dims - address of a const mwSize pointer that receives the size of each dimension
  • Description: This function retrieves the dimensions of the input argument m and stores them in the output argument dims.


jkt_numel

  • Syntax: mwSize jkt_numel(mxArray *m);
  • Returns: mwSize
  • Arguments: m - pointer to GPU MATLAB array
  • Description: This function returns the number of elements in the input argument m.
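Kernels that process one element per thread size their launch from the element count, so the values from jkt_dims and jkt_numel typically end up in a launch configuration. The following C++ sketch is illustrative only: the helpers numel_from_dims and blocks_for are hypothetical, std::size_t stands in for mwSize, and it assumes jkt_numel equals the product of the jkt_dims sizes, as it does for ordinary MATLAB arrays.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements as the product of the sizes
// reported by jkt_dims (assumed to match jkt_numel for ordinary arrays).
std::size_t numel_from_dims(int ndims, const std::size_t *dims) {
    assert(ndims >= 0);
    std::size_t n = 1;
    for (int i = 0; i < ndims; ++i) {
        n *= dims[i];
    }
    return n;
}

// Hypothetical helper: thread blocks needed to cover n elements with one
// thread per element, rounding up so a final partial block is included.
std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

In a .cu file the result would feed a launch such as my_kernel<<<blocks_for(n, 256), 256>>>(...), where my_kernel is a hypothetical kernel that checks i < n before touching memory and 256 is an arbitrary block size.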

jkt_scalar

  • Syntax: bool jkt_scalar(mxArray *m);
  • Returns: bool
  • Arguments: m - pointer to GPU MATLAB array
  • Description: This function returns true if the input argument m has exactly one element.

jkt_gfor

  • Syntax: int jkt_gfor(mxArray *m);
  • Returns: int
  • Arguments: m - pointer to GPU MATLAB array
  • Description: This function returns a non-zero value if the given GPU array is associated with a GFOR loop. The value is the number of GFOR iterations (i.e. tiles) represented in the underlying data. For example, if there are 100 iterations (gfor i=1:100) and the matrix is 30x30, then the underlying data is 30x30x100 elements in size. GFOR functionality will be expanded in coming releases.
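The 30x30x100 example suggests a simple memory layout for GFOR data. This C++ sketch assumes, beyond what is stated here, that the tiles are stacked along the third dimension in MATLAB's column-major order, so each tile occupies a contiguous block of rows*cols elements. GforLayout, total_elements and linear_index are hypothetical names, not SDK functions.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: each GFOR tile is one rows x cols matrix, tiles are
// stacked along the third dimension, and storage is column-major.
struct GforLayout {
    std::size_t rows;
    std::size_t cols;
    std::size_t tiles;  // the value jkt_gfor would report (non-zero in GFOR)
};

// Total number of elements in the underlying data.
std::size_t total_elements(const GforLayout &g) {
    return g.rows * g.cols * g.tiles;
}

// Linear index of element (r, c) in tile t; all indices are zero-based.
std::size_t linear_index(const GforLayout &g, std::size_t r, std::size_t c,
                         std::size_t t) {
    assert(r < g.rows && c < g.cols && t < g.tiles);
    return t * g.rows * g.cols + c * g.rows + r;
}
```

With the 30x30 matrix and 100 iterations from the example, total_elements returns 90000 and tile t starts at offset t*900.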

jktFunction

  • Syntax: err_t jktFunction(int nlhs, mxArray *plhs[ ], int nrhs, mxArray *prhs[ ]);
  • Returns: err_t
  • Arguments:
    • nlhs - number of arguments on the left-hand side
    • plhs[ ] - array of pointers to the MATLAB array(s) on the left-hand side (outputs)
    • nrhs - number of arguments on the right-hand side
    • prhs[ ] - array of pointers to the MATLAB array(s) on the right-hand side (inputs)
  • Description: This is the Jacket entry function. It mirrors the functionality of the MATLAB mexFunction as defined in the MATLAB documentation. It enables the user to get input from and send output to MATLAB.

SDK Examples

Simple Jacket SDK Usage Example

Filename: mymex.cu

The purpose of this example is to demonstrate the Jacket SDK and how simple it is to use. The example takes a MATLAB vector with n elements as input and returns a vector in which [0 1 2 ... n-1] has been added element-wise to the input.

Explanation

  1. The entire code is wrapped in the Jacket SDK jktFunction, which is the entry point to Jacket.
  2. First, the input from MATLAB, made accessible by jktFunction, is stored in an mxArray variable.
  3. Then the class and dimensions of the input are determined using Jacket SDK functions jkt_class and jkt_dims.
  4. Then a new mxArray is allocated with the previously obtained dimensions to store the output, using the jkt_new function.
  5. Device-side pointers to the GPU arrays are obtained using the jkt_mem function. This provides a safe way for Jacket to read from and write to the device (GPU).
  6. After this the CUDA kernel is launched and we are done!

The whole process amounts to declaring inputs and outputs, allocating memory for them, and calling the CUDA kernel.
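The arithmetic that mymex performs on the GPU can be checked against a simple CPU reference. The C++ sketch below is not the SDK example's code; mymex_reference is a hypothetical name, and single-precision data (as produced by gsingle) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the mymex computation: out[i] = in[i] + i,
// i.e. [0 1 2 ... n-1] is added element-wise to the input.
std::vector<float> mymex_reference(const std::vector<float> &in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // In a CUDA kernel, i would typically be
        // blockIdx.x * blockDim.x + threadIdx.x, guarded by i < n.
        out[i] = in[i] + static_cast<float>(i);
    }
    return out;
}
```

Comparing this output with the result of mymex(A) on gathered data is a quick way to confirm that the kernel launch covers every element.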

Median Filter Example

Filename: medfilt2

The medfilt2 function in MATLAB performs two-dimensional median filtering: each output element is the median of the m-by-n window around the corresponding input element. The median is the middle element of the sorted window values, or the mean of the two middle elements when the window contains an even number of elements (m*n even). Because each output element can be computed independently, the operation is highly parallel and is easy to write as a CUDA kernel.

Explanation

  1. The entire code is wrapped in the Jacket SDK jktFunction, which is the entry point to Jacket.
  2. The input array from MATLAB is stored in an mxArray variable.
  3. Then the class and dimensions of the input are determined using Jacket SDK functions jkt_class and jkt_dims.
  4. Based on these dimensions a new mxArray is allocated to store the median output, using the jkt_new function.
  5. Device-side pointers to the GPU arrays are obtained using the jkt_mem function. This provides a safe way for Jacket to read from and write to the device (GPU).
  6. Next, texture memory or shared memory is set up, depending on the variant of the medfilt2 example. Separate variants demonstrate the use of texture memory, shared memory, and so on.
  7. After this the CUDA kernel launch configuration is determined and the kernel is launched.
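A CPU reference is handy for validating the GPU variants. This C++ sketch is our own illustration rather than SDK code; medfilt2_reference is a hypothetical name, and it assumes column-major storage (as in MATLAB), zero padding at the borders, and a window spanning row offsets -(m-1)/2 to m-1-(m-1)/2 around each pixel (likewise for columns with n). When the window has an even number of elements it averages the two middle values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a 2-D median filter with an m x n window.
// img is rows x cols, stored column-major. Assumptions: pixels outside
// the image count as 0, and the window is anchored at (m-1)/2, (n-1)/2.
std::vector<float> medfilt2_reference(const std::vector<float> &img,
                                      int rows, int cols, int m, int n) {
    assert(rows > 0 && cols > 0 && m > 0 && n > 0);
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    const int r_lo = (m - 1) / 2;
    const int c_lo = (n - 1) / 2;
    std::vector<float> out(img.size());
    std::vector<float> window(static_cast<std::size_t>(m) * n);

    for (int c = 0; c < cols; ++c) {
        for (int r = 0; r < rows; ++r) {
            // Gather the neighborhood; each output pixel is independent,
            // which is why one GPU thread per pixel works well.
            std::size_t k = 0;
            for (int dc = -c_lo; dc < n - c_lo; ++dc) {
                for (int dr = -r_lo; dr < m - r_lo; ++dr) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    const bool inside = rr >= 0 && rr < rows && cc >= 0 && cc < cols;
                    window[k++] = inside ? img[static_cast<std::size_t>(cc) * rows + rr] : 0.0f;
                }
            }
            std::sort(window.begin(), window.end());
            const std::size_t len = window.size();
            // Odd count: middle element. Even count: mean of the two middle ones.
            const float med = (len % 2 == 1)
                ? window[len / 2]
                : 0.5f * (window[len / 2 - 1] + window[len / 2]);
            out[static_cast<std::size_t>(c) * rows + r] = med;
        }
    }
    return out;
}
```

Sorting the full window keeps the sketch simple; a GPU kernel performs the same per-pixel gather, reading neighbors through texture or shared memory as the SDK variants do.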

Bitonic Sort Example

Filename: bitonic.cu

The purpose of this example is to demonstrate how existing CUDA code can be run from MATLAB using the Jacket SDK, without changing the existing CUDA kernel. Bitonic sort is a highly parallel algorithm designed for execution on multiple cores; here it sorts a one-dimensional array.

Explanation

  1. By looking at the NVIDIA CUDA SDK example, we can determine the input expected by the CUDA kernel.
  2. After determining this, it is a simple matter of declaring such structures using the Jacket SDK (as shown in the Simple Jacket SDK example) and calling the CUDA kernel to use these structures.
  3. Here we see that the kernel requires an input array, an output array and the number of elements in the input array.
  4. We therefore read the input from MATLAB into an mxArray.
  5. Next, we allocate a new mxArray with jkt_new, using the dimensions of the input array obtained with jkt_dims.
  6. We then use jkt_mem to get device pointers to these arrays.
  7. Finally, we launch the bitonic sort kernel with the same launch configuration as the original example. As a result, this example works for at most 512 input elements, since that is the maximum number of threads supported by a single block on the GPU.
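For reference, the compare-exchange network that bitonic sort applies can be written serially. The C++ sketch below is a standard textbook formulation, not the NVIDIA kernel; it assumes a power-of-two length, and each (k, j) pass corresponds to one step that all GPU threads execute between barrier synchronizations.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Serial sketch of the bitonic sorting network (ascending order).
// Assumes a power-of-two length. On the GPU, element i would be handled
// by thread i, with a barrier after every (k, j) pass.
void bitonic_sort(std::vector<float> &a) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0);  // power-of-two length (or empty)
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    // Blocks with (i & k) == 0 are sorted ascending, others descending.
                    const bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}
```

The power-of-two requirement is why inputs such as the 64-element vector in the run example fit the network directly.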

Black-Scholes Example

Filename: black_scholes.cu

This is another example of integrating existing CUDA code into the MATLAB-based Jacket runtime. The Black-Scholes model provides a partial differential equation (PDE) for the evaluation of an option price under certain assumptions.

Explanation

  1. By looking at the NVIDIA CUDA SDK example, we can determine the input expected by the CUDA kernel.
  2. After determining this, we can declare such structures using the Jacket SDK (as shown in the Simple Jacket SDK example) and call the CUDA kernel to use these structures.
  3. Here we see that the kernel expects two arrays to store the Call and Put outputs, as well as three input arrays specifying the stock price, option strike, and option years.
  4. We therefore read the three input arrays from MATLAB into mxArray variables and declare the constants required by the kernel.
  5. We then allocate two new mxArray variables to store the output using jkt_new and the dimensions of the input obtained using jkt_dims.
  6. We use jkt_mem to get device pointers to these arrays.
  7. We then launch the Black-Scholes kernel with these parameters and other constants as required. The kernel launch configuration is kept identical to the original example.
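The values the kernel produces can be cross-checked with the closed-form Black-Scholes prices of European call and put options. This double-precision C++ sketch is our own illustration, not the NVIDIA kernel; the function names are hypothetical, and the risk-free rate and volatility are whatever constants the caller chooses. Because the GPU run here uses single precision, expect small differences.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cumulative distribution function via erfc.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

struct OptionPrices {
    double call;
    double put;
};

// Closed-form Black-Scholes prices for a European call and put.
// S: stock price, X: strike, T: years to expiry,
// r: annual risk-free rate, v: annual volatility.
OptionPrices black_scholes(double S, double X, double T, double r, double v) {
    assert(S > 0.0 && X > 0.0 && T > 0.0 && v > 0.0);
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / X) + (r + 0.5 * v * v) * T) / (v * sqrtT);
    const double d2 = d1 - v * sqrtT;
    const double discounted_strike = X * std::exp(-r * T);
    OptionPrices p;
    p.call = S * norm_cdf(d1) - discounted_strike * norm_cdf(d2);
    p.put = discounted_strike * norm_cdf(-d2) - S * norm_cdf(-d1);
    return p;
}
```

A useful sanity check on either implementation is put-call parity: call - put = S - X*exp(-r*T) for every option.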

SDK Instructions

Pre-compilation instructions

Linux

  1. Open the Makefile supplied with the example you are trying to compile.
  2. Ensure that the paths for MATLAB and CUDA installations and CUDA libraries are set correctly.

Windows

  1. Open the nvmexopts.bat file in the <jacket_root>/sdk/common folder.
  2. Ensure that the paths for MATLAB, CUDA, and MSVCE installations are set correctly.
  3. Additionally, ensure that the target architecture is specified correctly (win32 or win64). The default is set to win32.
  4. If using a win64 machine, you will also need to modify the MATLAB extern library path from %MATLAB%/extern/lib/win32 to %MATLAB%/extern/lib/win64 wherever applicable.

Compile

Linux

  1. Browse to the directory where the example is located.
  2. Run make.
  3. The example will be compiled and available for use from within MATLAB. In the case of medfilt2, several variants are compiled, demonstrating the use of shared memory, texture memory, and GFOR loops.

Windows

  1. Browse to the <jacket_root>/sdk/common folder.
  2. Start MATLAB.
  3. Run gmex FILENAME (see GMEX).
  4. The example will be compiled and available for use from within MATLAB. In the case of medfilt2, several variants are compiled, demonstrating the use of shared memory, texture memory, and GFOR loops.

How to run

Examples of how to run the SDK sample applications after compilation:

Simple SDK example

A = gsingle(rand(1, 10));
mymex(A);

Bitonic sort

A = gsingle(rand(1, 64));
bitonic(A);

Black-Scholes options calculation

A = gsingle(rand(1, 1000));
B = gsingle(rand(1, 1000));
C = gsingle(rand(1, 1000));
[D E] = BlackScholes(A, B, C);