From Jacket Wiki
The SUBSREF function (subscripted reference, i.e. indexing into a matrix) is often used for inspection of various matrices. Based on the analysis below, the Jacket Performance Index for SUBSREF (single-precision matrix) computations is set to 0. The speed-up of an NVIDIA C1060 Tesla GPU relative to an Intel Core i7 975 Extreme stays below approximately 0.16 for all tested square matrix sizes (max. 3000 x 3000), i.e. the GPU takes at least about six times as long as the CPU.
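To make the speed-up figure concrete, the following minimal Python sketch assumes that speed-up is defined as CPU execution time divided by GPU execution time, so that values below 1 mean the CPU is faster. The function names and the timings are hypothetical and chosen only to illustrate the arithmetic; they are not measured data and not part of Jacket.

```python
def speed_up(t_cpu: float, t_gpu: float) -> float:
    """Speed-up of the GPU relative to the CPU (assumed convention: t_cpu / t_gpu).

    Values below 1 mean the CPU finished faster than the GPU.
    """
    if t_cpu <= 0 or t_gpu <= 0:
        raise ValueError("execution times must be positive")
    return t_cpu / t_gpu


def slowdown_factor(s: float) -> float:
    """How many times longer the GPU takes for a given speed-up s."""
    if s <= 0:
        raise ValueError("speed-up must be positive")
    return 1.0 / s


# Hypothetical timings in seconds, for illustration only.
t_cpu_example = 0.004
t_gpu_example = 0.025

s = speed_up(t_cpu_example, t_gpu_example)
print(f"speed-up = {s:.2f}, GPU takes {slowdown_factor(s):.2f}x as long")
```

A speed-up of 0.16 under this convention corresponds to the GPU needing 1 / 0.16 = 6.25 times the CPU's execution time.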
An overview of speed-up versus matrix size for a selection of GPUs is shown in Fig. 1; the reference is a Core 2 Duo 2.8 GHz CPU. As with SUBSASGN, this type of matrix manipulation is handled far better by the CPU with the current Jacket implementation. Between the two CPUs there is no consistent winner: for some matrix sizes the Core 2 Duo 2.8 GHz is faster than the Core i7 975 Extreme, and for other sizes it is the other way around.
Fig. 1: Measured results of speed-up versus (square) matrix size (#Rows = #Columns) for an NVIDIA C1060 Tesla, NVIDIA GeForce 9800GT and NVIDIA GeForce 9600M GT versus an Intel Core 2 Duo 2.8 GHz (Mac OSX Snow Leopard) with 'Higher performance' mode selected in 'System Preferences' (mains connected). For comparison, the results for a Core i7-975 Extreme relative to the Core 2 Duo 2.8 GHz are also shown. The code to combine the different results is shown here. All results were obtained in single precision.
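As an illustration of how per-system timings can be expressed relative to one reference CPU, as in Fig. 1, here is a minimal Python sketch. It is not the linked script; the function name `relative_speed_up`, the device labels and all timing values are hypothetical, and the speed-up convention (reference time divided by device time) is an assumption.

```python
def relative_speed_up(times_by_device, reference):
    """Map each non-reference device to its per-size speed-up over `reference`.

    `times_by_device` maps a device label to {matrix_size: execution_time_s}.
    """
    ref = times_by_device[reference]
    result = {}
    for device, times in times_by_device.items():
        if device == reference:
            continue
        # Only sizes measured on both systems can be compared.
        common = sorted(set(ref) & set(times))
        result[device] = {n: ref[n] / times[n] for n in common}
    return result


# Hypothetical execution times in seconds, keyed by matrix size; not measured data.
times = {
    "reference_cpu": {1000: 0.010, 2000: 0.040},
    "other_cpu": {1000: 0.008, 2000: 0.050},
    "gpu": {1000: 0.100, 2000: 0.400},
}

combined = relative_speed_up(times, "reference_cpu")
for device, curve in combined.items():
    print(device, curve)
```

In this made-up data the second CPU is faster than the reference at one size and slower at the other, which is the kind of crossover described above for the two CPUs.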
Analysis Results: Single Precision Matrix
The analysis of SUBSREF operating on matrices was done by running the function on random square single-precision input matrices of different sizes, on a selection of CPUs and GPUs, and measuring its execution time. The code was tested across different computer platforms to get an idea of how much speed improvement can generally be obtained by using the GPU. It is also relevant to see whether certain matrix sizes should be preferred over others (when possible). All tests were done in single precision.
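The measurement procedure described above can be sketched as a small timing harness. This is a CPU-only Python stand-in using the standard library, not the Jacket/MATLAB analysis code; the helper names, the access pattern (every second row and column), the matrix sizes and the best-of-N timing strategy are all assumptions, since the source does not state which indexing pattern or repetition count was used.

```python
import random
import time
from array import array


def random_single_matrix(n, seed=0):
    """n x n matrix of single-precision values, stored row-major in a flat array."""
    rng = random.Random(seed)
    return array("f", (rng.random() for _ in range(n * n)))


def subsref(flat, n, rows, cols):
    """Return the sub-matrix flat[rows, cols] (0-based indices) as a new flat array."""
    return array("f", (flat[r * n + c] for r in rows for c in cols))


def time_subsref(n, repeats=5):
    """Best-of-`repeats` wall-clock time in seconds for one subscripted reference."""
    m = random_single_matrix(n)
    rows = range(0, n, 2)  # every second row: an arbitrary access pattern
    cols = range(0, n, 2)  # every second column
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subsref(m, n, rows, cols)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Sweep a few (small, hypothetical) square matrix sizes.
    for n in (50, 100, 200):
        print(f"n={n:4d}  t={time_subsref(n):.6f} s")
```

Running the same sweep on two systems and dividing the resulting times per matrix size gives speed-up curves of the kind plotted in the figures below.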
Fig. 2: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core 2 Duo 2.8 GHz (Mac OSX Snow Leopard) with an NVIDIA GeForce 9400M GPU in 'Better battery life' mode (but mains connected) - see Reference System #3. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.
Fig. 3: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core 2 Duo 2.8 GHz (Mac OSX Snow Leopard) with an NVIDIA GeForce 9600M GT GPU in 'Higher performance' mode (mains connected) - see Reference System #3. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.
Fig. 4: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core 2 Duo 2.8 GHz (Ubuntu Linux) with an NVIDIA GeForce 9600M GT GPU (mains connected) - see Reference System #2. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.
Fig. 5: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core i7 975 (Microsoft Windows 7) with an NVIDIA GeForce 9800GT GPU - see Reference System #1. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.
Fig. 6: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core i7 975 (Microsoft Windows 7) with an NVIDIA Quadro FX-3800 GPU - see Reference System #1. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.
Fig. 7: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core i7 975 (Microsoft Windows 7) with an NVIDIA C1060 Tesla GPU - see Reference System #1. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.
Further Information
See also SUBSASGN (Matrix).
Go Home: Torben's Corner