SUBSASGN (Matrix)

From Jacket Wiki

The SUBSASGN function (subscripted assignment) is often used for matrix manipulations. Based on the analysis below, the Jacket Performance Index for SUBSASGN (matrix, single precision) computations is set to 0. The speed-up of an NVIDIA C1060 Tesla GPU relative to an Intel Core i7 975 Extreme is below 0.2 for all tested square matrix sizes up to 3000 x 3000, meaning the GPU is more than five times slower than the CPU. The speed-up does increase with matrix size, but only very slowly.


An overview of speed-up versus matrix size for a selection of GPUs is shown in Fig. 1; the reference is a Core 2 Duo 2.8 GHz CPU. The CPUs are clearly superior to the GPUs for the current implementation of SUBSASGN. For matrices smaller than 800 x 800 the Core i7 975 Extreme is faster than the Core 2 Duo 2.8 GHz reference, but for square matrices of size 800 x 800 or larger the two CPUs are equally fast.


Fig. 1: Measured results of speed-up versus (square) matrix size (#Rows = #Columns) for an NVIDIA C1060 Tesla, NVIDIA GeForce 9800GT and NVIDIA GeForce 9600M GT versus an Intel Core 2 Duo 2.8 GHz (Mac OSX Snow Leopard) with 'Higher performance' mode selected in 'System Preferences' (mains connected). For comparison, the results for a Core i7-975 Extreme relative to the Core 2 Duo 2.8 GHz are also shown. The code to combine the different results is shown here. All measurements were made in single precision.


Analysis Results: Single Precision Matrix

The analysis of SUBSASGN operating on matrices was done by running the function on random square single precision input matrices of different sizes, on a selection of CPUs and GPUs, and measuring its execution time. The code was run on several computer platforms to get an idea of how much speed improvement can generally be obtained by using the GPU. It is also relevant to see whether certain matrix sizes should be preferred over others (when possible). All tests were done in single precision.
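
The measurement procedure described above (time a subscripted assignment for a range of square matrix sizes, then form the ratio of reference time to device time) can be sketched as follows. This is not the analysis code linked from the figure captions and it does not use Jacket or a GPU. It is a minimal Python stand-in, assuming a list-of-rows matrix and taking the best of several repetitions. All function names (make_matrix, assign_block, best_time, speedup, benchmark) are hypothetical.

```python
import random
import time


def make_matrix(n):
    """Return an n x n matrix of random values as a list of rows."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


def assign_block(a, block, row0, col0):
    """Subscripted assignment: write `block` into `a` starting at (row0, col0)."""
    width = len(block[0])
    for i, src_row in enumerate(block):
        # Slice assignment overwrites `width` elements of the target row in place.
        a[row0 + i][col0:col0 + width] = src_row
    return a


def best_time(fn, repeats=5):
    """Run fn() `repeats` times and return the shortest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(t_reference, t_device):
    """Speed-up of a device relative to the reference; below 1 means the device is slower."""
    return t_reference / t_device


def benchmark(sizes, repeats=5):
    """Time writing an (n/2 x n/2) block into an n x n matrix for each n in `sizes`."""
    results = {}
    for n in sizes:
        a = make_matrix(n)
        block = make_matrix(n // 2)
        results[n] = best_time(lambda: assign_block(a, block, 0, 0), repeats)
    return results


if __name__ == "__main__":
    for n, t in benchmark([100, 200, 400]).items():
        print(f"{n} x {n}: {t:.6f} s")
```

With two such timing tables, one from the reference CPU and one from the device under test, speedup(t_reference, t_device) evaluated per matrix size gives the kind of curve plotted in the figures. A value of 0.2 corresponds to the device taking five times as long as the reference.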


Fig. 2: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core 2 Duo 2.8 GHz (Mac OSX Snow Leopard) with an NVIDIA GeForce 9400M GPU in 'Better battery life' mode (but mains connected) - see Reference System #3. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.


Fig. 3: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core 2 Duo 2.8 GHz (Mac OSX Snow Leopard) with an NVIDIA GeForce 9600M GT GPU in 'Higher performance' mode (mains connected) - see Reference System #3. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.


Fig. 4: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core 2 Duo 2.8 GHz (Ubuntu Linux) with an NVIDIA GeForce 9600M GT GPU (mains connected) - see Reference System #2. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.


Fig. 5: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core i7 975 (Microsoft Windows 7) with an NVIDIA GeForce 9800GT GPU - see Reference System #1. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.


Fig. 6: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core i7 975 (Microsoft Windows 7) with an NVIDIA Quadro FX-3800 GPU - see Reference System #1. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.


Fig. 7: Measured results of execution time and speed-up versus (square) matrix size (#Rows = #Columns) for an Intel Core i7 975 (Microsoft Windows 7) with an NVIDIA C1060 Tesla GPU - see Reference System #1. The code to do the analysis can be downloaded here - this code can also be used to plot already existing data. The measured pre-run data in the figure is contained in a .mat file, which can be downloaded here. The analysis was conducted in single precision.


Further Information

See also SUBSREF (Matrix).



Go Home: Torben's Corner.

