Adaptive Code
From Jacket Wiki
When writing code that many users will run, for example a toolbox, it is a huge advantage if the code adapts to the available hardware. Jacket is not directly designed for this, but fortunately an undocumented function exists that can help us here. The function is gpu_entry, and calling gpu_entry(13) produces output like the following:
    >> gpu_entry(13)
    ans =
          gpu_used: 0
        gpu_unused: 0
          gpu_free: 190177280
         gpu_total: 266010624
          cpu_used: 0
           compute: 1.1000
        clockspeed: 1100000
              name: 'GeForce 9400M'
            system: 'Mac32'
    >>
The computer used here is an Apple MacBook Pro with a 2.8 GHz Intel Core 2 Duo (Mac OS X Snow Leopard) and an NVIDIA GeForce 9400M GPU, with 'Better battery life' mode selected in 'System Preferences' (but connected to mains power); see Reference System #3.
WARNING: Tempting as it is, do not run gpu_entry(x) with arbitrary integer values of x. Some values caused crashes on my computer.
The output of gpu_entry(13) is a structure, so we can read its fields as follows:
    >> gpu_info = gpu_entry(13)
    gpu_info =
          gpu_used: 0
        gpu_unused: 0
          gpu_free: 195395584
         gpu_total: 266010624
          cpu_used: 0
           compute: 1.1000
        clockspeed: 1100000
              name: 'GeForce 9400M'
            system: 'Mac32'
    >> gpu_mem = gpu_info.gpu_free
    gpu_mem =
       195395584
    >> gpu_compute = gpu_info.compute
    gpu_compute =
        1.1000
    >>
Here we have extracted two things: 1) the free GPU memory in bytes, and 2) the compute level of the available GPU. The latter can be used to choose between single and double precision computations. The former is useful when, for example, data is to be transferred from the CPU to the GPU and we need to know how much will actually fit in the available GPU memory.
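The precision choice can be sketched as a small helper. This is only an illustration: the function name choose_precision is hypothetical, it takes the structure returned by gpu_entry(13) as an argument rather than calling the undocumented function itself, and the threshold of 1.3 assumes the usual NVIDIA rule that double precision hardware support starts at compute capability 1.3. A small tolerance is used in case the compute field was converted from a single precision value.

```matlab
function precision = choose_precision(gpu_info)
% CHOOSE_PRECISION  Hypothetical helper: pick 'single' or 'double'
% from the compute field of the structure returned by gpu_entry(13).
% Assumption: double precision needs compute capability 1.3 or higher.
tol = 1e-6;                          % guards against rounding in the stored value
if gpu_info.compute >= 1.3 - tol
    precision = 'double';
else
    precision = 'single';
end
end
```

Saved as choose_precision.m, it could be called as precision = choose_precision(gpu_entry(13)); the result can then decide which Jacket casting function to apply (gsingle, or gdouble if the installed Jacket version provides it).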
Say we have a square matrix A_ of size Sz x Sz. In single precision we need 4*Sz^2 bytes to store such a matrix. If we estimate that the rest of the computations need approximately 30 MB, we can automatically create the largest matrix that still leaves that much memory free. We could then do the following:
    >> gpu_info = gpu_entry(13)
    gpu_info =
          gpu_used: 0
        gpu_unused: 0
          gpu_free: 195067904
         gpu_total: 266010624
          cpu_used: 0
           compute: 1.1000
        clockspeed: 1100000
              name: 'GeForce 9400M'
            system: 'Mac32'
    >> gpu_mem = gpu_info.gpu_free
    gpu_mem =
       195067904
    >> Sz = floor( sqrt((gpu_mem - 30E6)/4) )
    Sz =
            6423
    >> A_ = grand(Sz,Sz);
    >> gpu_info = gpu_entry(13)
    gpu_info =
          gpu_used: 165052416
        gpu_unused: 0
          gpu_free: 29876224
         gpu_total: 266010624
          cpu_used: 0
           compute: 1.1000
        clockspeed: 1100000
              name: 'GeForce 9400M'
            system: 'Mac32'
    >>
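The sizing step in the session above can be wrapped in a reusable function. The following is a minimal sketch, not part of Jacket: the name largest_square_size and its parameters are hypothetical, and it only repeats the arithmetic Sz = floor(sqrt((free - reserve)/bytes)), with a guard for the case where the reserve already exceeds the free memory.

```matlab
function Sz = largest_square_size(free_bytes, reserve_bytes, bytes_per_element)
% LARGEST_SQUARE_SIZE  Hypothetical helper: largest Sz such that an
% Sz-by-Sz matrix fits in free_bytes while leaving reserve_bytes free.
%   free_bytes        - e.g. gpu_info.gpu_free from gpu_entry(13)
%   reserve_bytes     - memory to keep for the rest of the computation
%   bytes_per_element - 4 for single precision, 8 for double precision
usable = free_bytes - reserve_bytes;
if usable <= 0
    Sz = 0;                          % nothing fits once the reserve is set aside
    return
end
Sz = floor(sqrt(usable / bytes_per_element));
end
```

With the figures from the session above, largest_square_size(195067904, 30e6, 4) gives 6423, matching the interactive result. The same call with 8 bytes per element gives 4542. Because the allocator may use slightly more memory than bytes_per_element*Sz^2, as the gpu_used value above suggests, it seems prudent to treat the reserve as an approximate margin rather than an exact figure.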
As seen above, approximately 30 MB of GPU memory is left free, as planned. The compute level can be used in a similar way, for example to run one algorithm on a compute 1.1 device and another when compute 1.3 is available.
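One possible way to organise such a switch is to pass both variants as function handles and let a small selector return the one that suits the device. This is a sketch under assumptions: choose_algorithm is a hypothetical name, the two handles stand in for whatever routines the toolbox provides, and the 1.3 threshold is only the example level mentioned above.

```matlab
function algo = choose_algorithm(gpu_info, algo_basic, algo_advanced)
% CHOOSE_ALGORITHM  Hypothetical selector: return algo_advanced when the
% compute field of gpu_info is at least 1.3, otherwise algo_basic.
% Both algorithms are passed in as function handles.
tol = 1e-6;                          % guards against rounding in the stored value
if gpu_info.compute >= 1.3 - tol
    algo = algo_advanced;
else
    algo = algo_basic;
end
end
```

A caller would then write something like algo = choose_algorithm(gpu_entry(13), @my_basic_solver, @my_advanced_solver); result = algo(A_); where both solver names are placeholders. Keeping the decision in one place means the rest of the toolbox does not need to inspect the hardware itself.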