Adaptive Code

From Jacket Wiki


When writing code that many users will run, for example a toolbox, it is a big advantage if the code adapts to the available hardware. Jacket is not designed for this directly, but fortunately an undocumented function exists that can help: gpu_entry, called with the argument 13. It produces output like the following:

>> gpu_entry(13)
 
ans = 
 
      gpu_used: 0
    gpu_unused: 0
      gpu_free: 190177280
     gpu_total: 266010624
      cpu_used: 0
       compute: 1.1000
    clockspeed: 1100000
          name: 'GeForce 9400M'
        system: 'Mac32'
 
>>

The computer used here is an Apple MacBook Pro with a 2.8 GHz Intel Core 2 Duo running Mac OS X Snow Leopard and an NVIDIA GeForce 9400M GPU, with 'Better battery life' selected in System Preferences (while running on mains power); see Reference System #3.

WARNING: Although it is tempting, do not call gpu_entry(x) with arbitrary integer values of x. Some values crashed my computer.
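Because gpu_entry is undocumented and only exists when Jacket is installed, a toolbox meant for many users may want to guard the call so that it still runs on machines without Jacket. The following is a minimal sketch using try/catch; the fallback field values and the have_gpu flag are hypothetical choices for illustration, not Jacket behavior.

```matlab
% Query the GPU through Jacket if possible; otherwise fall back to
% CPU-only defaults so the rest of the code can still decide what to do.
try
    gpu_info = gpu_entry(13);   % only ever call with 13 (see the warning)
    have_gpu = true;
catch
    % Jacket missing or no usable GPU. Hypothetical placeholder values
    % that mimic the field names of the gpu_entry(13) structure.
    gpu_info = struct('gpu_free', 0, 'compute', 0, 'name', 'none');
    have_gpu = false;
end

fprintf('GPU available: %d (%s)\n', have_gpu, gpu_info.name);
```

Later code can then branch on have_gpu instead of failing outright when the undocumented function is not present.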


The output of gpu_entry(13) is a structure, so individual values can be read with ordinary field access:

>> gpu_info = gpu_entry(13)
 
gpu_info = 
 
      gpu_used: 0
    gpu_unused: 0
      gpu_free: 195395584
     gpu_total: 266010624
      cpu_used: 0
       compute: 1.1000
    clockspeed: 1100000
          name: 'GeForce 9400M'
        system: 'Mac32'
 
>> gpu_mem = gpu_info.gpu_free
 
gpu_mem =
 
   195395584
 
>> gpu_compute = gpu_info.compute
 
gpu_compute =
 
    1.1000
 
>>

Here we have extracted two values: 1) the free GPU memory in bytes, and 2) the compute level (compute capability) of the GPU. The compute level can be used to choose between single- and double-precision computations, since double precision requires compute capability 1.3 or higher. The free memory is useful when data is to be transferred from the CPU to the GPU and we need to know how much of it will fit in the available GPU memory.
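The precision choice can be sketched in plain MATLAB. This is a minimal sketch, assuming that a compute level of 1.3 or higher means double precision is available. The struct is a hand-built stand-in with the same field names as the gpu_entry(13) output above (values copied from that output), so the snippet runs without Jacket or a GPU; the variable name prec is illustrative only.

```matlab
% Stand-in for gpu_info = gpu_entry(13); field names and values are
% copied from the output above so this runs without a GPU.
gpu_info = struct('gpu_free', 195395584, 'compute', 1.1, ...
                  'name', 'GeForce 9400M');

% Assumption: double precision needs compute capability 1.3 or higher.
% The small tolerance guards against compute being stored as 1.2999...
if gpu_info.compute >= 1.3 - 1e-6
    prec = 'double';
else
    prec = 'single';
end

fprintf('%s (compute %.1f): using %s precision\n', ...
        gpu_info.name, gpu_info.compute, prec);
```

On a real system the struct line would be replaced by gpu_info = gpu_entry(13). The string in prec can then be passed to standard MATLAB functions that take a class name, such as cast(x, prec) or zeros(n, n, prec).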


Say we want a square matrix A_ of size Sz x Sz. In single precision each element takes 4 bytes, so the matrix needs 4*Sz^2 bytes. Suppose we estimate that the rest of the computation needs roughly 30 MB. The largest matrix that still leaves that much memory free then has Sz = floor(sqrt((gpu_free - 30E6)/4)), which we can compute automatically:

>> gpu_info = gpu_entry(13)
 
gpu_info = 
 
      gpu_used: 0
    gpu_unused: 0
      gpu_free: 195067904
     gpu_total: 266010624
      cpu_used: 0
       compute: 1.1000
    clockspeed: 1100000
          name: 'GeForce 9400M'
        system: 'Mac32'
 
>> gpu_mem = gpu_info.gpu_free
 
gpu_mem =
 
   195067904
 
>> Sz = floor( sqrt((gpu_mem - 30E6)/4) )
 
Sz =
 
        6423
 
>> A_ = grand(Sz,Sz);
>> gpu_info = gpu_entry(13)
 
gpu_info = 
 
      gpu_used: 165052416
    gpu_unused: 0
      gpu_free: 29876224
     gpu_total: 266010624
      cpu_used: 0
       compute: 1.1000
    clockspeed: 1100000
          name: 'GeForce 9400M'
        system: 'Mac32'
 
>>


As the second call to gpu_entry(13) shows, about 30 MB (29876224 bytes) remain free, as planned. The compute level can be used in a similar way, for example to run one algorithm on a compute 1.1 device and another when compute 1.3 is available.
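The sizing rule from the session generalizes to any element size, which ties it to the precision choice: double precision uses 8 bytes per element instead of 4. A minimal sketch, assuming the same 30 MB reserve and the gpu_free value from the session; the handle name max_square_size is hypothetical and not part of Jacket.

```matlab
% Largest Sz such that bytes_per_elem*Sz^2 <= free_bytes - reserve_bytes.
% max(..., 0) keeps sqrt real if the reserve exceeds the free memory.
max_square_size = @(free_bytes, reserve_bytes, bytes_per_elem) ...
    floor(sqrt(max(free_bytes - reserve_bytes, 0) / bytes_per_elem));

gpu_free = 195067904;  % gpu_info.gpu_free from the session above
reserve  = 30E6;       % bytes kept back for the rest of the computation

Sz_single = max_square_size(gpu_free, reserve, 4);  % 4 bytes per element
Sz_double = max_square_size(gpu_free, reserve, 8);  % 8 bytes per element

fprintf('single: Sz = %d (%d bytes)\n', Sz_single, 4*Sz_single^2);
fprintf('double: Sz = %d (%d bytes)\n', Sz_double, 8*Sz_double^2);
```

Sz_single reproduces the 6423 from the session, while the same memory budget allows only Sz = 4542 in double precision. Choosing bytes_per_elem from the compute level lets the same code pick both the precision and the matrix size.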



Go Home: Torben's Corner

