CUDA: Occupancy Calculator

The CUDA Occupancy Calculator is an Excel spreadsheet that ships with the CUDA Toolkit. It helps you determine whether the number of threads per block used to launch a kernel is optimal, in the sense of maximizing the occupancy of the GPU's multiprocessors. The spreadsheet can be found at %ProgramData%\NVIDIA Corporation\GPU SDK\C\tools.

The spreadsheet needs four inputs describing the device and the kernel you are analyzing:

  1. The compute capability of the CUDA device
  2. Threads per block you are using for the kernel
  3. Registers per thread for the kernel
  4. Shared memory per block

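You probably already know (1), since it is a property of the GPU you are targeting, and (2), since you chose it when writing the kernel launch. To find (3) and (4), compile the code with the nvcc option --ptxas-options=-v, which makes ptxas report the registers and shared memory used by each kernel. When building in Visual Studio, this information appears in the Output window during compilation. If the kernel also uses dynamically allocated shared memory (passed as the third launch configuration parameter), add that amount to (4), because ptxas only reports statically allocated shared memory. Alternatively, run the CUDA program under the Compute Visual Profiler and read these values from its Profiler Output sheet.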
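The CUDA runtime can also report inputs (1), (3) and (4) for a compiled kernel, which is handy when you do not want to dig through the build output. The sketch below is only an illustration: scaleKernel and queryCalculatorInputs are hypothetical names standing in for your own kernel and helper. It uses cudaGetDeviceProperties for the compute capability and cudaFuncGetAttributes for the register count and static shared memory.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical example kernel; substitute the kernel you are analyzing.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Hypothetical helper: fills in calculator inputs (1), (3) and (4).
// Input (2), threads per block, is whatever you pass at launch time.
// Returns false if no CUDA device is available or a query fails.
bool queryCalculatorInputs(int device, int *ccMajor, int *ccMinor,
                           int *regsPerThread, size_t *staticSmemPerBlock)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    *ccMajor = prop.major;  // (1) compute capability, major part
    *ccMinor = prop.minor;  // (1) compute capability, minor part

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, scaleKernel) != cudaSuccess)
        return false;
    *regsPerThread = attr.numRegs;              // (3) registers per thread
    *staticSmemPerBlock = attr.sharedSizeBytes; // (4) static shared memory only
    return true;
}
```

As with the ptxas output, sharedSizeBytes excludes dynamically allocated shared memory, so add any dynamic amount yourself before entering (4).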

Once these four numbers are entered, the three charts on the spreadsheet update to show where your kernel lies on each of them. The charts show how occupancy varies with threads per block, registers per thread and shared memory per block, respectively. Look for the red triangle on the chart whose parameter you are free to change.
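To make the charts less of a black box, here is a simplified sketch of the kind of calculation behind them: the number of resident blocks per multiprocessor is the minimum of the limits imposed by warp slots, registers and shared memory, and occupancy is active warps divided by the maximum warps per multiprocessor. The per-SM limits below are those of compute capability 2.0 with 48 KB of shared memory, chosen here as an assumption; the sketch also ignores register and shared memory allocation granularity, so the real spreadsheet can report lower values. All names are hypothetical.

```cuda
#include <algorithm>

// Per-SM limits; the values used below are assumed for compute capability 2.0.
struct SmLimits {
    int warpSize;       // threads per warp
    int maxWarpsPerSm;  // resident warps per SM
    int maxBlocksPerSm; // resident blocks per SM
    int regsPerSm;      // 32-bit registers per SM
    int smemPerSm;      // bytes of shared memory per SM
};

const SmLimits kCc20 = {32, 48, 8, 32768, 49152};

// Resident blocks per SM. Simplification: allocation granularity is ignored.
int activeBlocksPerSm(const SmLimits &lim, int threadsPerBlock,
                      int regsPerThread, int smemPerBlock)
{
    // A partially filled warp still occupies a whole warp slot.
    int warpsPerBlock = (threadsPerBlock + lim.warpSize - 1) / lim.warpSize;

    // Limit 1: warp slots, capped by the per-SM block limit.
    int blocks = std::min(lim.maxBlocksPerSm, lim.maxWarpsPerSm / warpsPerBlock);

    // Limit 2: registers (allocated per whole warp).
    if (regsPerThread > 0) {
        int regsPerBlock = regsPerThread * warpsPerBlock * lim.warpSize;
        blocks = std::min(blocks, lim.regsPerSm / regsPerBlock);
    }

    // Limit 3: shared memory.
    if (smemPerBlock > 0)
        blocks = std::min(blocks, lim.smemPerSm / smemPerBlock);

    return blocks;
}

// Occupancy = active warps per SM / maximum warps per SM, in [0, 1].
double occupancy(const SmLimits &lim, int threadsPerBlock,
                 int regsPerThread, int smemPerBlock)
{
    int warpsPerBlock = (threadsPerBlock + lim.warpSize - 1) / lim.warpSize;
    int blocks = activeBlocksPerSm(lim, threadsPerBlock, regsPerThread, smemPerBlock);
    return static_cast<double>(blocks * warpsPerBlock) / lim.maxWarpsPerSm;
}

// Sweeps block sizes in warp-sized steps, like the block size chart,
// and returns the smallest block size with the highest occupancy.
int bestBlockSize(const SmLimits &lim, int regsPerThread, int smemPerBlock,
                  int maxThreadsPerBlock)
{
    int best = lim.warpSize;
    double bestOcc = -1.0;
    for (int t = lim.warpSize; t <= maxThreadsPerBlock; t += lim.warpSize) {
        double occ = occupancy(lim, t, regsPerThread, smemPerBlock);
        if (occ > bestOcc) {
            bestOcc = occ;
            best = t;
        }
    }
    return best;
}
```

For example, with an assumed 20 registers per thread and no shared memory, occupancy(kCc20, 200, 20, 0) gives 0.875 (six resident blocks of seven warps each), while 256 threads per block gives 1.0. bestBlockSize(kCc20, 20, 0, 1024) returns 192, the smallest block size that reaches full occupancy under these assumptions; 256 reaches the same maximum.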

For example, say that for a given kernel I have no control over the number of registers or the amount of shared memory it uses, but I can change the number of threads per block it is launched with. Assume that I am currently using 200 threads per block for this kernel; since 200 is not a multiple of the warp size (32), the last warp of every block is only partially filled. In this case, I look at Chart 1 (Varying Block Size) and check whether the red triangle sits on one of the global maxima of the curve. If it does not, I pick a block size that lies on a global maximum (say, 256) and try my kernel with that. In most cases, my CUDA program should then run a bit faster, because the occupancy of the GPU by this kernel's threads has improved.
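CUDA releases newer than the one tried here (6.5 onward) expose the same calculation through the runtime occupancy API, so the block size search can also be done in code. The sketch below assumes such a toolkit; scaleKernel, theoreticalOccupancy and suggestedBlockSize are hypothetical names, while cudaOccupancyMaxActiveBlocksPerMultiprocessor and cudaOccupancyMaxPotentialBlockSize are the runtime functions doing the work.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the one being tuned.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Theoretical occupancy of scaleKernel at a given block size, with no
// dynamic shared memory. Returns a negative value on error.
double theoreticalOccupancy(int blockSize)
{
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1.0;

    int activeBlocks = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&activeBlocks, scaleKernel,
                                                      blockSize, 0) != cudaSuccess)
        return -1.0;

    // Active warps per SM divided by the maximum warps per SM.
    int warpsPerBlock = (blockSize + prop.warpSize - 1) / prop.warpSize;
    int maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    return static_cast<double>(activeBlocks * warpsPerBlock) / maxWarps;
}

// Block size the runtime suggests for maximum occupancy, or -1 on error.
int suggestedBlockSize()
{
    int minGridSize = 0, blockSize = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                           scaleKernel) != cudaSuccess)
        return -1;
    return blockSize;
}
```

Comparing theoreticalOccupancy(200) with theoreticalOccupancy(suggestedBlockSize()) mirrors the check described above of whether the red triangle sits on a global maximum of Chart 1.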

Tried with: CUDA 4.0


3 thoughts on “CUDA: Occupancy Calculator”

  1. Are you aware of a CUDA occupancy calculator for versions higher than 2.1? I hope not much has changed, but I was curious.
