GPGPU Computing on Odyssey

Odyssey has a number of nodes with NVIDIA Tesla general purpose graphics processing units (GPGPUs) attached to them. It is possible to use CUDA tools to run computational work on them and, in some use cases, see very significant speedups.

One node with 8 Tesla K20Xm cards is available for general use via the gpu partition; the remaining nodes are owned by various research groups and are available in their private partitions (and may be available when idle through serial_requeue and the options shown below). Direct access to these nodes by members of other groups is by special request. Please visit the RC Portal and submit a help request for more information.

GPGPUs on SLURM

To request a GPU on SLURM, just add #SBATCH --gres=gpu to your submission script and it will give you access to a GPGPU. You can use this method to request both CPUs and GPGPUs independently. So if you want 1 CPU and 2 GPGPUs from our general-use node, you would specify:

#SBATCH -p gpu
#SBATCH -n 1
#SBATCH --gres=gpu:2
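
For context, a complete submission script built around these directives could look like the sketch below. The job name, time limit, and memory request are placeholders, and nvidia-smi stands in for whatever CUDA application you actually run:

#!/bin/bash
#SBATCH -J gpu_test              # placeholder job name
#SBATCH -p gpu                   # general-use GPU partition
#SBATCH -n 1                     # one CPU core
#SBATCH --gres=gpu:1             # one GPGPU
#SBATCH -t 0-01:00               # placeholder run time (D-HH:MM)
#SBATCH --mem=4000               # placeholder memory request (MB)

# Show the GPU(s) allocated to this job, then run your own CUDA program here.
nvidia-smi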

Using CUDA-dependent modules

CUDA-dependent applications are accessed on Odyssey in a manner that is similar to compilers and MPI libraries. For these applications, a CUDA module must first be loaded before an application is available. For example, to use cuDNN, a CUDA-based neural network library from NVIDIA, the following command will work:

$ module load cuda/7.5-fasrc02 cudnn/7.0-fasrc02

If you don't load the CUDA module first, the cuDNN module is not available:

$ module purge
$ module load cudnn/7.0-fasrc02
Lmod has detected the following error:
The following module(s) are unknown: "cudnn/7.0-fasrc02"
Please check the spelling or version number. Also try "module spider ..."
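
As the error message suggests, module spider is the way to discover a module and its prerequisites. Querying the specific version should report which module(s), such as a cuda module, must be loaded before it becomes available (command shown for illustration; its output is not reproduced here):

$ module spider cudnn/7.0-fasrc02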

One of the main benefits of this arrangement is the segregation of CUDA-dependent and CUDA-independent builds for software where CUDA is an optional dependency. For example, the TensorFlow deep learning software can be built in CPU or GPU forms. Which form is available depends on whether the CUDA module is loaded first.

# CPU only
$ module load gcc/4.9.3-fasrc01 tensorflow/1.0.0-fasrc01
# GPU
$ module load gcc/4.9.3-fasrc01 cuda/7.5-fasrc02 tensorflow/1.0.0-fasrc02
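
One quick way to confirm which form you actually have is to ask TensorFlow to enumerate its local devices; a GPU build running on a GPU node should list one or more GPU devices alongside the CPU. This is a minimal check, assuming the tensorflow module places a TensorFlow-enabled python on your path:

$ python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"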

If a CUDA-dependent application was not built with a particular version of the CUDA module, then you won't be able to load it with that module. This will not work:

$ module load gcc/4.9.3-fasrc01 cuda/6.0-fasrc02 tensorflow/1.0.0-fasrc02
Lmod has detected the following error:
The following module(s) are unknown: "tensorflow/1.0.0-fasrc02"
Please check the spelling or version number. Also try "module spider ..."

because TensorFlow was not built with CUDA 6.0.

Using CUDA with MPI and compiler (Comp) modules

There are a couple of packages, notably Amber and LAMMPS, that have both CUDA and MPI as optional build dependencies. The CUDA module setup is able to accommodate this, but the application build that is loaded depends on the order in which the modules are specified.

For example, to load a version of Amber that was built with the Intel compiler and against CUDA 7.5 with no MPI support:

$ module load intel/15.0.0-fasrc01 cuda/7.5-fasrc02 Amber/14-fasrc06

To load a version of Amber that was built with the Intel compiler, Intel's MPI libraries, and CUDA 6.0:

$ module load intel/15.0.0-fasrc01 impi/5.0.1.035-fasrc01 cuda/6.0-fasrc02 Amber/14-fasrc07

In the previous example, it's important that the CUDA module load follow the Intel MPI module load. If the modules were loaded in this order:

$ module load intel/15.0.0-fasrc01 cuda/6.0-fasrc02 impi/5.0.1.035-fasrc01 Amber/14-fasrc07

the build that supports both CUDA and MPI would not be loaded. In this case, if it were available, an MPI-only version would be activated.
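
With the correct ordering in mind, a job script for the CUDA- and MPI-enabled Amber build might look like the following sketch. The rank and GPU counts, time limit, input file names, and the pmemd.cuda.MPI invocation are illustrative placeholders; substitute the executable and inputs appropriate to your own workflow:

#!/bin/bash
#SBATCH -p gpu
#SBATCH -n 2                     # placeholder: two MPI ranks
#SBATCH --gres=gpu:2             # placeholder: one GPU per rank
#SBATCH -t 0-08:00               # placeholder run time
#SBATCH --constraint=cuda-6.0    # match the build's CUDA version (see the heterogeneity section below)

module load intel/15.0.0-fasrc01 impi/5.0.1.035-fasrc01 cuda/6.0-fasrc02 Amber/14-fasrc07

# One common way to launch an MPI executable under SLURM; your preferred launcher may differ.
srun -n 2 --mpi=pmi2 pmemd.cuda.MPI -O -i mdin -o mdout -p prmtop -c inpcrd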

A similar situation exists with compiler-dependent modules. As noted above, this module load statement will activate a compiler-dependent, CUDA-enabled version of TensorFlow.

$ module load gcc/4.9.3-fasrc01 cuda/7.5-fasrc02 tensorflow/1.0.0-fasrc02

However, if the order of the compiler and CUDA module loads is switched, the GPU-enabled version will not be used. In fact, an Lmod error will be thrown.

$ module load cuda/7.5-fasrc02 gcc/4.9.3-fasrc01 tensorflow/1.0.0-fasrc02
Lmod has detected the following error:
The following module(s) are unknown: "tensorflow/1.0.0-fasrc02"
Please check the spelling or version number. Also try "module spider ..."
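
Whenever the load order matters like this, it is worth confirming what actually ended up in your environment before submitting a job; module list prints the modules currently loaded in your session, making a missing cuda or compiler module easy to spot:

$ module list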

Heterogeneity of CUDA installation within Odyssey partitions

Unfortunately, not all nodes have every version of CUDA installed yet because of the amount of disk space each version takes. To make sure you land on a node that supports the version you're loading, add --constraint=cuda-$version when submitting jobs. For example:

#SBATCH -p gpu
#SBATCH -n 1
#SBATCH --gres=gpu
#SBATCH --constraint=cuda-7.5

The above sbatch file will request a single GPU on a node in the gpu partition that has CUDA 7.5 installed.
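
To see which CUDA-version features the nodes in a partition actually advertise, you can ask SLURM to print each node along with its feature list; the format string below is one way to do that (the exact feature names you see may differ):

$ sinfo -p gpu -o "%N %f"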
