MPI Codes on Odyssey

 


These pages are intended to help you compile and run your MPI applications on the Odyssey cluster. Before you can access Odyssey, you must first have an active user account and a valid password. If you don’t, please complete an account request form. Instructions for accessing the cluster will be sent to you via email.

An SSH client is required to log in to Odyssey. A number of SSH-capable clients are available for Windows, Mac, and UNIX/Linux machines. An example session is given below:

login as: username
Using keyboard-interactive authentication.
Password:
Verification code:
Last login: Tue Jul 19 13:26:16 2011 from wrls-249-195-215.wrls-client.fas.harvard.edu
[username@iliadaccess03 ~]$

At the Verification code prompt, enter your OpenAuth token. Each token may be used only once. If you mistype it, or if you want to open a second session, you must wait until a new token is generated before trying again.

A list of all software packages, including MPI libraries, currently installed
on Odyssey is available here.

The Message Passing Interface (MPI) library allows processes in your parallel application to communicate with one another by sending and receiving messages. There is no default MPI library in your environment when you log in to Odyssey: you need to choose the desired MPI implementation for your applications by loading an appropriate MPI module. The MPI implementations currently available on the cluster are OpenMPI and MVAPICH2. For both implementations, the MPI libraries are built with either the Intel or the GNU compiler suite and are organized in modules. The most recent available versions/modules are:

  • hpc/openmpi-1.5.3_intel-12.3.174
  • hpc/openmpi-1.5.3_gcc-4.6.1
  • hpc/openmpi-1.6.2_intel-13.0.079
  • hpc/openmpi-1.6.2_gcc-4.7.2
  • hpc/mvapich2-1.5_intel-11.1.072
  • hpc/mvapich2-1.5_gnu
  • hpc/mvapich2-1.9a_intel-13.0.079

For instance, if you want to use version 1.5.3 of OpenMPI compiled with version 4.6.1 of the GNU compiler, you need to load the module hpc/openmpi-1.5.3_gcc-4.6.1. This is done with the module load command, e.g.:

[username@iliadaccess03 ~]$ module load hpc/openmpi-1.5.3_gcc-4.6.1
Loading module hpc/gcc-4.6.1.
Loading module hpc/openmpi-1.5.3_gcc-4.6.1.
[username@iliadaccess03 ~]$

Modules on Odyssey are updated often, so check whether more recent ones are available. The modules are set up so that you can have only one MPI module loaded at a time; if you try to load a second one, the first is automatically unloaded. This is done to avoid dependency collisions.
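You can check which modules are available and which are currently loaded with the module avail and module list commands. The transcript below is only a sketch; the exact output depends on what is installed at the time:

[username@iliadaccess03 ~]$ module avail      # list all available modules, including the MPI ones
[username@iliadaccess03 ~]$ module list       # show the modules loaded in the current shell
[username@iliadaccess03 ~]$ module load hpc/mvapich2-1.9a_intel-13.0.079
[username@iliadaccess03 ~]$ module list       # any previously loaded MPI module has been replaced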

There are four ways you can set up your MPI on Odyssey:

  • Put the module load command in your startup files. Most users will find this option most convenient: you will likely only want to use a single version of MPI for all your work, and this method works with all MPI modules currently available on Odyssey. (A sketch is given after this list.)
  • Load the module in your current shell. For the current MPI versions you do not need to have the module load command in your startup files. If you submit a job, the remote processes will inherit the submission shell environment and use the proper MPI library. Note that this method does not work with older versions of MPI.
  • Load the module in your job script. If you will be using different versions of MPI for different jobs, you can put the module load command in your script. You need to ensure your script can execute the module command properly.
  • Do not use modules and set the environment variables yourself. You do not need to use modules at all; you can hard-code the paths instead. However, these locations may change without warning, so set them in one place only rather than scattering them throughout your scripts.
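As an illustration of the first and third options, here is a minimal sketch. It assumes a bash login shell (so the startup file is ~/.bashrc) and uses one of the module names listed above:

# Option 1: add the module load command to your startup file
echo 'module load hpc/openmpi-1.5.3_gcc-4.6.1' >> ~/.bashrc

# Option 3: put the module load command in the job script itself,
# before the line that launches your executable, e.g.
#   module load hpc/openmpi-1.5.3_gcc-4.6.1
#   mpirun.lsf ./mpitest.x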

First parallel program code

When you successfully log in you will land in your $HOME directory. Open a new file mpitest.<language_extension> (language_extension: f, f90, c, or cpp) with a text editor such as emacs or vi, and paste in the code below.

Fortran 77:

c=====================================================
c Fortran 77 example: MPI test
c=====================================================
      program mpitest
      implicit none
      include 'mpif.h'
      integer(4) :: ierr
      integer(4) :: iproc
      integer(4) :: nproc
      integer(4) :: icomm
      integer(4) :: i
      call MPI_INIT(ierr)
      icomm = MPI_COMM_WORLD
      call MPI_COMM_SIZE(icomm,nproc,ierr)
      call MPI_COMM_RANK(icomm,iproc,ierr)
      do i = 0, nproc-1
         call MPI_BARRIER(icomm,ierr)
         if ( iproc == i ) then
            write (6,*) "Rank",iproc,"out of",nproc
         end if
      end do
      call MPI_FINALIZE(ierr)
      if ( iproc == 0 ) write(6,*)'End of program.'
      stop
      end

Fortran 90:

!=====================================================
! Fortran 90 example: MPI test
!=====================================================
program mpitest
  implicit none
  include 'mpif.h'
  integer(4) :: ierr
  integer(4) :: iproc
  integer(4) :: nproc
  integer(4) :: icomm
  integer(4) :: i
  call MPI_INIT(ierr)
  icomm = MPI_COMM_WORLD
  call MPI_COMM_SIZE(icomm,nproc,ierr)
  call MPI_COMM_RANK(icomm,iproc,ierr)
  do i = 0, nproc-1
     call MPI_BARRIER(icomm,ierr)
     if ( iproc == i ) then
        write (6,*) "Rank",iproc,"out of",nproc
     end if
  end do
  call MPI_FINALIZE(ierr)
  if ( iproc == 0 ) write(6,*)'End of program.'
  stop
end program mpitest

C:

//=================================================================
// C example: MPI test
//=================================================================
#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv){
  MPI_Comm icomm;   /* use the MPI_Comm type for the communicator */
  int iproc;
  int nproc;
  int i;
  MPI_Init(&argc,&argv);
  icomm = MPI_COMM_WORLD;
  MPI_Comm_rank(icomm,&iproc);
  MPI_Comm_size(icomm,&nproc);
  for ( i = 0; i < nproc; i++ ){
    MPI_Barrier(icomm);
    if ( i == iproc ){
      printf("Rank %d out of %d\n",iproc,nproc);
    }
  }
  MPI_Finalize();
  return 0;
}

C++:

//=================================================================
// C++ example: MPI test
//=================================================================
#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char** argv){
  MPI_Comm icomm;   // use the MPI_Comm type for the communicator
  int iproc;
  int nproc;
  int i;
  MPI_Init(&argc,&argv);
  icomm = MPI_COMM_WORLD;
  MPI_Comm_rank(icomm,&iproc);
  MPI_Comm_size(icomm,&nproc);
  for ( i = 0; i < nproc; i++ ){
    MPI_Barrier(icomm);
    if ( i == iproc ){
      cout << "Rank " << iproc << " out of " << nproc << endl;
    }
  }
  MPI_Finalize();
  return 0;
}

Compile the program

Fortran 77: [username@iliadaccess03 ~]$ mpif77 -o mpitest.x mpitest.f
Fortran 90: [username@iliadaccess03 ~]$ mpif90 -o mpitest.x mpitest.f90
C: [username@iliadaccess03 ~]$ mpicc -o mpitest.x mpitest.c
C++: [username@iliadaccess03 ~]$ mpicxx -o mpitest.x mpitest.cpp
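The mpif77, mpif90, mpicc, and mpicxx commands are compiler wrappers that invoke the Intel or GNU compiler selected by the loaded MPI module. If you want to see which underlying compiler and flags a wrapper uses, both implementations provide an inspection flag; this is only a quick sanity check and is not required for the examples:

[username@iliadaccess03 ~]$ mpicc --showme     # OpenMPI wrappers
[username@iliadaccess03 ~]$ mpicc -show        # MVAPICH2 (MPICH-based) wrappers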

Create a batch script

With a text editor like emacs or vi open a new file named mpitest.batch and paste in the contents below:

#!/bin/sh
#BSUB -n 16
#BSUB -J test
#BSUB -o test.out
#BSUB -e test.err
#BSUB -a openmpi
#BSUB -R "span[ptile=8]"
#BSUB -q short_parallel

mpirun.lsf ./mpitest.x

The batch script instructs Odyssey which computational resources to reserve for your job and how your application should be launched on the compute nodes reserved for it.

Submit your job to the queue

To submit your batch script to the Odyssey compute nodes, use the bsub command followed by < and the batch script name. Upon submission, a job ID and the queue name are returned, e.g.:

[username@iliadaccess03 ~]$ bsub < mpitest.batch
Job <4251061> is submitted to queue <short_parallel>.

Monitor your job

After you submit your job, the system scheduler will check to see if there are compute nodes available to run the job. If there are compute nodes available, your job will start running. If there are not, your job will wait in the queue until there are enough resources to run your application. You can monitor your position in the queue with the bjobs command.

[username@iliadaccess03 ~]$ bjobs
JOBID    USER      STAT  QUEUE       FROM_HOST    EXEC_HOST  JOB_NAME  SUBMIT_TIME
4251061  username  PEND  short_para  iliadaccess             test      Jul 20 13:33
[username@iliadaccess03 ~]$
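You can also pass the job ID to bjobs for detailed information about a single job, or cancel a job with bkill; both are standard LSF commands:

[username@iliadaccess03 ~]$ bjobs -l 4251061    # detailed status of this job
[username@iliadaccess03 ~]$ bkill 4251061       # cancel the job if it is no longer needed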

Examine your job’s output

When your job has completed you should see a file called test.out with contents similar to the following:

Sender: LSF System <lsfadmin@hero1002>
Subject: Job 4251061: <test> Done

Job was submitted from host <iliadaccess03> by user <pkrastev> in cluster <lsf-odyssey>.
Job was executed on host(s) <8*hero1002>, in queue <short_parallel>, as user <pkrastev> in cluster <lsf-odyssey>.
<8*hero1102>
</n/home06/pkrastev> was used as the home directory.
</n/home06/pkrastev/Computer/MPI_tests> was used as the working directory.
Started at Wed Jul 20 15:43:59 2011
Results reported at Wed Jul 20 15:44:08 2011

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/sh
#BSUB -n 16
#BSUB -J test
#BSUB -o test.out
#BSUB -e test.err
#BSUB -a openmpi
#BSUB -R "span[ptile=8]"
#BSUB -q short_parallel

mpirun.lsf ./mpitest.x

------------------------------------------------------------

Successfully completed.

Resource usage summary:

CPU time : 1.80 sec.
Max Memory : 2 MB
Max Swap : 24 MB

Max Processes : 1
Max Threads : 1

The output (if any) follows:

Rank 0 out of 16
Rank 1 out of 16
Rank 2 out of 16
Rank 3 out of 16
Rank 4 out of 16
Rank 5 out of 16
Rank 6 out of 16
Rank 7 out of 16
Rank 8 out of 16
Rank 9 out of 16
Rank 10 out of 16
Rank 11 out of 16
Rank 12 out of 16
Rank 13 out of 16
Rank 14 out of 16
Rank 15 out of 16
End of program.
Job /lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper ./mpitest.x

TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
===== ========== ================ ======================= ===================
00000 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00001 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00002 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00003 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00004 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00005 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00006 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00007 hero1102 ./mpitest.x Done 07/20/2011 15:44:05
00008 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00009 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00010 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00011 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00012 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00013 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00014 hero1002 ./mpitest.x Done 07/20/2011 15:44:05
00015 hero1002 ./mpitest.x Done 07/20/2011 15:44:05

PS:

Read file <test.err> for stderr output of this job.

Batch jobs are jobs that run non-interactively under the control of a “batch script,” which is a text file containing a number of job directives and Linux commands or utilities. Batch scripts are submitted to the “batch system,” where they are queued awaiting free resources on Odyssey. The batch system on Odyssey is known as “LSF” (Load Sharing Facility).

MPI and LSF are loosely coupled on Odyssey. In order to have your parallel application launched on the nodes reserved via LSF, you must use the mpirun.lsf command. Here is an example batch script for submitting a parallel job on Odyssey:

#!/bin/sh
#BSUB -n 16
#BSUB -e error_file.err
#BSUB -o output_file.out
#BSUB -J job_name
#BSUB -a openmpi
#BSUB -R "span[ptile=8]"
#BSUB -q short_parallel

mpirun.lsf ./my_executable.x

This example illustrates the basic parts of a script:

  • Job directive lines begin with #BSUB. These “LSF Directives” tell the batch system how many nodes to reserve for your job and how long to reserve those nodes. Directives can also specify things like what to name STDOUT files, whether to notify you by email when your job finishes, etc.
  • The mpirun.lsf command is used to start execution of your code on Odyssey’s compute nodes.

Explanations of the most important BSUB parameters specific to MPI jobs are given below:

  • -n 16: Number of MPI processes in the parallel job. This should equal the number of compute cores unless you are also using threads (e.g., MPI+OpenMP).
  • -q short_parallel: Specifies the LSF queue to which your parallel job is submitted. You can choose from a number of batch queues; the main purpose of having different queues is to control scheduling priorities and to set limits on the numbers of jobs of different sizes. More information on the available queues on Odyssey can be found here.
  • -a openmpi: Indicates that you are using the OpenMPI library. Currently the possible options are openmpi or mvapich.
  • -R "span[ptile=8]": Use this option if you want to specify the number of MPI processes per compute node. In the above example the job runs on 2 compute nodes using 8 cores per node. This parameter is optional and can take values between 1 and 8 (Odyssey has 8 cores per compute node). See the sketch after this list for a variant that spreads the same job over more nodes.



This page was created by Plamen G. Krastev

Last modified on February 22, 2013 by plamenkrastev@fas.harvard.edu
