
MATLAB Parallel – PCT and DCS

Introduction

This page is intended to help you with running parallel MATLAB codes on the Odyssey cluster. The latest software modules supporting parallel computing with MATLAB available on the cluster are:

matlab/R2016b-fasrc01
matlab/R2016a-fasrc01
matlab/R2015b-fasrc01
matlab/R2015a-fasrc01
matlab/R2014b-fasrc01
matlab/R2014a-fasrc01

Parallel processing with MATLAB is performed with the help of two products, Parallel Computing Toolbox (PCT) and Distributed Computing Server (DCS).

Parallel Computing Toolbox

Currently, PCT provides up to 32 workers (MATLAB computational engines) to execute applications locally on a multicore machine. This means that with the toolbox alone you can run parallel MATLAB code on a single compute node and use up to 32 cores, even though most compute nodes on the cluster have 64 cores per node.
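If you are unsure how many workers the local profile will allow on the node where your job runs, you can check from within MATLAB. The following is a minimal sketch, assuming the standard 'local' cluster profile:

>> c = parcluster('local');   % local cluster profile on the current node
>> c.NumWorkers               % maximum number of workers this profile allows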

Parallel FOR loops (parfor)

Below is a simple code example that uses PCT to calculate PI via a parallel Monte-Carlo method. It also illustrates the use of parfor (parallel FOR) loops: suitable FOR loops can simply be replaced by parallel FOR loops without other changes to the code:

%============================================================================
% Parallel Monte Carlo calculation of PI
%============================================================================
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
R = 1;
darts = 1e7;
count = 0;
tic
parfor i = 1:darts
   % Compute the X and Y coordinates of where the dart hit the...............
   % square using Uniform distribution.......................................
   x = R*rand(1);
   y = R*rand(1);
   if x^2 + y^2 <= R^2
      % Increment the count of darts that fell inside of the.................
      % circle...............................................................
      count = count + 1; % Count is a reduction variable.
   end
end
% Compute pi.................................................................
myPI = 4*count/darts;
T = toc;
fprintf('The computed value of pi is %8.7f.\n', myPI);
fprintf('The parallel Monte-Carlo method is executed in %8.2f seconds.\n', T);
delete(gcp);
exit;

Important: When using parpool in MATLAB, you need to include the statement parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK'))) in your code. This statement tells MATLAB to start SLURM_CPUS_PER_TASK workers on the local machine (the compute node where your job lands). When the parallel computation is done, the MATLAB workers are released with the statement delete(gcp). If the above code is named, e.g., pfor.m, it can be sent to the queue with the batch-job submission script below, which starts a MATLAB parallel job with 8 workers:

#!/bin/bash
#SBATCH -J pfor
#SBATCH -o pfor.out
#SBATCH -e pfor.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p general
#SBATCH --mem=4000
 
source new-modules.sh
module load matlab/R2014b-fasrc01
srun -n 1 -c 8 matlab-default -nosplash -nodesktop -r "pfor"

The #SBATCH -N 1 and #SBATCH -c 8 directives ensure that there are 8 processing cores for the calculation and that they all reside on the same compute node. The number of cores you request must match the number of workers you spawn; otherwise you will negatively affect your job and all others running on that node. Note that matlab-default must be called instead of the default matlab, as only the former binary is allowed to spawn multiple processes.

If the submission script is named pfor.run, it is submitted to the queue by typing:

[pkrastev@sa01 test2]$ sbatch pfor.run
Submitted batch job 43510604

When the job has completed, the output is written to the file pfor.out:

MATLAB is selecting SOFTWARE OPENGL rendering.
 
< M A T L A B (R) >
Copyright 1984-2014 The MathWorks, Inc.
R2014b (8.4.0.150421) 64-bit (glnxa64)
September 15, 2014
 
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
 
ans =
 
Pool with properties:
 
Connected: true
NumWorkers: 8
Cluster: local
AttachedFiles: {}
IdleTimeout: 30 minute(s) (30 minutes remaining)
SpmdEnabled: true
 
The computed value of pi is 3.1409520.
The parallel Monte-Carlo method is executed in 20.30 seconds.
Parallel pool using the 'local' profile is shutting down.

Any runtime errors would go to the file pfor.err.

Single Program Multiple Data (SPMD)

In addition, MATLAB provides a single program multiple data (SPMD) parallel programming model, which allows for greater control over the parallelization: tasks can be distributed and assigned to parallel processes (labs or workers in MATLAB's terminology) depending on their ranks. The code below provides a simple illustration: it prints out the worker rank from each MATLAB lab:

%====================================================================
% Illustration of SPMD Parallel Programming model with MATLAB
%====================================================================
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
% Start of parallel region...........................................
spmd
   nproc = numlabs;  % get total number of workers
   iproc = labindex; % get lab ID
   if ( iproc == 1 )
      fprintf ( 1, ' Running with %d labs.\n', nproc );
   end
   for i = 1: nproc
      if iproc == i
         fprintf ( 1, ' Rank %d out of %d.\n', iproc, nproc );
      end
   end
% End of parallel region.............................................
end
delete(gcp);
exit;

If the code is named spmd_test.m, it can be sent to the queue with this script:

#!/bin/bash
#
#SBATCH -J spmd_test
#SBATCH -o spmd_test.out
#SBATCH -e spmd_test.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p general
#SBATCH --mem=4000
 
module load math/matlab-R2014b
srun -n 1 -c 8 matlab-default -nosplash -nodesktop -r "spmd_test"

If the batch-job submission script is named spmd_test.run, then it is sent to the queue with

[pkrastev@sa01 test2]$ sbatch spmd_test.run
Submitted batch job 43515333

The output is printed out to the file spmd_test.out:

MATLAB is selecting SOFTWARE OPENGL rendering.
 
< M A T L A B (R) >
Copyright 1984-2014 The MathWorks, Inc.
R2014b (8.4.0.150421) 64-bit (glnxa64)
September 15, 2014
 
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
 
ans =
 
Pool with properties:
 
Connected: true
NumWorkers: 8
Cluster: local
AttachedFiles: {}
IdleTimeout: 30 minute(s) (30 minutes remaining)
SpmdEnabled: true
 
Lab 1:
Running with 8 labs.
Rank 1 out of 8.
Lab 2:
Rank 2 out of 8.
Lab 3:
Rank 3 out of 8.
Lab 4:
Rank 4 out of 8.
Lab 5:
Rank 5 out of 8.
Lab 6:
Rank 6 out of 8.
Lab 7:
Rank 7 out of 8.
Lab 8:
Rank 8 out of 8.
Parallel pool using the 'local' profile is shutting down.

Distributed Computing Server

DCS allows a larger number of MATLAB workers to be used on a single node and/or across several compute nodes. The current DCS license on the cluster allows up to 256 MATLAB workers. DCS is integrated with SLURM and works with MATLAB versions R2014a, R2014b, R2015a, R2015b, R2016a and R2016b, available with the legacy software modules math/matlab-R2014a, math/matlab-R2014b, math/matlab-R2015a, math/matlab-R2015b, math/matlab-R2016a and math/matlab-R2016b, and the LMOD software modules matlab/R2014a-fasrc01, matlab/R2014b-fasrc01, matlab/R2015a-fasrc01, matlab/R2015b-fasrc01, matlab/R2016a-fasrc01 and matlab/R2016b-fasrc01. The example steps below describe how to set up and use DCS on the cluster:

(1) Log on to the cluster via our NoMachineX (instructions here) and start the MATLAB GUI. (You can use your own X11 client with X11 forwarding enabled, but the performance may be sluggish.)

[pkrastev@sa01 test2]$ matlab

(2) Configure DCS to run parallel jobs on Odyssey by calling configCluster. This needs to be called only once for each MATLAB version.
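For example, from the MATLAB command window:

>> configCluster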

(3) Set up job parameters, e.g., wall time, queue / partition, and memory per CPU. The example below illustrates how this can be done interactively. Once these parameters are set, their values become the defaults until changed.

>> ClusterInfo.setWallTime('01:00:00')
>> ClusterInfo.setQueueName('serial_requeue')
>> ClusterInfo.setMemUsage('4000')

(4) Display the parallel cluster configuration with ClusterInfo.state.

NOTE: This lists the available cluster options and their current values. These options could be set up as desired.

>> ClusterInfo.state
Arch :
ClusterHost :
DataParallelism :
DiskSpace :
EmailAddress :
GpusPerNode :
MemUsage : 4000
PrivateKeyFile :
PrivateKeyFileHasPassPhrase : 1
ProcsPerNode :
ProjectName :
QueueName : serial_requeue
RequireExclusiveNode : 0
Reservation :
SshPort :
UseGpu : 0
UserDefinedOptions :
UserNameOnCluster :
WallTime : 01:00:00

(5) Submit parallel DCS jobs. There are two ways to do this: from within MATLAB, or directly through SLURM.

Submitting DCS jobs from within MATLAB

We will illustrate submitting DCS jobs from within MATLAB with a specific example. Below is a simple function evaluating the integer sum from 1 through N in parallel:

%==========================================================
% Function: parallel_sum( N )
% Calculates integer sum from 1 to N in parallel
%==========================================================
function s = parallel_sum(N)
   s = 0;
   parfor i = 1:N
      s = s + i;
   end
   fprintf('Sum of numbers from 1 to %d is %d.\n', N, s);
end

Use the batch command to submit parallel jobs to the cluster. The batch command will return a job object which is used to access the output of the submitted jobs. See the example below and refer to the official MATLAB documentation for more help on batch. This assumes that the MATLAB function is named parallel_sum.m. Note that these jobs will always request n+1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs 8 workers will consume 9 CPU cores.

% Define a cluster object
>> c = parcluster;
% Define a job object using batch
>> j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);

Once the job completes, we can retrieve the job results. This is done by calling the function fetchOutputs. Then we also need to delete the job object.

NOTE: fetchOutputs is used to retrieve function output arguments. Data that has been written to files on the cluster needs to be retrieved directly from the filesystem.

% Fetch job results
>> j.fetchOutputs{:};
% Delete job object
>> j.delete;

Submitting DCS jobs directly through SLURM

Parallel DCS jobs can be submitted directly from the Unix command line through SLURM. For this, in addition to the MATLAB source, one needs to prepare a MATLAB submission script with the job specifications. An example is shown below:

%==========================================================
% MATLAB job submission script: parallel_batch.m
%==========================================================
c = parcluster;
ClusterInfo.setWallTime('01:00:00');
ClusterInfo.setQueueName('general');
ClusterInfo.setMemUsage('4000');
j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);
exit;

If this script is named, for instance, parallel_batch.m, it is submitted to the queue with the help of the following SLURM batch-job submission script:

#!/bin/bash
#
#SBATCH -J parallel_sum_DCS
#SBATCH -o parallel_sum_DCS.out
#SBATCH -e parallel_sum_DCS.err
#SBATCH -p serial_requeue
#SBATCH -n 1
#SBATCH -t 0-00:20
#SBATCH --mem=2000
 
matlab-default -nosplash -nodesktop -r "parallel_batch"

Assuming the above script is named parallel_sum_DCS.run, the job is submitted as usual with

sbatch parallel_sum_DCS.run

NOTE: This scheme dispatches two jobs: a serial job that submits the actual DCS parallel job, and the parallel job itself.

Once submitted, the DCS parallel job can be monitored and managed directly through SLURM.

After the job completes, one can fetch the results and delete the job object from within MATLAB. If the program writes its results directly to disk, fetching is not necessary.

>> j.fetchOutputs{:};
>> j.delete;
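If the MATLAB session that submitted the job is no longer open (for example, because the submission script above ends with exit), the job object can first be recovered from the cluster object. A minimal sketch, assuming the job of interest is the most recently submitted one under this profile:

>> c = parcluster;
>> j = c.Jobs(end);   % most recently created job for this cluster profile
>> j.State            % should report 'finished' before fetching outputs

After that, fetchOutputs and delete can be used as shown above.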
