
Available Memory on Odyssey

This page is intended to help you plan your memory usage on the Odyssey cluster. It also provides instructions for requesting memory resources for your jobs.

Memory on Odyssey

Most of the compute nodes, specifically the hero nodes, have 32 GB of RAM. The hero nodes have 8 cores each (i.e., 4 GB per core). In addition, there are ten machines (camd04, camd05, camd06, camd07, camd09, camd10, camd11, camd12, camd13, camd14) with 193 GB of RAM and 48 cores each. Please follow the steps below to determine the memory of the specific machines available to you.

(1) List available queues:

[username@iliadaccess03 ~]$ bqueues -u username
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
interact 141 Open:Active 96 4 - - 33 0 33 0
short_parallel 30 Open:Active - 512 - - 528 304 224 0
normal_parallel 25 Open:Active - 512 - - 573 420 145 0
long_parallel 20 Open:Active - 512 - - 6984 6336 648 0
long_serial 15 Open:Active - 256 - - 249 193 55 0
short_serial 10 Open:Active - - - - 5422 5315 90 17
normal_serial 5 Open:Active - - - - 4919 1968 2881 57
unrestricted_pa 3 Open:Active - 256 - - 297 184 65 0
unrestricted_se 1 Open:Active - 256 - - 545 481 64 0

Note: Depending on the specific research group you belong to, you may also have access to additional queues (and machines) beyond those listed above.

(2) List the hosts making up a particular queue:

[username@iliadaccess03 ~]$ bqueues -l normal_serial

QUEUE: normal_serial
-- For normal serial jobs.

PARAMETERS/STATISTICS
PRIO NICE STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV
5 0 Open:Active - - - - 4906 1949 2902 55 0 0

RUNLIMIT
1440.0 min of iliad44

PROCLIMIT
8

SCHEDULING PARAMETERS
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -

local_scratch
loadSched -
loadStop -

SCHEDULING POLICIES: FAIRSHARE EXCLUSIVE
FAIRSHARE_QUEUES: long_serial short_serial normal_serial amelio
USER_SHARES: [allusers, 1]

SHARE_INFO_FOR: normal_serial/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME ADJUST
allusers 1 0.000 3100 1 57760304.0 103422536 0.000

USERS: all ~ncf_users/
HOSTS: edwards01 davis/ nunn_tmp/ airoldi/ shock/ pleiades/ galaxy/ giribet/ giribet-new/ karplus/ wakeley/ betley/ ip2/ west/ soph-compute/ chetty_temp/
gpgpu/ camd04 camd05 camd06 camd07 mitrovica/ iliad211 iliad212 gc/ nehalem/ enj/ wofsy/ iliad/ tuna/ moorcroft_6100_blade_a/ moorcroft_6100_blade_b/ huybers/
eldorado_scavenge/ moorcroft2b/ moorcroft2a/ moorcroft2c/ turnbaugh/

POST_EXEC: /odyssey/apps/local/bin/scratch_scrub.sh
RES_REQ: select[type==any] Maximum slot reservation time: 14400 seconds

Host names ending with "/" are host groups; these can be expanded with the bmgroup command, e.g.:

[username@iliadaccess04 ~]$ bmgroup west
GROUP_NAME HOSTS
west west0282 west0173 west0283 west0174 west0284 west0201 west0311 west0202 west0312 west0203 west0313 west0204 west0314 west0141 west0251 west0142 west0361 west0252 west0143
west0362 west0253 west0144 west0363 west0254 west0364 west0221 west0331 west0222 west0332 west0223 west0333 west0224 west0334 west0161 west0271 west0162 west0381 west0272 west0163
west0382 west0273 west0164 west0383 west0274 west0384 west0301 west0302 west0303 west0304 west0132 west0241 west0351 west0242 west0134 west0352 west0243 west0353 west0244 west0354
west0211 west0321 west0212 west0322 west0213 west0323 west0214 west0324 west0151 west0261 west0152 west0371 west0262 west0153 west0372 west0263 west0154 west0373 west0264 west0374
west0231 west0341 west0232 west0342 west0233 west0343 west0234 west0344 west0171 west0281 west0172
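Since host groups can contain many machines, it can be handy to count them rather than read the list by eye. Below is a minimal sketch that splits bmgroup-style output into one host per line and counts the entries; the `sample` variable is a shortened stand-in for real `bmgroup west` output (on Odyssey you would pipe the command itself):

```shell
# Count the hosts in a host group from bmgroup-style output.
# `sample` stands in for real output of:  bmgroup west
sample='GROUP_NAME HOSTS
west west0282 west0173 west0283'
# Drop the header line, put one token per line, count host names.
n_hosts=$(printf '%s\n' "$sample" | tail -n +2 | tr -s ' ' '\n' | grep -c '^west[0-9]')
echo "$n_hosts"
```

The `^west[0-9]` pattern counts only host names (e.g. west0282), not the group name itself.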

(3) Display memory (and other attributes) of specific hosts:
[username@iliadaccess03 ~]$ lshosts west0371
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
west0371 X86_64 Intel_EM 60.0 12 23970M 8189M Yes (intel)

[username@iliadaccess03 ~]$ lshosts camd07
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
camd07 X86_64 Opteron8 60.0 48 193318M 8189M Yes (amd)

The memory is reported in MB. For instance, in the examples above west0371 has 24 GB of RAM, while camd07 has 193 GB. Typing lshosts with no arguments generates a long list of all machines making up the Odyssey cluster.

[username@iliadaccess03 ~]$ lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
iliadserv1 X86_64 Intel_EM 60.0 8 7985M 20481M Yes (mg)
awe01 X86_64 Intel_EM 60.0 32 128975M 4000M Yes ()
heroint1 X86_64 Intel_EM 60.0 8 15919M 8189M Yes ()
heroint2 X86_64 Intel_EM 60.0 8 63576M 8189M Yes ()
heroint3 X86_64 Intel_EM 60.0 8 63064M 8189M Yes ()
heroint4 X86_64 Intel_EM 60.0 12 96541M 8189M Yes ()
heroint5 X86_64 Intel_EM 60.0 12 96541M 8189M Yes ()
heroatlas X86_64 Intel_EM 60.0 8 16046M 8189M Yes ()
herophysics X86_64 Intel_EM 60.0 16 16044M 8189M Yes ()
hero0101 X86_64 Intel_EM 60.0 8 32058M 8189M Yes (ib intel odyssey)
hero0102 X86_64 Intel_EM 60.0 8 32058M 8189M Yes (ib intel odyssey)
hero0103 X86_64 Intel_EM 60.0 8 32058M 8189M Yes (ib intel odyssey)
... ... ... ... ... ... ... ... ... ... ...
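To pick out only the high-memory machines from this listing, you could filter the output with awk. This is a sketch: the here-document below stands in for real lshosts output (on the cluster you would pipe lshosts itself into the awk command), and the 64 GB threshold is only an illustration:

```shell
# List hosts with more than 64 GB of RAM from lshosts-style output.
# Field 6 is maxmem in MB; $6+0 strips the trailing "M".
# On Odyssey:  lshosts | awk 'NR > 1 && $6+0 > 64000 { ... }'
high_mem=$(cat <<'EOF' | awk 'NR > 1 && $6+0 > 64000 { printf "%s %d GB\n", $1, $6/1000 }'
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
west0371 X86_64 Intel_EM 60.0 12 23970M 8189M Yes (intel)
camd07 X86_64 Opteron8 60.0 48 193318M 8189M Yes (amd)
EOF
)
echo "$high_mem"
```

The division by 1000 follows this page's convention of reporting 193318 MB as 193 GB.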

Requesting Memory Resources on Odyssey

If your application requires a specific amount of memory, you can request memory resources with the -R option in your batch job submission script, as illustrated in the example below. You may also specify the -x option, which reserves the node exclusively for the job and does not allow any other jobs to run on the node until your job completes.
#!/bin/sh
#BSUB -J myjob
#BSUB -o myjob.out
#BSUB -e myjob.err
#BSUB -q normal_serial
#BSUB -R "rusage[mem=36000]"
#BSUB -x

./myprogram.x
For instance, this script will reserve a node on the normal_serial queue with at least 36 GB of RAM available for the job. Keep in mind, however, that jobs submitted with the "exclusive use" (-x) option will, in general, take longer to dispatch: in order to start, the job must wait until all previously submitted jobs clear off the node.
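Since rusage[mem=...] takes its value in MB, it can help to compute that number from the amount of RAM you actually need. A minimal sketch, using this page's 1 GB = 1000 MB convention (the variable names are just for illustration):

```shell
# Convert a memory request in GB to the MB value expected by rusage[mem=...].
# This page reports memory with 1 GB = 1000 MB (e.g. 36 GB -> mem=36000).
mem_gb=36
mem_mb=$((mem_gb * 1000))
echo "#BSUB -R \"rusage[mem=${mem_mb}]\""
```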

Alternatively, you may reserve a node for "exclusive use" by specifying the total number of cores on the node with the -n option in your job submission script. If you have access to higher-priority queues, your jobs may dispatch faster. (Other running jobs will be suspended to give priority to your job.) However, if there are other jobs on the node, you may not be able to use the entire memory of the node -- suspended jobs continue to use memory, and your job could crash. Here is an example script:
#!/bin/sh
#BSUB -J myjob
#BSUB -o myjob.out
#BSUB -e myjob.err
#BSUB -q normal_serial
#BSUB -n 48
#BSUB -R "span[ptile=48]"

./myprogram.x
This script will reserve one of the 48-core camd nodes available on the normal_serial queue.
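Rather than hard-coding 48, you could read the core count from the lshosts output for a target host and generate the matching directives. A sketch, with a sample line standing in for the real `lshosts camd07` output shown earlier on this page:

```shell
# Derive -n and ptile values from a host's ncpus column (field 5 of lshosts output).
# The sample stands in for:  lshosts camd07
sample='HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
camd07 X86_64 Opteron8 60.0 48 193318M 8189M Yes (amd)'
ncpus=$(printf '%s\n' "$sample" | awk 'NR == 2 { print $5 }')
echo "#BSUB -n ${ncpus}"
echo "#BSUB -R \"span[ptile=${ncpus}]\""
```

Setting ptile equal to -n forces all requested cores onto a single node, which is what makes the reservation effectively exclusive.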

CC BY-NC 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.